Variable selection for zero-inflated and overdispersed data with application to health care demand in Germany

Zhu Wang, Shuangge Ma, Ching Yun Wang

Research output: Contribution to journal › Article › peer-review

24 Scopus citations


In health services and outcome research, count outcomes are frequently encountered and often have a large proportion of zeros. The zero-inflated negative binomial (ZINB) regression model has important applications for this type of data. With many possible candidate risk factors, this paper proposes new variable selection methods for the ZINB model. We consider the likelihood function combined with a penalty, where the penalty is the least absolute shrinkage and selection operator (LASSO), the smoothly clipped absolute deviation (SCAD), or the minimax concave penalty (MCP). An EM (expectation-maximization) algorithm is proposed for estimating the model parameters and conducting variable selection simultaneously. This algorithm consists of estimating penalized weighted negative binomial models and penalized logistic models via the coordinate descent algorithm. Furthermore, statistical properties, including the standard error formulae, are provided. A simulation study shows that the new algorithm yields estimates that are more accurate than, or at least comparable to, those of traditional stepwise variable selection, and that it is also more robust. The proposed methods are applied to analyze health care demand in Germany using the open-source R package mpath.
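The ingredients named in the abstract can be illustrated with a minimal Python sketch. It assumes the standard textbook forms of the LASSO, SCAD, and MCP penalties and the usual ZINB E-step, in which each observed zero receives a posterior probability of being a structural zero. All function names, the parameter names (lam, a, gamma, pi, mu, theta), and the default tuning constants a = 3.7 and gamma = 3.0 are assumptions made for illustration; they are not taken from the paper or from the mpath package, and the M-step (penalized weighted negative binomial and logistic fits by coordinate descent) is not shown.

```python
import math


def lasso_penalty(beta, lam):
    """LASSO penalty: lam * |beta|."""
    return lam * abs(beta)


def scad_penalty(beta, lam, a=3.7):
    """SCAD penalty (standard form, requires a > 2).

    Linear near zero, quadratic in the middle, constant for large |beta|.
    The default a = 3.7 is a common convention, assumed here.
    """
    b = abs(beta)
    if b <= lam:
        return lam * b
    if b <= a * lam:
        return (2 * a * lam * b - b * b - lam * lam) / (2 * (a - 1))
    return lam * lam * (a + 1) / 2


def mcp_penalty(beta, lam, gamma=3.0):
    """MCP penalty (standard form, requires gamma > 1).

    The default gamma = 3.0 is an illustrative assumption.
    """
    b = abs(beta)
    if b <= gamma * lam:
        return lam * b - b * b / (2 * gamma)
    return gamma * lam * lam / 2


def nb_zero_prob(mu, theta):
    """P(Y = 0) under a negative binomial with mean mu and dispersion theta."""
    return (theta / (theta + mu)) ** theta


def e_step_weight(y, pi, mu, theta):
    """Posterior probability that observation y is a structural zero.

    pi is the zero-inflation probability; a positive count can never be a
    structural zero, so its weight is 0.
    """
    if y > 0:
        return 0.0
    return pi / (pi + (1 - pi) * nb_zero_prob(mu, theta))


if __name__ == "__main__":
    # Compare the three penalties on a small and a large coefficient.
    for beta in (0.5, 10.0):
        print(beta,
              lasso_penalty(beta, 1.0),
              scad_penalty(beta, 1.0),
              mcp_penalty(beta, 1.0))
    # Posterior structural-zero probability for an observed zero.
    print(e_step_weight(0, pi=0.5, mu=1.0, theta=1.0))
```

In this sketch, SCAD and MCP level off for large coefficients while the LASSO penalty keeps growing linearly, which is the usual motivation for the two non-convex penalties. In an EM iteration of the kind the abstract describes, the E-step weights would feed the two penalized fits: the count component is weighted by one minus the weight, and the zero-inflation component uses the weight as its response.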

Original language: English (US)
Pages (from-to): 867-884
Number of pages: 18
Journal: Biometrical Journal
Issue number: 5
State: Published - Sep 1 2015
Externally published: Yes


Keywords

  • MCP
  • SCAD
  • Variable selection
  • ZINB

ASJC Scopus subject areas

  • Statistics and Probability
  • Statistics, Probability and Uncertainty

