Time-dependent prediction and evaluation of variable importance using superlearning in high-dimensional clinical data

Alan Hubbard, Ivan Diaz Munoz, Anna Decker, John B. Holcomb, Martin A. Schreiber, Eileen M. Bulger, Karen J. Brasel, Erin E. Fox, Deborah J. Del Junco, Charles E. Wade, Mohammad H. Rahbar, Bryan A. Cotton, Herb A. Phelan, John G. Myers, Louis H. Alarcon, Peter Muskat, Mitchell J. Cohen

Research output: Contribution to journal › Article › peer-review

16 Scopus citations

Abstract

BACKGROUND: Prediction of outcome after injury is fraught with uncertainty and statistically beset by misspecified models. Single-time point regression gives prediction and inference at only one time, which is of dubious value for continuous prediction of ongoing bleeding. New statistical machine learning techniques such as SuperLearner (SL) can make superior predictions at iterative time points while evaluating the changing relative importance of each measured variable on an outcome. This can provide a continuously changing prediction of outcome and an evaluation of which clinical variables likely drive a particular outcome.

METHODS: PROMMTT data were evaluated using both naive (standard stepwise logistic regression) and SL techniques to develop a time-dependent prediction of future mortality within discrete time intervals. We avoided both underfitting and overfitting by using cross-validation to select an optimal combination of predictors among candidate predictors/machine learning algorithms. SL was also used to produce interval-specific robust variable importance measures (VIM), resulting in an ordered list, by time point, of the variables that have the strongest impact on future mortality.

RESULTS: Nine hundred eighty patients had complete clinical and outcome data and were included in the analysis. The prediction of ongoing transfusion with SL was superior to the naive approach for all time intervals (correlations of cross-validated predictions with the outcome were 0.819, 0.789, 0.792 for time intervals 30-90, 90-180, 180-360, >360 minutes). The estimated VIM of mortality also changed significantly at each time point.

CONCLUSION: The SL technique for prediction of outcome from a complex dynamic multivariate data set is superior at each time interval to standard models. In addition, the SL VIM at each time point provides insight into the time-specific drivers of future outcome, patient trajectory, and targets for clinical intervention. Thus, this automated approach mimics clinical practice, changing form and content through time to optimize the accuracy of the prognosis based on the evolving trajectory of the patient.
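The cross-validated selection step described in METHODS can be illustrated with a minimal sketch. This is not the authors' implementation: it shows only a "discrete" super learner, which picks the single candidate with the lowest cross-validated risk, whereas the full SL method fits a weighted combination of candidates. The data are synthetic, and the candidate learners (`fit_mean`, `make_knn`), the squared-error risk, and the fold scheme are assumptions chosen for brevity.

```python
import random
from statistics import mean

def fit_mean(X, y):
    """Intercept-only learner: always predicts the training outcome rate."""
    rate = mean(y)
    return lambda x: rate

def make_knn(k):
    """Return a fit function for a k-nearest-neighbour learner (hypothetical candidate)."""
    def fit(X, y):
        def predict(x):
            order = sorted(
                range(len(X)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(X[i], x)),
            )
            return mean(y[i] for i in order[:k])
        return predict
    return fit

def cv_predictions(fit, X, y, n_folds=5):
    """Predict each observation from a model trained without its fold."""
    n = len(X)
    preds = [0.0] * n
    for fold in range(n_folds):
        train = [i for i in range(n) if i % n_folds != fold]
        test = [i for i in range(n) if i % n_folds == fold]
        model = fit([X[i] for i in train], [y[i] for i in train])
        for i in test:
            preds[i] = model(X[i])
    return preds

def cv_risk(fit, X, y):
    """Cross-validated mean squared error of a candidate learner."""
    preds = cv_predictions(fit, X, y)
    return mean((p - t) ** 2 for p, t in zip(preds, y))

# Synthetic stand-in for one time interval: two covariates, binary outcome.
random.seed(0)
X = [(random.random(), random.random()) for _ in range(200)]
y = [1 if x0 + random.gauss(0, 0.1) > 0.5 else 0 for x0, _ in X]

candidates = {"mean": fit_mean, "knn_5": make_knn(5), "knn_15": make_knn(15)}
risks = {name: cv_risk(fit, X, y) for name, fit in candidates.items()}
best = min(risks, key=risks.get)  # discrete super learner: lowest CV risk wins
print(best, risks)
```

In a time-dependent analysis of the kind described, the same selection would be repeated separately within each time interval, so the chosen learner can differ from one interval to the next.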

Original language: English (US)
Pages (from-to): S53-S60
Journal: Journal of Trauma and Acute Care Surgery
Volume: 75
Issue number: 1 SUPPL1
State: Published - Jul 26 2013

Keywords

  • Causal inference
  • Injury
  • PROMMTT
  • Statistical prediction
  • Trauma

ASJC Scopus subject areas

  • Surgery
  • Critical Care and Intensive Care Medicine
