Time-dependent prediction and evaluation of variable importance using superlearning in high-dimensional clinical data

Alan Hubbard, Ivan Diaz Munoz, Anna Decker, John B. Holcomb, Martin A. Schreiber, Eileen M. Bulger, Karen J. Brasel, Erin E. Fox, Deborah J. Del Junco, Charles E. Wade, Mohammad H. Rahbar, Bryan A. Cotton, Herb A. Phelan, John G Myers, Louis H. Alarcon, Peter Muskat, Mitchell J. Cohen

Research output: Contribution to journal › Article

Abstract

BACKGROUND: Prediction of outcome after injury is fraught with uncertainty and statistically beset by misspecified models. Single-time-point regression gives prediction and inference at only one time, which is of dubious value for continuous prediction of ongoing bleeding. New statistical machine learning techniques such as SuperLearner (SL) can make superior predictions at iterative time points while evaluating the changing relative importance of each measured variable on an outcome. This can provide a continuously updated prediction of outcome and an evaluation of which clinical variables likely drive a particular outcome.

METHODS: PROMMTT data were evaluated using both naive (standard stepwise logistic regression) and SL techniques to develop a time-dependent prediction of future mortality within discrete time intervals. We avoided both underfitting and overfitting by using cross-validation to select an optimal combination of predictors among candidate predictors/machine learning algorithms. SL was also used to produce interval-specific robust variable importance measures (VIMs), resulting in an ordered list, by time point, of the variables that have the strongest impact on future mortality.

RESULTS: Nine hundred eighty patients had complete clinical and outcome data and were included in the analysis. The prediction of ongoing transfusion with SL was superior to the naive approach for all time intervals (correlations of cross-validated predictions with the outcome were 0.819, 0.789, 0.792 for time intervals 30-90, 90-180, 180-360, >360 minutes). The estimated VIMs of mortality also changed significantly at each time point.

CONCLUSION: The SL technique for prediction of outcome from a complex, dynamic, multivariate data set is superior to standard models at each time interval. In addition, the SL VIM at each time point provides insight into the time-specific drivers of future outcome, patient trajectory, and targets for clinical intervention. Thus, this automated approach mimics clinical practice, changing form and content through time to optimize the accuracy of the prognosis based on the evolving trajectory of the patient.
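The METHODS paragraph describes SuperLearner as using cross-validation to choose an optimal combination of candidate algorithms. The following is a minimal pure-Python sketch of that idea, assuming a binary outcome, three illustrative learners (intercept-only, main-terms logistic regression, k-nearest neighbours), 5-fold cross-validation, squared-error loss, and convex ensemble weights found by grid search. These choices, all function names (`super_learner`, `fit_logistic`, etc.), and the simulated data are assumptions made for illustration; they are not the authors' implementation, learner library, or the PROMMTT data. The sketch also reports the correlation of cross-validated predictions with the outcome, the metric quoted in RESULTS.

```python
import math
import random

def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in math.exp
    return 1.0 / (1.0 + math.exp(-z))

# Candidate learners (hypothetical library, for illustration only).
# Each takes training data (X, y) and returns a function x -> P(Y = 1 | x).
def fit_mean(X, y):
    """Intercept-only model: predicts the training event rate."""
    p = sum(y) / len(y)
    return lambda x: p

def fit_logistic(X, y, lr=0.5, epochs=300):
    """Main-terms logistic regression fitted by batch gradient descent."""
    w = [0.0] * (len(X[0]) + 1)  # w[0] is the intercept
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(w[0] + sum(a * b for a, b in zip(w[1:], xi))) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return lambda x: sigmoid(w[0] + sum(a * b for a, b in zip(w[1:], x)))

def fit_knn(X, y, k=15):
    """Average outcome among the k nearest training points (Euclidean)."""
    def predict(x):
        nearest = sorted((sum((a - b) ** 2 for a, b in zip(xi, x)), yi)
                         for xi, yi in zip(X, y))[:k]
        return sum(yi for _, yi in nearest) / len(nearest)
    return predict

def simplex_grid(K, step):
    """All weight vectors with entries in multiples of `step` summing to 1."""
    m = round(1 / step)
    def rec(k, remaining):
        if k == 1:
            yield (remaining,)
            return
        for i in range(remaining + 1):
            for rest in rec(k - 1, remaining - i):
                yield (i,) + rest
    return [tuple(c / m for c in combo) for combo in rec(K, m)]

def super_learner(X, y, learners, V=5, step=0.05, seed=0):
    n, K = len(y), len(learners)
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[v::V] for v in range(V)]
    # Z[i][k]: prediction for subject i from learner k fitted WITHOUT i's fold.
    Z = [[0.0] * K for _ in range(n)]
    for val in folds:
        held_out = set(val)
        train = [i for i in range(n) if i not in held_out]
        Xt, yt = [X[i] for i in train], [y[i] for i in train]
        for k, fit in enumerate(learners):
            f = fit(Xt, yt)
            for i in val:
                Z[i][k] = f(X[i])

    def cv_risk(alpha):  # cross-validated mean squared error of the blend
        return sum((sum(a * z for a, z in zip(alpha, zi)) - yi) ** 2
                   for zi, yi in zip(Z, y)) / n

    weights = min(simplex_grid(K, step), key=cv_risk)
    fits = [fit(X, y) for fit in learners]  # refit every learner on all data

    def predict(x):
        return sum(a * f(x) for a, f in zip(weights, fits))
    return predict, weights, Z

def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / math.sqrt(va * vb)

# Usage on SIMULATED data (not PROMMTT): a nonlinear true risk model.
rng = random.Random(1)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
y = [1 if rng.random() < sigmoid(-1 + 1.5 * a - b * b) else 0 for a, b in X]
predict, weights, Z = super_learner(X, y, [fit_mean, fit_logistic, fit_knn])
cv_pred = [sum(w * z for w, z in zip(weights, zi)) for zi in Z]
print("SL weights (mean, logistic, kNN):", weights)
print("correlation of CV predictions with outcome:", round(pearson(cv_pred, y), 3))
```

Because the ensemble weights are chosen using the same cross-validated predictions, the printed correlation is somewhat optimistic; an honest estimate would wrap the whole procedure in an outer cross-validation loop. For a time-dependent analysis like the one described above, one would presumably fit a separate ensemble of this kind for each time interval, using only the data available at the start of that interval.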

Original language: English (US)
Journal: Journal of Trauma and Acute Care Surgery
Volume: 75
Issue number: 1 SUPPL1
DOI: 10.1097/TA.0b013e3182914553
State: Published - 2013

Keywords

  • Causal inference
  • Injury
  • PROMMTT
  • Statistical prediction
  • Trauma

ASJC Scopus subject areas

  • Critical Care and Intensive Care Medicine
  • Surgery

Cite this

Time-dependent prediction and evaluation of variable importance using superlearning in high-dimensional clinical data. / Hubbard, Alan; Munoz, Ivan Diaz; Decker, Anna; Holcomb, John B.; Schreiber, Martin A.; Bulger, Eileen M.; Brasel, Karen J.; Fox, Erin E.; Del Junco, Deborah J.; Wade, Charles E.; Rahbar, Mohammad H.; Cotton, Bryan A.; Phelan, Herb A.; Myers, John G; Alarcon, Louis H.; Muskat, Peter; Cohen, Mitchell J.

In: Journal of Trauma and Acute Care Surgery, Vol. 75, No. 1 SUPPL1, 2013.

Hubbard, A, Munoz, ID, Decker, A, Holcomb, JB, Schreiber, MA, Bulger, EM, Brasel, KJ, Fox, EE, Del Junco, DJ, Wade, CE, Rahbar, MH, Cotton, BA, Phelan, HA, Myers, JG, Alarcon, LH, Muskat, P & Cohen, MJ 2013, 'Time-dependent prediction and evaluation of variable importance using superlearning in high-dimensional clinical data', Journal of Trauma and Acute Care Surgery, vol. 75, no. 1 SUPPL1. https://doi.org/10.1097/TA.0b013e3182914553

Hubbard, Alan ; Munoz, Ivan Diaz ; Decker, Anna ; Holcomb, John B. ; Schreiber, Martin A. ; Bulger, Eileen M. ; Brasel, Karen J. ; Fox, Erin E. ; Del Junco, Deborah J. ; Wade, Charles E. ; Rahbar, Mohammad H. ; Cotton, Bryan A. ; Phelan, Herb A. ; Myers, John G ; Alarcon, Louis H. ; Muskat, Peter ; Cohen, Mitchell J. / Time-dependent prediction and evaluation of variable importance using superlearning in high-dimensional clinical data. In: Journal of Trauma and Acute Care Surgery. 2013 ; Vol. 75, No. 1 SUPPL1.

@article{a12076a547d9416cb32a49640f89f3b1,
title = "Time-dependent prediction and evaluation of variable importance using superlearning in high-dimensional clinical data",
abstract = "BACKGROUND: Prediction of outcome after injury is fraught with uncertainty and statistically beset by misspecified models. Single-time-point regression gives prediction and inference at only one time, which is of dubious value for continuous prediction of ongoing bleeding. New statistical machine learning techniques such as SuperLearner (SL) can make superior predictions at iterative time points while evaluating the changing relative importance of each measured variable on an outcome. This can provide a continuously updated prediction of outcome and an evaluation of which clinical variables likely drive a particular outcome. METHODS: PROMMTT data were evaluated using both naive (standard stepwise logistic regression) and SL techniques to develop a time-dependent prediction of future mortality within discrete time intervals. We avoided both underfitting and overfitting by using cross-validation to select an optimal combination of predictors among candidate predictors/machine learning algorithms. SL was also used to produce interval-specific robust variable importance measures (VIMs), resulting in an ordered list, by time point, of the variables that have the strongest impact on future mortality. RESULTS: Nine hundred eighty patients had complete clinical and outcome data and were included in the analysis. The prediction of ongoing transfusion with SL was superior to the naive approach for all time intervals (correlations of cross-validated predictions with the outcome were 0.819, 0.789, 0.792 for time intervals 30-90, 90-180, 180-360, >360 minutes). The estimated VIMs of mortality also changed significantly at each time point. CONCLUSION: The SL technique for prediction of outcome from a complex, dynamic, multivariate data set is superior to standard models at each time interval. In addition, the SL VIM at each time point provides insight into the time-specific drivers of future outcome, patient trajectory, and targets for clinical intervention. Thus, this automated approach mimics clinical practice, changing form and content through time to optimize the accuracy of the prognosis based on the evolving trajectory of the patient.",
keywords = "Causal inference, Injury, PROMMTT, Statistical prediction, Trauma",
author = "Alan Hubbard and Munoz, {Ivan Diaz} and Anna Decker and Holcomb, {John B.} and Schreiber, {Martin A.} and Bulger, {Eileen M.} and Brasel, {Karen J.} and Fox, {Erin E.} and {Del Junco}, {Deborah J.} and Wade, {Charles E.} and Rahbar, {Mohammad H.} and Cotton, {Bryan A.} and Phelan, {Herb A.} and Myers, {John G} and Alarcon, {Louis H.} and Peter Muskat and Cohen, {Mitchell J.}",
year = "2013",
doi = "10.1097/TA.0b013e3182914553",
language = "English (US)",
volume = "75",
journal = "Journal of Trauma and Acute Care Surgery",
issn = "2163-0755",
publisher = "Lippincott Williams and Wilkins",
number = "1 SUPPL1",

}

TY - JOUR

T1 - Time-dependent prediction and evaluation of variable importance using superlearning in high-dimensional clinical data

AU - Hubbard, Alan

AU - Munoz, Ivan Diaz

AU - Decker, Anna

AU - Holcomb, John B.

AU - Schreiber, Martin A.

AU - Bulger, Eileen M.

AU - Brasel, Karen J.

AU - Fox, Erin E.

AU - Del Junco, Deborah J.

AU - Wade, Charles E.

AU - Rahbar, Mohammad H.

AU - Cotton, Bryan A.

AU - Phelan, Herb A.

AU - Myers, John G

AU - Alarcon, Louis H.

AU - Muskat, Peter

AU - Cohen, Mitchell J.

PY - 2013

Y1 - 2013

AB - BACKGROUND: Prediction of outcome after injury is fraught with uncertainty and statistically beset by misspecified models. Single-time-point regression gives prediction and inference at only one time, which is of dubious value for continuous prediction of ongoing bleeding. New statistical machine learning techniques such as SuperLearner (SL) can make superior predictions at iterative time points while evaluating the changing relative importance of each measured variable on an outcome. This can provide a continuously updated prediction of outcome and an evaluation of which clinical variables likely drive a particular outcome. METHODS: PROMMTT data were evaluated using both naive (standard stepwise logistic regression) and SL techniques to develop a time-dependent prediction of future mortality within discrete time intervals. We avoided both underfitting and overfitting by using cross-validation to select an optimal combination of predictors among candidate predictors/machine learning algorithms. SL was also used to produce interval-specific robust variable importance measures (VIMs), resulting in an ordered list, by time point, of the variables that have the strongest impact on future mortality. RESULTS: Nine hundred eighty patients had complete clinical and outcome data and were included in the analysis. The prediction of ongoing transfusion with SL was superior to the naive approach for all time intervals (correlations of cross-validated predictions with the outcome were 0.819, 0.789, 0.792 for time intervals 30-90, 90-180, 180-360, >360 minutes). The estimated VIMs of mortality also changed significantly at each time point. CONCLUSION: The SL technique for prediction of outcome from a complex, dynamic, multivariate data set is superior to standard models at each time interval. In addition, the SL VIM at each time point provides insight into the time-specific drivers of future outcome, patient trajectory, and targets for clinical intervention. Thus, this automated approach mimics clinical practice, changing form and content through time to optimize the accuracy of the prognosis based on the evolving trajectory of the patient.

KW - Causal inference

KW - Injury

KW - PROMMTT

KW - Statistical prediction

KW - Trauma

UR - http://www.scopus.com/inward/record.url?scp=84880413655&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84880413655&partnerID=8YFLogxK

U2 - 10.1097/TA.0b013e3182914553

DO - 10.1097/TA.0b013e3182914553

M3 - Article

C2 - 23778512

AN - SCOPUS:84880413655

VL - 75

JO - Journal of Trauma and Acute Care Surgery

JF - Journal of Trauma and Acute Care Surgery

SN - 2163-0755

IS - 1 SUPPL1

ER -