Two-Step Imputation and AdaBoost-Based Classification for Early Prediction of Sepsis on Imbalanced Clinical Data

Atefeh Baniasadi, Sepideh Rezaeirad, Habil Zare, Mohammad M. Ghassemi

Research output: Contribution to journal, Article, peer-review

9 Scopus citations


OBJECTIVES: Sepsis is a life-threatening response to infection that causes tissue damage, organ failure, and death. Effective early prediction of sepsis would improve patients' diagnosis and reduce the cost associated with late-stage sepsis infection by enabling appropriate early intervention. However, effective early prediction is challenging because sepsis biomarkers are neither obvious nor definitive, and sepsis datasets are heavily imbalanced against positive sepsis diagnoses and contain many missing values. Early prediction of sepsis in ICUs using clinical data is the objective of the PhysioNet/Computing in Cardiology Challenge 2019.

DESIGN: In this article, we propose a machine learning algorithm to aid in the early detection of sepsis.

SETTING: We applied linear interpolation and implemented a sample-weighted AdaBoost model to predict sepsis 6 hours before clinical diagnosis.

PATIENTS: The medical data cover more than 40,000 patients from three geographically distinct U.S. hospital systems and consist of hourly vital signs, lab values, and static patient descriptions.

INTERVENTIONS: The challenge metric, however, did not directly reward models for their generalizability across institutions.

MEASUREMENTS AND MAIN RESULTS: The approach is evaluated using the Utility Score, a new metric that serves as the Challenge's official scoring criterion. Our approach was among the top 10% of entries to the Challenge on a hidden test set.

CONCLUSIONS: Herein, we demonstrate that our proposed approach was the most effective of the Challenge entrants when such generalizability is explicitly accounted for in model evaluation.
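The abstract names two ingredients, linear interpolation of missing values and sample weighting against class imbalance, without giving details. The following is a minimal illustrative sketch of both ideas using only the Python standard library; it is not the authors' implementation. The function names, the edge-filling rule (copying the nearest observed value), and the inverse-class-frequency weighting formula are all assumptions made for illustration.

```python
from collections import Counter
from typing import List, Optional, Sequence


def interpolate_linear(series: Sequence[Optional[float]]) -> List[Optional[float]]:
    """Fill None gaps by linear interpolation between observed neighbours.

    Assumption (not from the abstract): leading and trailing gaps copy the
    nearest observed value. A series with no observations is returned as-is.
    """
    known = [i for i, v in enumerate(series) if v is not None]
    filled = list(series)
    if not known:
        return filled
    # Edge gaps: copy the first / last observed value.
    for i in range(known[0]):
        filled[i] = series[known[0]]
    for i in range(known[-1] + 1, len(series)):
        filled[i] = series[known[-1]]
    # Interior gaps: straight line between the two surrounding observations.
    for left, right in zip(known, known[1:]):
        step = (series[right] - series[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = series[left] + step * (i - left)
    return filled


def balanced_sample_weights(labels: Sequence[int]) -> List[float]:
    """Hypothetical weighting: n_samples / (n_classes * class_count) per sample,
    so that rare (sepsis-positive) samples receive larger weights."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return [n_samples / (n_classes * counts[y]) for y in labels]


# Hourly heart-rate readings with missing entries (made-up values).
heart_rate = [None, 80.0, None, None, 92.0, None]
print(interpolate_linear(heart_rate))

# Three negative hours and one positive hour (made-up labels).
print(balanced_sample_weights([0, 0, 0, 1]))
```

Weights of this kind could, for example, be passed as the `sample_weight` argument of scikit-learn's `AdaBoostClassifier.fit`; whether the authors used that library or this weighting scheme is not stated in the abstract.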

Original language: English (US)
Pages (from-to): E91-E97
Journal: Critical Care Medicine
State: Published - Jan 1 2021


  • AdaBoost
  • PhysioNet Challenge 2019
  • clinical data
  • early prediction of sepsis
  • missing value
  • unbalanced data

ASJC Scopus subject areas

  • Critical Care and Intensive Care Medicine

