TY - JOUR
T1 - Two-Step Imputation and AdaBoost-Based Classification for Early Prediction of Sepsis on Imbalanced Clinical Data
AU - Baniasadi, Atefeh
AU - Rezaeirad, Sepideh
AU - Zare, Habil
AU - Ghassemi, Mohammad M.
N1 - Funding Information:
Atefeh Baniasadi and Sepideh Rezaeirad contributed equally as first authors, and Habil Zare and Mohammad M. Ghassemi contributed equally as last authors. Dr. Ghassemi received funding from Ghamut Corporation (founder), Michigan State University, Traive Finance, and Hivemind Networks. The remaining authors have disclosed that they do not have any potential conflicts of interest. For information regarding this article, E-mail: a.baniasadi@semnan.ac.ir
Publisher Copyright:
Copyright © 2020 by the Society of Critical Care Medicine and Wolters Kluwer Health, Inc. All Rights Reserved.
PY - 2021/1/1
Y1 - 2021/1/1
N2 - OBJECTIVES: Sepsis is a life-threatening response to infection that causes tissue damage, organ failure, and death. Effective early prediction of sepsis would improve diagnosis and, by enabling appropriate early intervention, reduce the costs associated with late-stage sepsis. However, effective early prediction is challenging because sepsis biomarkers are neither obvious nor definitive, and sepsis datasets are heavily imbalanced against positive sepsis diagnoses and contain many missing values. Early prediction of sepsis in ICUs using clinical data is the objective of the PhysioNet/Computing in Cardiology Challenge 2019. DESIGN: In this article, we propose a machine learning algorithm to aid in the early detection of sepsis. SETTING: We applied linear interpolation and implemented a sample-weighted AdaBoost model to predict sepsis 6 hours before clinical diagnosis. PATIENTS: The data cover more than 40,000 patients from three geographically distinct U.S. hospital systems and consist of hourly vital signs, laboratory values, and static patient descriptions. INTERVENTIONS: The Challenge metric, however, did not directly reward models for their generalizability across institutions. MEASUREMENTS AND MAIN RESULTS: The approach was evaluated using the Utility Score, a new metric that serves as the official scoring criterion of the Challenge. Our approach was among the top 10% of entries to the Challenge on a hidden test set. CONCLUSIONS: Herein, we demonstrate that our proposed approach was the most effective of the Challenge entrants when such generalizability is explicitly accounted for in model evaluation.
KW - AdaBoost
KW - PhysioNet Challenge 2019
KW - clinical data
KW - early prediction of sepsis
KW - missing value
KW - unbalanced data
UR - http://www.scopus.com/inward/record.url?scp=85098742449&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85098742449&partnerID=8YFLogxK
U2 - 10.1097/CCM.0000000000004705
DO - 10.1097/CCM.0000000000004705
M3 - Article
C2 - 33156121
AN - SCOPUS:85098742449
SN - 0090-3493
SP - E91-E97
JO - Critical Care Medicine
JF - Critical Care Medicine
ER -