Robust boosting with truncated loss functions

Research output: Contribution to journal › Article › peer-review

3 Scopus citations

Abstract

Boosting is a powerful machine learning tool with attractive theoretical properties. In recent years, boosting algorithms have been extended to many statistical estimation problems. For data contaminated with outliers, however, the development of boosting algorithms has been very limited. In this paper, innovative robust boosting algorithms utilizing the majorization-minimization (MM) principle are developed for binary and multi-category classification problems. Based on truncated loss functions, the robust boosting algorithms share a unified framework for linear and nonlinear effects models. The proposed methods can reduce the heavy influence of a small number of outliers, which could otherwise distort the results. In addition, adaptive boosting for the truncated loss functions is developed to construct sparser predictive models. We present convergence guarantees for smooth surrogate loss functions with both iteration-varying and constant step-sizes. We conducted empirical studies using data from simulations, a pediatric database developed for the US Healthcare Cost and Utilization Project, and breast cancer gene expression data. Compared with non-robust boosting, robust boosting improves classification accuracy and variable selection.
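To illustrate why truncation limits the influence of outliers, here is a minimal sketch and not the paper's implementation. It assumes a logistic base loss capped at a constant; the abstract does not give the exact loss definitions, so the names `logistic_loss`, `truncated_loss`, and the parameter `cap` are hypothetical.

```python
import math


def logistic_loss(margin):
    """Logistic loss log(1 + exp(-margin)), computed in a numerically stable way."""
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def truncated_loss(margin, cap):
    """Base loss capped at `cap` (an assumed truncation form, for illustration only)."""
    return min(logistic_loss(margin), cap)


# Margins y * f(x): large negative values mimic badly misclassified outliers.
margins = [2.0, 0.0, -1.0, -10.0]
cap = 2.0

for m in margins:
    # The untruncated loss grows without bound as the margin decreases,
    # while the truncated loss never exceeds `cap`.
    print(m, logistic_loss(m), truncated_loss(m, cap))
```

For a convex base loss L and a constant s, min(L, s) = L - max(L - s, 0) is a difference of two convex functions, which is consistent with the "Difference of convex" keyword and is the kind of structure that MM-type algorithms can exploit.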

Original language: English (US)
Pages (from-to): 599-650
Number of pages: 52
Journal: Electronic Journal of Statistics
Volume: 12
Issue number: 1
State: Published - 2018
Externally published: Yes

Keywords

  • Boosting
  • Difference of convex
  • Machine learning
  • MM algorithm
  • Robust method

ASJC Scopus subject areas

  • Statistics and Probability
  • Statistics, Probability and Uncertainty
