Multi-class HingeBoost: method and application to the classification of cancer types using gene expression data

Research output: Contribution to journal › Article › Peer-review

13 Scopus citations

Abstract

Background: Multi-class molecular cancer classification has important potential clinical implications. Such applications require statistical methods that accurately classify cancer types using a small subset of the thousands of genes in the data.

Objectives: This paper presents a new functional gradient descent boosting algorithm. It extends the HingeBoost algorithm directly from the binary case to the multi-class case, without reducing the original problem to multiple binary problems.

Methods: The proposed HingeBoost minimizes a multi-class hinge loss with a boosting technique. It has good theoretical properties: it implements the Bayes decision rule and provides a unifying framework for either equal or unequal misclassification costs. Furthermore, we propose Twin HingeBoost, which has better feature selection behavior than HingeBoost because it reduces the number of ineffective covariates. Simulated data, benchmark data and two cancer gene expression data sets are used to evaluate the performance of the proposed approach.

Results: In simulations and on the benchmark data, the multi-class HingeBoost generated accurate predictions compared with the alternative methods, especially with high-dimensional covariates. In two cancer classification problems using gene expression data, the multi-class HingeBoost also produced more accurate or comparable predictions.

Conclusions: This work has shown that HingeBoost provides a powerful tool for multi-class classification problems. In many applications, classification accuracy and feature selection behavior can be further improved by using Twin HingeBoost.
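The abstract describes multi-class HingeBoost only at a high level. The following is a minimal sketch of functional gradient descent boosting on a multi-class hinge loss, and it rests on several assumptions:

- The loss is assumed to be the multicategory SVM form L(y, f) = sum over j != y of C(y, j) * max(0, f_j(x) + 1/(k-1)), with the constraint sum_j f_j(x) = 0. Here C is a misclassification-cost matrix.
- The base learner is assumed to be componentwise linear least squares, which selects one covariate per class per step.
- The paper's exact loss, its base learners (the keywords mention regression trees and smoothing splines), step size and stopping rule may all differ.
- All function names are hypothetical, and Twin HingeBoost is not shown.

```python
import numpy as np


def hinge_loss(F, y, cost):
    """Average multi-class hinge loss (assumed form).

    F: (n, k) decision values; y: (n,) labels in 0..k-1;
    cost: (k, k) matrix, cost[y, j] = cost of predicting j when the truth is y.
    """
    n, k = F.shape
    w = cost[y]                              # fancy indexing returns a copy
    w[np.arange(n), y] = 0.0                 # the true class adds no loss
    return np.mean(np.sum(w * np.maximum(0.0, F + 1.0 / (k - 1)), axis=1))


def negative_gradient(F, y, cost):
    """Negative (sub)gradient of the loss, projected onto sum-to-zero rows."""
    n, k = F.shape
    w = cost[y]
    w[np.arange(n), y] = 0.0
    active = (F + 1.0 / (k - 1)) > 0         # hinge terms with nonzero slope
    U = -w * active
    return U - U.mean(axis=1, keepdims=True)


def fit_componentwise(X, u):
    """Fit u by least squares on the single best covariate (with intercept)."""
    x_mean = X.mean(axis=0)
    Xc = X - x_mean
    uc = u - u.mean()
    ss = np.sum(Xc ** 2, axis=0)
    ss = np.where(ss > 0, ss, np.inf)        # constant columns get zero gain
    cross = Xc.T @ uc
    j = int(np.argmax(cross ** 2 / ss))      # largest drop in residual SS
    return j, cross[j] / ss[j], x_mean[j], u.mean()


def hingeboost(X, y, k, cost=None, n_iter=200, nu=0.1):
    """Hypothetical multi-class hinge boosting; returns a list of base fits."""
    n = X.shape[0]
    if cost is None:
        cost = np.ones((k, k)) - np.eye(k)   # equal misclassification costs
    F = np.zeros((n, k))
    model = []                               # (class, feature, slope, x_mean, intercept)
    for _ in range(n_iter):
        U = negative_gradient(F, y, cost)
        for c in range(k):
            j, slope, xm, b0 = fit_componentwise(X, U[:, c])
            F[:, c] += nu * (b0 + slope * (X[:, j] - xm))
            model.append((c, j, nu * slope, xm, nu * b0))
        F -= F.mean(axis=1, keepdims=True)   # re-impose sum_j f_j = 0
    return model


def decision_function(model, X, k):
    """Sum the shrunken base fits, then center each row (centering is linear)."""
    F = np.zeros((X.shape[0], k))
    for c, j, slope, xm, b0 in model:
        F[:, c] += b0 + slope * (X[:, j] - xm)
    return F - F.mean(axis=1, keepdims=True)


def predict(model, X, k):
    """Classify by the largest decision value."""
    return np.argmax(decision_function(model, X, k), axis=1)
```

Under this sketch, unequal misclassification costs only change the weights in the cost matrix. Variable selection comes from the componentwise learner, which touches one covariate per class per iteration. With equal costs and F = 0, every observation has loss exactly 1, which gives a convenient baseline to check that boosting reduces the loss.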

Original language: English (US)
Pages (from-to): 162-167
Number of pages: 6
Journal: Methods of Information in Medicine
Volume: 51
Issue number: 2
State: Published - 2012
Externally published: Yes

Keywords

  • Boosting
  • Classification
  • Regression trees
  • Smoothing splines
  • Variable selection

ASJC Scopus subject areas

  • Health Informatics
  • Advanced and Specialized Nursing
  • Health Information Management
