TY - JOUR
T1 - Reducing confounding and suppression effects in TCGA data
T2 - an integrated analysis of chemotherapy response in ovarian cancer
AU - Hsu, Fang Han
AU - Serpedin, Erchin
AU - Hsiao, Tzu Hung
AU - Bishop, Alexander J.R.
AU - Dougherty, Edward R.
AU - Chen, Yidong
N1 - Funding Information:
Based on “Identifying genes associated with chemotherapy response in ovarian carcinomas based on DNA copy number and expression profiles”, by Fang-Han Hsu, Erchin Serpedin, Tzu-Hung Hsiao, Alexander JR Bishop, Edward R Dougherty and Yidong Chen, which appeared in Genomic Signal Processing and Statistics (GENSIPS), 2011 IEEE International Workshop on. © 2011 IEEE [25]. The authors would like to thank the members of the Genomic Signal Processing Laboratory, Texas A&M University, College Station, and the members of the Bioinformatics and Biostatistics Core Laboratory, National Taiwan University, Taipei, Taiwan, for helpful discussions. This work was supported by the National Science Foundation under Grant 0915444. AJR Bishop is supported by the NIEHS (K22-ES12264) and a Voelcker Fund Young Investigator Award from the Max and Minnie Tomerlin Voelcker Fund; Y. Chen is supported by an NIH/NCI cancer center grant (P30 CA054174-17) and an NIH/NCRR CTSA grant (1UL1RR025767), and partially supported by a Voelcker Fund Young Investigator Award from the Max and Minnie Tomerlin Voelcker Fund; F-H Hsu was partially supported by the Greehey Children's Cancer Research Institute (GCCRI) Summer Internship Program. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. This article has been published as part of BMC Genomics Volume 13 Supplement 6, 2012: Selected articles from the IEEE International Workshop on Genomic Signal Processing and Statistics (GENSIPS) 2011. The full contents of the supplement are available online at http://www.biomedcentral.com/bmcgenomics/supplements/13/S6.
PY - 2012
Y1 - 2012
N2 - Despite an initial response to adjuvant chemotherapy, ovarian cancer patients treated with the combination of paclitaxel and carboplatin frequently suffer recurrence after a few cycles of treatment, and the mechanisms underlying this chemoresistance remain unclear. Recently, The Cancer Genome Atlas (TCGA) research network concluded an ovarian cancer study and released the dataset to the public. The TCGA dataset offers a large sample size, comprehensive molecular profiles, and clinical outcome information; however, because the molecular subtypes of ovarian cancer are unknown and the adjuvant treatments TCGA patients received are highly diverse, studying chemotherapeutic response using the TCGA data is difficult. Additionally, factors such as sample batch, patient age, and tumor stage further confound or suppress the identification of relevant genes, and thus of the associated biological functions and disease mechanisms. To address these issues, we propose an analysis procedure designed to reduce the suppression effect by focusing on a specific chemotherapeutic treatment, and to remove confounding effects such as batch effect, patient age, and tumor stage. The proposed procedure starts with a batch effect adjustment, followed by a rigorous sample selection process. Then, the gene expression, copy number, and methylation profiles from the TCGA ovarian cancer dataset are analyzed using a semi-supervised clustering method combined with a novel scoring function. As a result, two molecular classifications enriched with unfavorable scores are identified, one with poor copy number profiles and one with poor methylation profiles. Compared with the samples enriched with favorable scores, these two classifications exhibit poor progression-free survival (PFS) and might be associated with poor chemotherapy response specifically to the combination of paclitaxel and carboplatin. Significant genes and biological processes are subsequently detected using classical statistical approaches and enrichment analysis. The proposed procedure for reducing confounding and suppression effects and the semi-supervised clustering method are essential steps in identifying genes associated with the chemotherapeutic response.
UR - http://www.scopus.com/inward/record.url?scp=84876082047&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84876082047&partnerID=8YFLogxK
U2 - 10.1186/1471-2164-13-s6-s13
DO - 10.1186/1471-2164-13-s6-s13
M3 - Article
C2 - 23134756
AN - SCOPUS:84876082047
SN - 1471-2164
VL - 13 Suppl 6
JO - BMC Genomics
JF - BMC Genomics
ER -