TY - JOUR
T1 - A robust unified approach to analyzing methylation and gene expression data
AU - Khalili, Abbas
AU - Huang, Tim
AU - Lin, Shili
N1 - Funding Information:
This work was supported in part by the National Cancer Institute grant U54CA113001 and the National Science Foundation grant DMS-0112050. We thank Dr. Dustin Potter for his help with the implementation of the software.
PY - 2009/3/15
Y1 - 2009/3/15
AB - Microarray technology has made it possible to investigate expression levels, and more recently methylation signatures, of thousands of genes simultaneously in a biological sample. Because data from different biological systems or technological platforms are being generated at a rapidly increasing rate, there is a growing need for statistical methods that are applicable to multiple data types and platforms. To address this need, a flexible finite mixture model that is applicable to methylation data, gene expression data, and potentially data from other biological systems is proposed. The approach has two main features: it allows a variable number of mixture components to capture non-biological variation and small biases, and it uses a robust procedure for parameter estimation and probe classification. The method was applied to the analysis of methylation signatures of three breast cancer cell lines. It was also tested on three sets of expression microarray data to study its power and type I error rates. Comparison with a number of existing methods in the literature yielded encouraging results: in this limited study, the method achieved lower type I error rates and comparable or better power. The method also leads to more biologically interpretable results for the three breast cancer cell lines.
UR - http://www.scopus.com/inward/record.url?scp=60349095410&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=60349095410&partnerID=8YFLogxK
U2 - 10.1016/j.csda.2008.07.010
DO - 10.1016/j.csda.2008.07.010
M3 - Article
AN - SCOPUS:60349095410
SN - 0167-9473
VL - 53
SP - 1701
EP - 1710
JO - Computational Statistics and Data Analysis
JF - Computational Statistics and Data Analysis
IS - 5
ER -