TY - GEN
T1 - Identification of biomarkers in breast cancer metastasis by integrating protein-protein interaction network and gene expression data
AU - Jahid, Md Jamiul
AU - Ruan, Jianhua
PY - 2011/1/1
Y1 - 2011/1/1
N2 - Identification of biomarkers for breast cancer metastasis is a well-studied problem. Recently, several large-scale studies used gene expression data to identify markers related to the metastatic process. However, these gene expression-based markers have been shown to often have low reproducibility across different data sets, for a number of reasons. These include small sample sizes relative to the number of genes, gene expression variations between individuals that do not contribute to the metastasis process, and the inability of microarray technology to detect changes beyond the transcriptional level. Here, a graph-theoretical approach based on the topology of protein-protein interaction (PPI) networks is proposed for biomarker discovery. The idea is to identify a set of genes that provide connectivity to differentially expressed (DE) genes in a PPI network, based on the key observation that biomarkers may provide functional linkage to DE genes in PPI networks. Our approach is applied to two breast cancer microarray datasets for biomarker discovery. The resulting biomarkers include a significant number of known cancer susceptibility genes and are significantly enriched in biological processes and pathways involved in the carcinogenic process. Furthermore, markers selected by our method have higher stability across the two datasets than those in previous studies. Therefore, the approach described in this study is a new way to identify novel biomarkers for cancer metastasis and can potentially improve the understanding of carcinogenesis dynamics.
AB - Identification of biomarkers for breast cancer metastasis is a well-studied problem. Recently, several large-scale studies used gene expression data to identify markers related to the metastatic process. However, these gene expression-based markers have been shown to often have low reproducibility across different data sets, for a number of reasons. These include small sample sizes relative to the number of genes, gene expression variations between individuals that do not contribute to the metastasis process, and the inability of microarray technology to detect changes beyond the transcriptional level. Here, a graph-theoretical approach based on the topology of protein-protein interaction (PPI) networks is proposed for biomarker discovery. The idea is to identify a set of genes that provide connectivity to differentially expressed (DE) genes in a PPI network, based on the key observation that biomarkers may provide functional linkage to DE genes in PPI networks. Our approach is applied to two breast cancer microarray datasets for biomarker discovery. The resulting biomarkers include a significant number of known cancer susceptibility genes and are significantly enriched in biological processes and pathways involved in the carcinogenic process. Furthermore, markers selected by our method have higher stability across the two datasets than those in previous studies. Therefore, the approach described in this study is a new way to identify novel biomarkers for cancer metastasis and can potentially improve the understanding of carcinogenesis dynamics.
KW - Biomarker
KW - Differentially expressed genes
KW - Gene expression data
KW - Metastasis
KW - Protein-protein interaction network
UR - http://www.scopus.com/inward/record.url?scp=84863661862&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84863661862&partnerID=8YFLogxK
U2 - 10.1109/gensips.2011.6169443
DO - 10.1109/gensips.2011.6169443
M3 - Conference contribution
AN - SCOPUS:84863661862
SN - 9781467304900
T3 - Proceedings - IEEE International Workshop on Genomic Signal Processing and Statistics
SP - 60
EP - 63
BT - Proceedings 2011 IEEE International Workshop on Genomic Signal Processing and Statistics, GENSIPS'11
PB - IEEE Computer Society
T2 - 2011 IEEE International Workshop on Genomic Signal Processing and Statistics, GENSIPS'11
Y2 - 4 December 2011 through 6 December 2011
ER -