TY - GEN
T1 - A random walk based approach for improving protein-protein interaction network and protein complex prediction
AU - Lei, Chengwei
AU - Ruan, Jianhua
PY - 2012/12/1
Y1 - 2012/12/1
N2 - Recent advances in high-throughput technology have dramatically increased the quantity of available protein-protein interaction (PPI) data and stimulated the development of many methods for predicting protein complexes, which are important in understanding the functional organization of protein-protein interaction networks in different biological processes. However, automated protein complex prediction from PPI data alone is significantly hindered by the high level of noise, sparseness, and highly skewed degree distribution of PPI networks. Here we present a novel network topology-based algorithm to remove spurious interactions and recover missing ones by computational predictions, and to increase the accuracy of protein complex prediction by reducing the impact of hub nodes. The key idea of our algorithm is that two proteins sharing some high-order topological similarities, which are measured by a novel random walk-based procedure, are likely to interact with each other and may belong to the same protein complex. Applying our algorithm to a yeast protein-protein interaction network, we found that the interactions in the reconstructed PPI network have more significant biological relevance than those in the original network, as assessed by multiple types of information, including gene ontology, gene expression, essentiality, conservation between species, and known protein complexes. Comparison with several existing methods shows that the network reconstructed by our method has the highest quality. Finally, using two independent graph clustering algorithms, we found that the reconstructed network yields significantly improved prediction accuracy of protein complexes.
AB - Recent advances in high-throughput technology have dramatically increased the quantity of available protein-protein interaction (PPI) data and stimulated the development of many methods for predicting protein complexes, which are important in understanding the functional organization of protein-protein interaction networks in different biological processes. However, automated protein complex prediction from PPI data alone is significantly hindered by the high level of noise, sparseness, and highly skewed degree distribution of PPI networks. Here we present a novel network topology-based algorithm to remove spurious interactions and recover missing ones by computational predictions, and to increase the accuracy of protein complex prediction by reducing the impact of hub nodes. The key idea of our algorithm is that two proteins sharing some high-order topological similarities, which are measured by a novel random walk-based procedure, are likely to interact with each other and may belong to the same protein complex. Applying our algorithm to a yeast protein-protein interaction network, we found that the interactions in the reconstructed PPI network have more significant biological relevance than those in the original network, as assessed by multiple types of information, including gene ontology, gene expression, essentiality, conservation between species, and known protein complexes. Comparison with several existing methods shows that the network reconstructed by our method has the highest quality. Finally, using two independent graph clustering algorithms, we found that the reconstructed network yields significantly improved prediction accuracy of protein complexes.
KW - Clustering
KW - Link prediction
KW - Protein complex
KW - Protein-protein interaction network
UR - http://www.scopus.com/inward/record.url?scp=84872540763&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84872540763&partnerID=8YFLogxK
U2 - 10.1109/BIBM.2012.6392693
DO - 10.1109/BIBM.2012.6392693
M3 - Conference contribution
AN - SCOPUS:84872540763
SN - 9781467325585
T3 - Proceedings - 2012 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2012
SP - 337
EP - 342
BT - Proceedings - 2012 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2012
T2 - 2012 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2012
Y2 - 4 October 2012 through 7 October 2012
ER -