TY - JOUR
T1 - Comprehensive and sensitive proteogenomics data analysis strategy based on complementary multi-stage database search
AU - Madar, Inamul Hasan
AU - Lee, Wonyeop
AU - Wang, Xiaojing
AU - Ko, Seung Ik
AU - Kim, Hokeun
AU - Mun, Dong Gi
AU - Zhang, Bing
AU - Paek, Eunok
AU - Lee, Sang Won
N1 - Publisher Copyright: © 2017 Elsevier B.V.
PY - 2018/4
Y1 - 2018/4
AB - Proteogenomics provides opportunities for proteomic validation of gene structures, genomic alterations, and the functional relevance of novel findings obtained from genomic data analysis. However, for effective proteogenomic data integration, extensive proteome profiling that approaches the gene coverage of genomics data is critical. Here we developed a multi-stage database search method for comprehensive proteomics data analysis to complement whole transcriptome sequencing data. The method uses two complementary database search engines, MS-GF+ and MODa/MODi, in tandem. The MS/MS data were first subjected to an MS-GF+ database search (1st stage search), and the unidentified MS/MS data from the 1st stage search were subsequently analyzed with the combined use of MODa and MODi (2nd stage search), tools for blind and unrestrictive modification search, respectively. When combined with mPE-MMR, a tool for accurate and extensive assignment of precursor masses to MS/MS data, the multi-stage method exhibited a significant increase in identified peptides, modified peptides, mutated peptides, identified proteins and coding genes compared to a conventional single-stage method. With the increased coverage of the proteome profile, the genomics and proteomics data obtained from the same gastric tumor tissue were effectively integrated, as evidenced by proBAMsuite analysis results, which showed abundant examples of peptides uniquely mapped to genomic locations as well as increased coverage of exon-exon junctions and coding regions with the multi-stage search method.
KW - Multi-stage database search
KW - Mutations
KW - PTMs
KW - Proteogenomics
KW - Unidentified spectra
KW - mPE-MMR
UR - http://www.scopus.com/inward/record.url?scp=85031825383&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85031825383&partnerID=8YFLogxK
U2 - 10.1016/j.ijms.2017.08.015
DO - 10.1016/j.ijms.2017.08.015
M3 - Article
AN - SCOPUS:85031825383
SN - 1387-3806
VL - 427
SP - 11
EP - 19
JO - International Journal of Mass Spectrometry
JF - International Journal of Mass Spectrometry
ER -