TY - JOUR
T1 - PGA
T2 - An R/Bioconductor package for identification of novel peptides using a customized database derived from RNA-Seq
AU - Wen, Bo
AU - Xu, Shaohang
AU - Zhou, Ruo
AU - Zhang, Bing
AU - Wang, Xiaojing
AU - Liu, Xin
AU - Xu, Xun
AU - Liu, Siqi
N1 - Publisher Copyright: © 2016 The Author(s).
PY - 2016/6/17
Y1 - 2016/6/17
N2 - Background: Peptide identification based on mass spectrometry (MS) is generally achieved by comparing experimental mass spectra with theoretically digested peptides derived from a reference protein database. This strategy cannot identify peptide and protein sequences that are absent from the reference database. A customized protein database built from RNA-Seq data has thus been proposed to assist with and improve the identification of novel peptides. Correspondingly, a comprehensive pipeline that provides an end-to-end solution for novel peptide detection with the customized protein database is necessary. Results: A pipeline implemented as an R package, named PGA, was developed that enables automated processing of tandem mass spectrometry (MS/MS) data acquired from different MS platforms and construction of customized protein databases based on RNA-Seq data with or without a reference genome guide. PGA can identify novel peptides and generate an HTML-based report with a visualized interface. On a published dataset, PGA identified 636 novel peptides, including 510 single amino acid polymorphism (SAP) peptides, 2 INDEL peptides, 49 splice junction peptides, and 75 novel transcript-derived peptides. The software is freely available from http://bioconductor.org/packages/PGA/, and example reports are available at http://wenbostar.github.io/PGA/. Conclusions: The PGA pipeline, designed to be platform-independent and easy to use, was successfully developed and shown to be capable of identifying novel peptides by searching the customized protein database derived from RNA-Seq data.
KW - MS/MS
KW - Peptide identification
KW - Proteogenomics
KW - Proteomics
KW - RNA-Seq
UR - http://www.scopus.com/inward/record.url?scp=84975122069&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84975122069&partnerID=8YFLogxK
U2 - 10.1186/s12859-016-1133-3
DO - 10.1186/s12859-016-1133-3
M3 - Article
C2 - 27316337
AN - SCOPUS:84975122069
SN - 1471-2105
VL - 17
JO - BMC bioinformatics
JF - BMC bioinformatics
IS - 1
M1 - 244
ER -