Protein identification using customized protein sequence databases derived from RNA-seq data

Xiaojing Wang, Robbert J.C. Slebos, Dong Wang, Patrick J. Halvey, David L. Tabb, Daniel C. Liebler, Bing Zhang

Research output: Article (peer-reviewed)

148 Citations (Scopus)

Abstract

The standard shotgun proteomics data analysis strategy relies on searching MS/MS spectra against a context-independent protein sequence database derived from the complete genome sequence of an organism. Because transcriptome sequence analysis (RNA-Seq) promises an unbiased and comprehensive picture of the transcriptome, we reason that a sample-specific protein database derived from RNA-Seq data can better approximate the real protein pool in the sample and thus improve protein identification. In this study, we have developed a two-step strategy for building sample-specific protein databases from RNA-Seq data. First, the database size is reduced by eliminating unexpressed or lowly expressed genes according to transcript quantification. Second, high-quality nonsynonymous coding single nucleotide variations (SNVs) are identified based on RNA-Seq data, and the corresponding protein variants are added to the database. Using RNA-Seq and shotgun proteomics data from two colorectal cancer cell lines, SW480 and RKO, we demonstrated that customized protein sequence databases could significantly increase the sensitivity of peptide identification, reduce ambiguity in protein assembly, and enable the detection of known and novel peptide variants. Thus, sample-specific databases from RNA-Seq data can enable more sensitive and comprehensive protein discovery in shotgun proteomics studies.
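The two-step strategy described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' pipeline; it assumes simplified inputs: a reference database as a dict of protein ID to sequence, expression values already mapped to protein IDs (real pipelines quantify transcripts and map them to proteins), and SNVs already called and translated into single amino-acid substitutions. The function names, the `Variant` type, the `min_expr` threshold, and the variant ID format are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    """A nonsynonymous SNV expressed as a single amino-acid substitution (hypothetical type)."""
    protein_id: str
    position: int  # 1-based residue position in the reference protein
    ref_aa: str
    alt_aa: str


def filter_by_expression(proteins, expression, min_expr=1.0):
    """Step 1: drop proteins whose expression is below a (hypothetical) threshold.

    Proteins with no expression value are treated as unexpressed (0.0).
    """
    return {pid: seq for pid, seq in proteins.items()
            if expression.get(pid, 0.0) >= min_expr}


def add_variant_proteins(proteins, variants):
    """Step 2: append one variant sequence per substitution to a copy of the database."""
    db = dict(proteins)
    for v in variants:
        ref_seq = proteins.get(v.protein_id)
        if ref_seq is None:
            continue  # protein was filtered out in step 1 or is unknown
        idx = v.position - 1
        if idx < 0 or idx >= len(ref_seq) or ref_seq[idx] != v.ref_aa:
            continue  # variant does not match the reference sequence; skip it
        alt_seq = ref_seq[:idx] + v.alt_aa + ref_seq[idx + 1:]
        db[f"{v.protein_id}_{v.ref_aa}{v.position}{v.alt_aa}"] = alt_seq
    return db


def to_fasta(db):
    """Serialize the customized database as FASTA text for a search engine."""
    return "".join(f">{pid}\n{seq}\n" for pid, seq in db.items())
```

For example, with `proteins = {"P1": "MKTAYIAK", "P2": "MSSGG"}` and `expression = {"P1": 5.0, "P2": 0.2}`, step 1 keeps only `P1`; a substitution `Variant("P1", 3, "T", "A")` then adds the entry `P1_T3A` with sequence `MKAAYIAK`, while a variant on the filtered-out `P2` is ignored. Keeping the reference sequence alongside each variant lets a search engine match both the wild-type and the variant peptide.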

Original language: English (US)
Pages (from-to): 1009-1017
Number of pages: 9
Journal: Journal of Proteome Research
Volume: 11
Issue: 2
Status: Published - Feb 3 2012
Externally published: Yes

ASJC Scopus subject areas

  • General Chemistry
  • Biochemistry
