A novel algorithm for calling mRNA m6A peaks by modeling biological variances in MeRIP-seq data

Xiaodong Cui, Jia Meng, Shaowu Zhang, Yidong Chen, Yufei Huang

Research output: Contribution to journal › Article

20 Citations (Scopus)

Abstract

Motivation: N6-methyl-adenosine (m6A) is the most prevalent mRNA methylation, and precise prediction of its location on mRNA is important for understanding its function. A recent sequencing technology, Methylated RNA Immunoprecipitation Sequencing (MeRIP-seq), has been developed for transcriptome-wide profiling of m6A. We previously developed a peak calling algorithm called exomePeak. However, exomePeak over-simplifies data characteristics and ignores both the variance of reads among replicates and the dependency of reads across a site region. To further improve performance, a new model is needed to address these important issues of MeRIP-seq data.

Results: We propose a novel, graphical model-based peak calling method, MeTPeak, for transcriptome-wide detection of m6A sites from MeRIP-seq data. MeTPeak explicitly models the read count of an m6A site, introducing a hierarchical layer of Beta variables to capture the variances and a Hidden Markov model to characterize the reads dependency across a site. In addition, we developed a constrained Newton's method and designed a log-barrier function to compute analytically intractable, positively constrained Beta parameters. We applied our algorithm to simulated and real biological datasets and demonstrated significant improvement in detection performance and robustness over exomePeak. Prediction results on publicly available MeRIP-seq datasets were also validated and shown to recapitulate the known patterns of m6A, further supporting the improved performance of MeTPeak.
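
The abstract mentions a constrained Newton's method with a log-barrier function for estimating positively constrained Beta parameters. The sketch below illustrates that general technique only and is not the MeTPeak implementation. It assumes a much simpler setting than the paper's hierarchical model: the Beta parameters are fitted directly to observed proportions in (0, 1) by maximum likelihood, and the barrier weight `mu` is held fixed. All function names (`digamma`, `trigamma`, `objective`, `fit_beta_barrier`) and default values are hypothetical.

```python
import math
import random


def digamma(x: float) -> float:
    """Digamma for x > 0: upward recurrence, then an asymptotic series."""
    acc = 0.0
    while x < 6.0:
        acc -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))
    return acc + math.log(x) - 0.5 / x - series


def trigamma(x: float) -> float:
    """Trigamma for x > 0: upward recurrence, then an asymptotic series."""
    acc = 0.0
    while x < 6.0:
        acc += 1.0 / (x * x)
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    series = inv * inv2 * (1 / 6 - inv2 * (1 / 30 - inv2 / 42))
    return acc + inv + 0.5 * inv2 + series


def objective(a, b, n, s1, s2, mu):
    """Beta log-likelihood plus the log-barrier mu * (ln a + ln b)."""
    loglik = n * (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b))
    loglik += (a - 1.0) * s1 + (b - 1.0) * s2
    return loglik + mu * (math.log(a) + math.log(b))


def fit_beta_barrier(xs, mu=1e-6, tol=1e-8, max_iter=200):
    """Maximize the barrier-augmented likelihood over a > 0, b > 0."""
    n = len(xs)
    s1 = sum(math.log(x) for x in xs)      # sum of ln x
    s2 = sum(math.log1p(-x) for x in xs)   # sum of ln(1 - x)
    a, b = 1.0, 1.0                        # assumed starting point
    for _ in range(max_iter):
        t = trigamma(a + b)
        # Gradient of the objective (likelihood part + barrier part).
        ga = n * (digamma(a + b) - digamma(a)) + s1 + mu / a
        gb = n * (digamma(a + b) - digamma(b)) + s2 + mu / b
        # Hessian entries; the matrix is negative definite.
        haa = n * (t - trigamma(a)) - mu / (a * a)
        hbb = n * (t - trigamma(b)) - mu / (b * b)
        hab = n * t
        det = haa * hbb - hab * hab
        # Newton direction d solves H d = -g (2x2 closed form).
        da = -(hbb * ga - hab * gb) / det
        db = -(haa * gb - hab * ga) / det
        # Backtrack until the step stays positive and does not decrease
        # the objective.
        f0 = objective(a, b, n, s1, s2, mu)
        step = 1.0
        while step > 1e-12:
            na, nb = a + step * da, b + step * db
            if na > 0 and nb > 0 and objective(na, nb, n, s1, s2, mu) >= f0:
                break
            step *= 0.5
        else:
            return a, b  # no acceptable step left: treat as converged
        a, b = na, nb
        if abs(step * da) + abs(step * db) < tol:
            break
    return a, b


if __name__ == "__main__":
    random.seed(0)
    sample = [random.betavariate(2.0, 5.0) for _ in range(5000)]
    print(fit_beta_barrier(sample))
```

The barrier term tends to negative infinity as either parameter approaches zero, so the maximizer stays strictly inside the feasible region, and the backtracking loop rejects any trial step that would leave it. A fuller interior-point scheme would typically shrink `mu` over successive solves; the fixed value here is a simplification.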

Original language: English (US)
Pages (from-to): i378-i385
Journal: Bioinformatics
Volume: 32
Issue number: 12
DOIs: https://doi.org/10.1093/bioinformatics/btw281
State: Published - Jun 15 2016

Fingerprint

RNA Sequence Analysis
RNA
Immunoprecipitation
Messenger RNA
Sequencing
Technology
Modeling
Barrier Function
Methylation
Adenosine
Prediction
Gene Expression Profiling
Graphical Models
Newton-Raphson method
Hidden Markov models
Profiling
Transcriptome
Newton Methods
Markov Model
Simplify

ASJC Scopus subject areas

  • Biochemistry
  • Molecular Biology
  • Computational Theory and Mathematics
  • Computer Science Applications
  • Computational Mathematics
  • Statistics and Probability

Cite this

A novel algorithm for calling mRNA m6A peaks by modeling biological variances in MeRIP-seq data. / Cui, Xiaodong; Meng, Jia; Zhang, Shaowu; Chen, Yidong; Huang, Yufei.

In: Bioinformatics, Vol. 32, No. 12, 15.06.2016, p. i378-i385.

Cui, Xiaodong ; Meng, Jia ; Zhang, Shaowu ; Chen, Yidong ; Huang, Yufei. / A novel algorithm for calling mRNA m6A peaks by modeling biological variances in MeRIP-seq data. In: Bioinformatics. 2016 ; Vol. 32, No. 12. pp. i378-i385.
@article{d0a21b244560412091cbbb80798d762c,
title = "A novel algorithm for calling mRNA m6A peaks by modeling biological variances in MeRIP-seq data",
author = "Xiaodong Cui and Jia Meng and Shaowu Zhang and Yidong Chen and Yufei Huang",
year = "2016",
month = "6",
day = "15",
doi = "10.1093/bioinformatics/btw281",
language = "English (US)",
volume = "32",
pages = "i378--i385",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "12",

}

TY - JOUR

T1 - A novel algorithm for calling mRNA m6A peaks by modeling biological variances in MeRIP-seq data

AU - Cui, Xiaodong

AU - Meng, Jia

AU - Zhang, Shaowu

AU - Chen, Yidong

AU - Huang, Yufei

PY - 2016/6/15

Y1 - 2016/6/15

UR - http://www.scopus.com/inward/record.url?scp=84976467345&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84976467345&partnerID=8YFLogxK

U2 - 10.1093/bioinformatics/btw281

DO - 10.1093/bioinformatics/btw281

M3 - Article

VL - 32

SP - i378-i385

JO - Bioinformatics

JF - Bioinformatics

SN - 1367-4803

IS - 12

ER -