Integrative genome-wide chromatin signature analysis using finite mixture models.

Cenny Taslim, Shili Lin, Kun Huang, Hui-ming Huang

Research output: Contribution to journalArticle

Abstract

Regulation of gene expression has been shown to involve not only the binding of transcription factor at target gene promoters but also the characterization of histone around which DNA is wrapped around. Some histone modification, for example di-methylated histone H3 at lysine 4 (H3K4me2), has been shown to bind to promoters and activate target genes. However, no clear pattern has been shown to predict human promoters. This paper proposed a novel quantitative approach to characterize patterns of promoter regions and predict novel and alternative promoters. We utilized high-throughput data generated using chromatin immunoprecipitation methods followed by massively parallel sequencing (ChIP-seq) technology on RNA Polymerase II (Pol-II) and H3K4me2. Common patterns of promoter regions are modeled using a mixture model involving double-exponential and uniform distributions. The fitted model obtained were then used to search for regions displaying similar patterns over the entire genome to find novel and alternative promoters. Regions with high correlations with the common patterns are identified as putative novel promoters. We used this proposed algorithm, RNA-seq data and several transcripts databases to find alternative promoters in MCF7 (normal breast cancer) cell line. We found 7,235 high-confidence regions that display the identified promoter patterns. Of these, 4,167 regions (58%) can be mapped to RefSeq regions. 2,444 regions are in a gene body or overlap with transcripts (non-coding RNAs, ESTs, and transcripts that are predicted by RNA-seq data). Some of these maybe potential alternative promoters. We also found 193 regions that map to enhancer regions (represented by androgen and estrogen receptor binding sites) and other regulatory regions such as CTCF (CCCTC binding factor) and CpG island. Around 5% (431 regions) of these correlated regions do not overlap with any transcripts or regulatory regions suggesting that these might be potential new promoters or markers for other annotation which are currently undiscovered.

Original languageEnglish (US)
Article numberS3
JournalBMC Genomics
Volume13 Suppl 6
StatePublished - 2012
Externally publishedYes

Fingerprint

Chromatin
Nucleic Acid Regulatory Sequences
Genome
Genetic Promoter Regions
Histones
Histone Code
RNA
Genes
High-Throughput Nucleotide Sequencing
Untranslated RNA
CpG Islands
RNA Polymerase II
Chromatin Immunoprecipitation
Expressed Sequence Tags
Androgen Receptors
Gene Expression Regulation
Estrogen Receptors
Lysine
Transcription Factors
Binding Sites

ASJC Scopus subject areas

  • Biotechnology
  • Genetics

Cite this

Integrative genome-wide chromatin signature analysis using finite mixture models. / Taslim, Cenny; Lin, Shili; Huang, Kun; Huang, Hui-ming.

In: BMC Genomics, Vol. 13 Suppl 6, S3, 2012.

Research output: Contribution to journalArticle

@article{f559df77544948fc8e766ce2e079f857,
title = "Integrative genome-wide chromatin signature analysis using finite mixture models.",
abstract = "Regulation of gene expression has been shown to involve not only the binding of transcription factor at target gene promoters but also the characterization of histone around which DNA is wrapped around. Some histone modification, for example di-methylated histone H3 at lysine 4 (H3K4me2), has been shown to bind to promoters and activate target genes. However, no clear pattern has been shown to predict human promoters. This paper proposed a novel quantitative approach to characterize patterns of promoter regions and predict novel and alternative promoters. We utilized high-throughput data generated using chromatin immunoprecipitation methods followed by massively parallel sequencing (ChIP-seq) technology on RNA Polymerase II (Pol-II) and H3K4me2. Common patterns of promoter regions are modeled using a mixture model involving double-exponential and uniform distributions. The fitted model obtained were then used to search for regions displaying similar patterns over the entire genome to find novel and alternative promoters. Regions with high correlations with the common patterns are identified as putative novel promoters. We used this proposed algorithm, RNA-seq data and several transcripts databases to find alternative promoters in MCF7 (normal breast cancer) cell line. We found 7,235 high-confidence regions that display the identified promoter patterns. Of these, 4,167 regions (58{\%}) can be mapped to RefSeq regions. 2,444 regions are in a gene body or overlap with transcripts (non-coding RNAs, ESTs, and transcripts that are predicted by RNA-seq data). Some of these maybe potential alternative promoters. We also found 193 regions that map to enhancer regions (represented by androgen and estrogen receptor binding sites) and other regulatory regions such as CTCF (CCCTC binding factor) and CpG island. Around 5{\%} (431 regions) of these correlated regions do not overlap with any transcripts or regulatory regions suggesting that these might be potential new promoters or markers for other annotation which are currently undiscovered.",
author = "Cenny Taslim and Shili Lin and Kun Huang and Hui-ming Huang",
year = "2012",
language = "English (US)",
volume = "13 Suppl 6",
journal = "BMC Genomics",
issn = "1471-2164",
publisher = "BioMed Central",

}

TY - JOUR

T1 - Integrative genome-wide chromatin signature analysis using finite mixture models.

AU - Taslim, Cenny

AU - Lin, Shili

AU - Huang, Kun

AU - Huang, Hui-ming

PY - 2012

Y1 - 2012

N2 - Regulation of gene expression has been shown to involve not only the binding of transcription factor at target gene promoters but also the characterization of histone around which DNA is wrapped around. Some histone modification, for example di-methylated histone H3 at lysine 4 (H3K4me2), has been shown to bind to promoters and activate target genes. However, no clear pattern has been shown to predict human promoters. This paper proposed a novel quantitative approach to characterize patterns of promoter regions and predict novel and alternative promoters. We utilized high-throughput data generated using chromatin immunoprecipitation methods followed by massively parallel sequencing (ChIP-seq) technology on RNA Polymerase II (Pol-II) and H3K4me2. Common patterns of promoter regions are modeled using a mixture model involving double-exponential and uniform distributions. The fitted model obtained were then used to search for regions displaying similar patterns over the entire genome to find novel and alternative promoters. Regions with high correlations with the common patterns are identified as putative novel promoters. We used this proposed algorithm, RNA-seq data and several transcripts databases to find alternative promoters in MCF7 (normal breast cancer) cell line. We found 7,235 high-confidence regions that display the identified promoter patterns. Of these, 4,167 regions (58%) can be mapped to RefSeq regions. 2,444 regions are in a gene body or overlap with transcripts (non-coding RNAs, ESTs, and transcripts that are predicted by RNA-seq data). Some of these maybe potential alternative promoters. We also found 193 regions that map to enhancer regions (represented by androgen and estrogen receptor binding sites) and other regulatory regions such as CTCF (CCCTC binding factor) and CpG island. Around 5% (431 regions) of these correlated regions do not overlap with any transcripts or regulatory regions suggesting that these might be potential new promoters or markers for other annotation which are currently undiscovered.

AB - Regulation of gene expression has been shown to involve not only the binding of transcription factor at target gene promoters but also the characterization of histone around which DNA is wrapped around. Some histone modification, for example di-methylated histone H3 at lysine 4 (H3K4me2), has been shown to bind to promoters and activate target genes. However, no clear pattern has been shown to predict human promoters. This paper proposed a novel quantitative approach to characterize patterns of promoter regions and predict novel and alternative promoters. We utilized high-throughput data generated using chromatin immunoprecipitation methods followed by massively parallel sequencing (ChIP-seq) technology on RNA Polymerase II (Pol-II) and H3K4me2. Common patterns of promoter regions are modeled using a mixture model involving double-exponential and uniform distributions. The fitted model obtained were then used to search for regions displaying similar patterns over the entire genome to find novel and alternative promoters. Regions with high correlations with the common patterns are identified as putative novel promoters. We used this proposed algorithm, RNA-seq data and several transcripts databases to find alternative promoters in MCF7 (normal breast cancer) cell line. We found 7,235 high-confidence regions that display the identified promoter patterns. Of these, 4,167 regions (58%) can be mapped to RefSeq regions. 2,444 regions are in a gene body or overlap with transcripts (non-coding RNAs, ESTs, and transcripts that are predicted by RNA-seq data). Some of these maybe potential alternative promoters. We also found 193 regions that map to enhancer regions (represented by androgen and estrogen receptor binding sites) and other regulatory regions such as CTCF (CCCTC binding factor) and CpG island. Around 5% (431 regions) of these correlated regions do not overlap with any transcripts or regulatory regions suggesting that these might be potential new promoters or markers for other annotation which are currently undiscovered.

UR - http://www.scopus.com/inward/record.url?scp=84876085912&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84876085912&partnerID=8YFLogxK

M3 - Article

C2 - 23134707

AN - SCOPUS:84876085912

VL - 13 Suppl 6

JO - BMC Genomics

JF - BMC Genomics

SN - 1471-2164

M1 - S3

ER -