LOcating Non-Unique matched Tags (LONUT) to Improve the Detection of the Enriched Regions for ChIP-seq Data

Rui Wang, Hang Kai Hsu, Adam Blattler, Yisong Wang, Xun Lan, Yao Wang, Pei Yin Hsu, Yu Wei Leu, Hui-ming Huang, Peggy J. Farnham, Victor X Jin

Research output: Contribution to journal › Article

8 Citations (Scopus)

Abstract

One big limitation of computational tools for analyzing ChIP-seq data is that most of them ignore non-unique tags (NUTs) that match the human genome even though NUTs comprise up to 60% of all raw tags in ChIP-seq data. Effectively utilizing these NUTs would increase the sequencing depth and allow a more accurate detection of enriched binding sites, which in turn could lead to more precise and significant biological interpretations. In this study, we have developed a computational tool, LOcating Non-Unique matched Tags (LONUT), to improve the detection of enriched regions from ChIP-seq data. Our LONUT algorithm applies a linear and polynomial regression model to establish an empirical score (ES) formula by considering two influential factors, the distance of NUTs to peaks identified using uniquely matched tags (UMTs) and the enrichment score for those peaks resulting in each NUT being assigned to a unique location on the reference genome. The newly located tags from the set of NUTs are combined with the original UMTs to produce a final set of combined matched tags (CMTs). LONUT was tested on many different datasets representing three different characteristics of biological data types. The detected sites were validated using de novo motif discovery and ChIP-PCR. We demonstrate the specificity and accuracy of LONUT and show that our program not only improves the detection of binding sites for ChIP-seq, but also identifies additional binding sites.
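
As a rough illustration of the assignment step described in the abstract, the following Python sketch places each NUT at whichever of its candidate locations scores highest against peaks called from UMTs, then merges the placed tags with the UMTs to form the CMTs. The abstract does not state the ES formula, so `empirical_score`, its linear distance decay, the `max_distance` cutoff, and every other name here are hypothetical stand-ins, not LONUT's fitted regression model or its actual code.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Location = Tuple[str, int]  # (chromosome, position)


@dataclass(frozen=True)
class Peak:
    chrom: str
    summit: int        # summit position of a peak called from UMTs
    enrichment: float  # enrichment score of that peak


def empirical_score(distance: int, enrichment: float,
                    max_distance: int = 1000) -> float:
    """Hypothetical stand-in for the ES formula (an assumption).

    Grows with peak enrichment and decays linearly with distance,
    reaching zero at and beyond max_distance.
    """
    if distance >= max_distance:
        return 0.0
    return enrichment * (1.0 - distance / max_distance)


def assign_nut(candidates: List[Location],
               peaks: List[Peak]) -> Optional[Location]:
    """Return the candidate location with the highest score.

    Returns None when no candidate scores above zero, so the tag
    stays unplaced.
    """
    best_loc: Optional[Location] = None
    best_score = 0.0
    for chrom, pos in candidates:
        for peak in peaks:
            if peak.chrom != chrom:
                continue
            score = empirical_score(abs(pos - peak.summit), peak.enrichment)
            if score > best_score:
                best_loc, best_score = (chrom, pos), score
    return best_loc


def combine_tags(umts: List[Location],
                 nuts: Dict[str, List[Location]],
                 peaks: List[Peak]) -> List[Location]:
    """Build CMTs: the original UMTs plus every NUT that could be placed."""
    located = [assign_nut(cands, peaks) for cands in nuts.values()]
    return list(umts) + [loc for loc in located if loc is not None]
```

In this sketch a NUT whose candidate locations all lie far from any UMT peak is simply left out of the combined set; how LONUT itself handles such tags is not stated in the abstract.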

Original language: English (US)
Article number: e67788
Journal: PLoS One
Volume: 8
Issue number: 6
DOIs: https://doi.org/10.1371/journal.pone.0067788
State: Published - Jun 25 2013

Fingerprint

binding sites
Binding Sites
Genes
genome
Statistical Models
Human Genome
Linear Models
Polynomials
Genome
Polymerase Chain Reaction
Datasets

ASJC Scopus subject areas

  • Agricultural and Biological Sciences(all)
  • Biochemistry, Genetics and Molecular Biology(all)
  • Medicine(all)

Cite this

LOcating Non-Unique matched Tags (LONUT) to Improve the Detection of the Enriched Regions for ChIP-seq Data. / Wang, Rui; Hsu, Hang Kai; Blattler, Adam; Wang, Yisong; Lan, Xun; Wang, Yao; Hsu, Pei Yin; Leu, Yu Wei; Huang, Hui-ming; Farnham, Peggy J.; Jin, Victor X.

In: PLoS One, Vol. 8, No. 6, e67788, 25.06.2013.

Wang, R, Hsu, HK, Blattler, A, Wang, Y, Lan, X, Wang, Y, Hsu, PY, Leu, YW, Huang, H, Farnham, PJ & Jin, VX 2013, 'LOcating Non-Unique matched Tags (LONUT) to Improve the Detection of the Enriched Regions for ChIP-seq Data', PLoS One, vol. 8, no. 6, e67788. https://doi.org/10.1371/journal.pone.0067788
Wang, Rui ; Hsu, Hang Kai ; Blattler, Adam ; Wang, Yisong ; Lan, Xun ; Wang, Yao ; Hsu, Pei Yin ; Leu, Yu Wei ; Huang, Hui-ming ; Farnham, Peggy J. ; Jin, Victor X. / LOcating Non-Unique matched Tags (LONUT) to Improve the Detection of the Enriched Regions for ChIP-seq Data. In: PLoS One. 2013 ; Vol. 8, No. 6.
@article{5cbf002259c447b08330eaeda325152f,
title = "LOcating Non-Unique matched Tags (LONUT) to Improve the Detection of the Enriched Regions for ChIP-seq Data",
abstract = "One big limitation of computational tools for analyzing ChIP-seq data is that most of them ignore non-unique tags (NUTs) that match the human genome even though NUTs comprise up to 60{\%} of all raw tags in ChIP-seq data. Effectively utilizing these NUTs would increase the sequencing depth and allow a more accurate detection of enriched binding sites, which in turn could lead to more precise and significant biological interpretations. In this study, we have developed a computational tool, LOcating Non-Unique matched Tags (LONUT), to improve the detection of enriched regions from ChIP-seq data. Our LONUT algorithm applies a linear and polynomial regression model to establish an empirical score (ES) formula by considering two influential factors, the distance of NUTs to peaks identified using uniquely matched tags (UMTs) and the enrichment score for those peaks resulting in each NUT being assigned to a unique location on the reference genome. The newly located tags from the set of NUTs are combined with the original UMTs to produce a final set of combined matched tags (CMTs). LONUT was tested on many different datasets representing three different characteristics of biological data types. The detected sites were validated using de novo motif discovery and ChIP-PCR. We demonstrate the specificity and accuracy of LONUT and show that our program not only improves the detection of binding sites for ChIP-seq, but also identifies additional binding sites.",
author = "Rui Wang and Hsu, {Hang Kai} and Adam Blattler and Yisong Wang and Xun Lan and Yao Wang and Hsu, {Pei Yin} and Leu, {Yu Wei} and Hui-ming Huang and Farnham, {Peggy J.} and Jin, {Victor X}",
year = "2013",
month = "6",
day = "25",
doi = "10.1371/journal.pone.0067788",
language = "English (US)",
volume = "8",
journal = "PLoS One",
issn = "1932-6203",
publisher = "Public Library of Science",
number = "6",
pages = "e67788",
}

TY - JOUR
T1 - LOcating Non-Unique matched Tags (LONUT) to Improve the Detection of the Enriched Regions for ChIP-seq Data
AU - Wang, Rui
AU - Hsu, Hang Kai
AU - Blattler, Adam
AU - Wang, Yisong
AU - Lan, Xun
AU - Wang, Yao
AU - Hsu, Pei Yin
AU - Leu, Yu Wei
AU - Huang, Hui-ming
AU - Farnham, Peggy J.
AU - Jin, Victor X
PY - 2013/6/25
Y1 - 2013/6/25

AB - One big limitation of computational tools for analyzing ChIP-seq data is that most of them ignore non-unique tags (NUTs) that match the human genome even though NUTs comprise up to 60% of all raw tags in ChIP-seq data. Effectively utilizing these NUTs would increase the sequencing depth and allow a more accurate detection of enriched binding sites, which in turn could lead to more precise and significant biological interpretations. In this study, we have developed a computational tool, LOcating Non-Unique matched Tags (LONUT), to improve the detection of enriched regions from ChIP-seq data. Our LONUT algorithm applies a linear and polynomial regression model to establish an empirical score (ES) formula by considering two influential factors, the distance of NUTs to peaks identified using uniquely matched tags (UMTs) and the enrichment score for those peaks resulting in each NUT being assigned to a unique location on the reference genome. The newly located tags from the set of NUTs are combined with the original UMTs to produce a final set of combined matched tags (CMTs). LONUT was tested on many different datasets representing three different characteristics of biological data types. The detected sites were validated using de novo motif discovery and ChIP-PCR. We demonstrate the specificity and accuracy of LONUT and show that our program not only improves the detection of binding sites for ChIP-seq, but also identifies additional binding sites.

UR - http://www.scopus.com/inward/record.url?scp=84879382729&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84879382729&partnerID=8YFLogxK
U2 - 10.1371/journal.pone.0067788
DO - 10.1371/journal.pone.0067788
M3 - Article
C2 - 23825685
AN - SCOPUS:84879382729
VL - 8
JO - PLoS One
JF - PLoS One
SN - 1932-6203
IS - 6
M1 - e67788
ER -