A new method for mapping short DNA sequencing reads by using quality scores

Hatice Gulcin Ozer, Terry Camerlengo, Hui-ming Huang, Kun Huang

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

New high-throughput sequencing technologies can generate millions of short DNA sequences that need to be mapped to the reference genome accurately. The majority of mapping algorithms handle variations in the quality of these short sequences by allowing more mismatches and/or gaps in the alignment, and focus on improving runtime. In this paper, we investigate ways to classify the quality scores of short DNA sequencing reads and integrate them into the mapping process. We specifically study quality scores that suggest two alternate bases (the top quality scores for two bases are close to each other at the locus) and the use of such bases to improve mapping accuracy. Our method generates alternative sequences when a sequence read contains alternate-quality bases and maps these alternative sequences to the reference genome. In a test using ChIP-seq data from an epigenetic study, we generated and mapped alternatives of 222,755 sequence reads (out of the original 2.5 million reads) that could not be mapped to the reference genome by the Eland algorithm. With this approach, we were able to map 12.8% of these sequence reads with alternative bases to unique positions in the genome. This study demonstrates that using alternative bases in mapping algorithms can improve mapping results dramatically.
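
The alternative-sequence generation described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the input layout (four quality scores per read position, in A, C, G, T order), the closeness threshold `max_gap`, and the cap `max_ambiguous` are all assumptions introduced here, since the abstract does not specify them.

```python
from itertools import product

BASES = "ACGT"  # assumed column order of the four per-position scores


def alternative_sequences(scores_per_position, max_gap=5, max_ambiguous=3):
    """Return candidate reads for one sequencing read.

    scores_per_position: one 4-tuple of quality scores (A, C, G, T) per base.
    max_gap: hypothetical threshold; the two top scores count as "close"
             when they differ by at most this much.
    max_ambiguous: hypothetical cap; reads with more ambiguous positions
                   are returned unchanged to avoid a combinatorial blow-up.
    """
    choices = []
    for scores in scores_per_position:
        # Rank the four bases by score, highest first.
        ranked = sorted(zip(scores, BASES), reverse=True)
        (best_q, best_base), (second_q, second_base) = ranked[0], ranked[1]
        if best_q - second_q <= max_gap:
            choices.append((best_base, second_base))  # alternate-quality base
        else:
            choices.append((best_base,))  # confident call

    ambiguous = sum(1 for c in choices if len(c) == 2)
    if ambiguous > max_ambiguous:
        # Too many ambiguous positions: keep only the originally called read.
        return ["".join(c[0] for c in choices)]

    # Enumerate every combination; the first result is the called read.
    return ["".join(combo) for combo in product(*choices)]


read_scores = [(40, -40, -40, -40), (10, 8, -40, -40), (-40, -40, 40, -40)]
print(alternative_sequences(read_scores))
```

Each candidate would then be submitted to the mapper (Eland in the test described above). How hits from different candidates of the same read are reconciled is not covered by the abstract and is left out of this sketch.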

Original language: English (US)
Title of host publication: OCCBIO'09: 2009 Ohio Collaborative Conference on Bioinformatics
Pages: 21-25
Number of pages: 5
DOI: https://doi.org/10.1109/OCCBIO.2009.35
State: Published - 2009
Externally published: Yes
Event: 2009 Ohio Collaborative Conference on Bioinformatics, OCCBIO 2009 - Cleveland, OH, United States
Duration: Jun 15 2009 to Jun 17 2009

Other

Other: 2009 Ohio Collaborative Conference on Bioinformatics, OCCBIO 2009
Country: United States
City: Cleveland, OH
Period: 6/15/09 to 6/17/09

Fingerprint

DNA Sequence Analysis
DNA
Genome
Genes
Epigenomics
DNA sequences
Technology
Throughput

Keywords

  • DNA sequencing
  • Quality score
  • Short sequence mapping

ASJC Scopus subject areas

  • Biomedical Engineering
  • Health Informatics

Cite this

Ozer, H. G., Camerlengo, T., Huang, H., & Huang, K. (2009). A new method for mapping short DNA sequencing reads by using quality scores. In OCCBIO'09: 2009 Ohio Collaborative Conference on Bioinformatics (pp. 21-25). [5159155] https://doi.org/10.1109/OCCBIO.2009.35

@inproceedings{0bf3c4b8a813461eba376a73af3ddca0,
title = "A new method for mapping short DNA sequencing reads by using quality scores",
abstract = "New high-throughput sequencing technologies can generate millions of short DNA sequences that need to be mapped to the reference genome accurately. The majority of mapping algorithms handle variations in the quality of these short sequences by allowing more mismatches and/or gaps in the alignment, and focus on improving runtime. In this paper, we investigate ways to classify the quality scores of short DNA sequencing reads and integrate them into the mapping process. We specifically study quality scores that suggest two alternate bases (the top quality scores for two bases are close to each other at the locus) and the use of such bases to improve mapping accuracy. Our method generates alternative sequences when a sequence read contains alternate-quality bases and maps these alternative sequences to the reference genome. In a test using ChIP-seq data from an epigenetic study, we generated and mapped alternatives of 222,755 sequence reads (out of the original 2.5 million reads) that could not be mapped to the reference genome by the Eland algorithm. With this approach, we were able to map 12.8{\%} of these sequence reads with alternative bases to unique positions in the genome. This study demonstrates that using alternative bases in mapping algorithms can improve mapping results dramatically.",
keywords = "DNA sequencing, Quality score, Short sequence mapping",
author = "Ozer, {Hatice Gulcin} and Terry Camerlengo and Hui-ming Huang and Kun Huang",
year = "2009",
doi = "10.1109/OCCBIO.2009.35",
language = "English (US)",
isbn = "9780769536859",
pages = "21--25",
booktitle = "OCCBIO'09: 2009 Ohio Collaborative Conference on Bioinformatics",

}

TY - GEN

T1 - A new method for mapping short DNA sequencing reads by using quality scores

AU - Ozer, Hatice Gulcin

AU - Camerlengo, Terry

AU - Huang, Hui-ming

AU - Huang, Kun

PY - 2009

Y1 - 2009

AB - New high-throughput sequencing technologies can generate millions of short DNA sequences that need to be mapped to the reference genome accurately. The majority of mapping algorithms handle variations in the quality of these short sequences by allowing more mismatches and/or gaps in the alignment, and focus on improving runtime. In this paper, we investigate ways to classify the quality scores of short DNA sequencing reads and integrate them into the mapping process. We specifically study quality scores that suggest two alternate bases (the top quality scores for two bases are close to each other at the locus) and the use of such bases to improve mapping accuracy. Our method generates alternative sequences when a sequence read contains alternate-quality bases and maps these alternative sequences to the reference genome. In a test using ChIP-seq data from an epigenetic study, we generated and mapped alternatives of 222,755 sequence reads (out of the original 2.5 million reads) that could not be mapped to the reference genome by the Eland algorithm. With this approach, we were able to map 12.8% of these sequence reads with alternative bases to unique positions in the genome. This study demonstrates that using alternative bases in mapping algorithms can improve mapping results dramatically.

KW - DNA sequencing

KW - Quality score

KW - Short sequence mapping

UR - http://www.scopus.com/inward/record.url?scp=70350407500&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=70350407500&partnerID=8YFLogxK

U2 - 10.1109/OCCBIO.2009.35

DO - 10.1109/OCCBIO.2009.35

M3 - Conference contribution

AN - SCOPUS:70350407500

SN - 9780769536859

SP - 21

EP - 25

BT - OCCBIO'09: 2009 Ohio Collaborative Conference on Bioinformatics

ER -