New high-throughput sequencing technologies can generate millions of short sequences in a single experiment. As data sets grow, comparing multiple experiments on different cell lines under different experimental conditions becomes a major challenge. In this paper, we investigate ways to compare multiple ChIP-sequencing experiments. Specifically, we study epigenetic regulation in breast cancer and the effect of estrogen using 50 ChIP-sequencing data sets generated on the Illumina Genome Analyzer II. First, we evaluate the correlation among experiments based on the total number of reads in the transcribed and promoter regions of the genome. Then, we adapt a method used to identify the most stable genes in RT-PCR experiments in order to characterize the background signal across all experiments and to identify the most variable transcribed and promoter regions of the genome. We observe that the most variable genes for transcribed regions and for promoter regions are very distinct. Gene ontology and functional enrichment analyses of these most variable genes demonstrate the biological relevance of the results. In this study, we present a method that can effectively select differential regions of the genome based on protein-binding profiles over multiple experiments, using real data points without any normalization among the samples.
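The abstract does not name the RT-PCR stability method it adapts. As an illustration only, the following minimal sketch assumes a geNorm-style stability measure M: for each region, the average over all other regions of the standard deviation, across experiments, of their log2 read-count ratio. The function name `stability_m`, the `pseudocount` parameter, and the toy counts are hypothetical and are not taken from the paper. Under this assumption, a low M marks a stable (background-like) region and a high M marks a variable one.

```python
import numpy as np

def stability_m(counts, pseudocount=1.0):
    """geNorm-style stability measure (an assumed stand-in, not the paper's code).

    counts: array of shape (n_regions, n_experiments) holding raw read counts.
    Returns one M value per region; larger M means more variable.
    """
    logc = np.log2(np.asarray(counts, dtype=float) + pseudocount)
    n_regions = logc.shape[0]
    m = np.empty(n_regions)
    for j in range(n_regions):
        # log2 ratio of region j to every region, per experiment
        ratios = logc[j] - logc
        # pairwise variation: spread of each ratio across experiments
        sd = ratios.std(axis=1, ddof=1)
        # the self-comparison contributes 0, so divide by the other regions only
        m[j] = sd.sum() / (n_regions - 1)
    return m

# Toy example (hypothetical counts): regions 0 and 1 co-vary, region 2 does not.
counts = np.array([
    [10, 20, 40, 80],
    [20, 40, 80, 160],
    [50, 5, 500, 1],
])
m = stability_m(counts, pseudocount=0.0)
ranking = np.argsort(m)[::-1]  # most variable region first
```

Because the measure is built from ratios between regions within the same experiment, a per-experiment scaling factor such as sequencing depth cancels out when `pseudocount` is 0 (and only approximately otherwise). This is one way a ranking could be obtained from raw counts without normalizing among samples.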
|Original language|English (US)|
|Number of pages|14|
|Journal|Journal of Bioinformatics and Computational Biology|
|State|Published - Apr 1 2011|
ASJC Scopus subject areas
- Molecular Biology
- Computer Science Applications