Evaluation of methods for modeling transcription factor sequence specificity

Matthew T. Weirauch, Atina Cote, Raquel Norel, Matti Annala, Yue Zhao, Todd R. Riley, Julio Saez-Rodriguez, Thomas Cokelaer, Anastasia Vedenko, Shaheynoor Talukder, Harmen J. Bussemaker, Morris D. Quaid, Martha L. Bulyk, Gustavo Stolovitzky, Timothy R. Hughes, Phaedra Agius, Aaron Arvey, Philipp Bucher, Curtis G. Callan, Cheng Wei Chang, Chien Yu Chen, Yong Syuan Chen, Yu Wei Chu, Jan Grau, Ivo Grosse, Vidhya Jagannathan, Jens Keilwagen, Szymon M. Kiełbasa, Justin B. Kinney, Holger Klein, Miron B. Kursa, Harri Lähdesmäki, Kirsti Laurila, Chengwei Lei, Christina Leslie, Chaim Linhart, Anand Murugan, Alena Myšičková, William Stafford Noble, Matti Nykter, Yaron Orenstein, Stefan Posch, Jianhua Ruan, Witold R. Rudnicki, Christoph D. Schmid, Ron Shamir, Wing Kin Sung, Martin Vingron, Zhizhuo Zhang

Research output: Contribution to journal › Article

196 Scopus citations

Abstract

Genomic analyses often involve scanning for potential transcription factor (TF) binding sites using models of the sequence specificity of DNA-binding proteins. Many approaches have been developed to model and learn a protein's DNA-binding specificity, but these methods have not been systematically compared. Here we applied 26 such approaches to in vitro protein-binding microarray data for 66 mouse TFs belonging to various families. For nine TFs, we also scored the resulting motif models on in vivo data, and found that the best in vitro-derived motifs performed similarly to motifs derived from the in vivo data. Our results indicate that simple models based on mononucleotide position weight matrices trained by the best methods perform similarly to more complex models for most TFs examined, but fall short in specific cases (<10% of the TFs examined here). In addition, the best-performing motifs typically have relatively low information content, consistent with widespread degeneracy in eukaryotic TF sequence preferences.
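The abstract relies on two quantities it does not define: the score of a sequence window under a mononucleotide position weight matrix (PWM), and a motif's information content. The following is a minimal Python sketch of both, assuming a PWM stored as per-position base probabilities, a uniform background of 0.25 per base, and a small renormalized pseudocount. The matrix `PWM` and the functions `log_odds_scores` and `information_content` are hypothetical illustrations, not models or code from the study.

```python
import math

BASES = "ACGT"
BACKGROUND = 0.25  # assumption: uniform background frequency for each base

# Hypothetical 4-position PWM: each row gives P(A), P(C), P(G), P(T) at one position.
PWM = [
    [0.80, 0.10, 0.05, 0.05],
    [0.05, 0.05, 0.85, 0.05],
    [0.70, 0.10, 0.10, 0.10],
    [0.05, 0.05, 0.05, 0.85],
]


def log_odds_scores(seq, pwm, pseudocount=1e-3):
    """Score every window of len(pwm) in seq by summing log2(p / background).

    Assumes seq contains only uppercase A, C, G, T. The pseudocount is added to
    every base and the row renormalized, so zero probabilities never hit log2(0).
    """
    width = len(pwm)
    scores = []
    for start in range(len(seq) - width + 1):
        total = 0.0
        for offset, row in enumerate(pwm):
            p = row[BASES.index(seq[start + offset])]
            p_adj = (p + pseudocount) / (1.0 + 4 * pseudocount)
            total += math.log2(p_adj / BACKGROUND)
        scores.append(total)
    return scores


def information_content(pwm):
    """Total information content in bits against a uniform background.

    Each position contributes 2 - H, where H is the Shannon entropy of its row.
    """
    total = 0.0
    for row in pwm:
        entropy = -sum(p * math.log2(p) for p in row if p > 0)
        total += 2.0 - entropy
    return total


if __name__ == "__main__":
    seq = "TTAGATCC"
    scores = log_odds_scores(seq, PWM)
    best = max(range(len(scores)), key=scores.__getitem__)
    print(f"best window {seq[best:best + len(PWM)]} at {best}, score {scores[best]:.2f}")
    print(f"information content {information_content(PWM):.2f} bits")
```

A degenerate motif, whose rows sit closer to the uniform background, scores lower on this information-content measure, while each position can still contribute to the window score.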

Original language: English (US)
Pages (from-to): 126-134
Number of pages: 9
Journal: Nature Biotechnology
Volume: 31
Issue number: 2
DOIs: https://doi.org/10.1038/nbt.2486
State: Published - Feb 2013
Externally published: Yes

ASJC Scopus subject areas

  • Biotechnology
  • Bioengineering
  • Applied Microbiology and Biotechnology
  • Molecular Medicine
  • Biomedical Engineering


Cite this

Weirauch, M. T., Cote, A., Norel, R., Annala, M., Zhao, Y., Riley, T. R., Saez-Rodriguez, J., Cokelaer, T., Vedenko, A., Talukder, S., Bussemaker, H. J., Quaid, M. D., Bulyk, M. L., Stolovitzky, G., Hughes, T. R., Agius, P., Arvey, A., Bucher, P., Callan, C. G., ... Zhang, Z. (2013). Evaluation of methods for modeling transcription factor sequence specificity. Nature Biotechnology, 31(2), 126-134. https://doi.org/10.1038/nbt.2486