TY - GEN
T1 - Finding gapped motifs by a novel evolutionary algorithm
AU - Lei, Chengwei
AU - Ruan, Jianhua
N1 - Copyright 2021 Elsevier B.V., All rights reserved.
PY - 2010
Y1 - 2010
N2 - Identifying approximately repeated patterns, or motifs, in biological sequences from a set of co-regulated genes is an important step towards deciphering complex gene regulatory networks and understanding gene functions. In this work, we develop a novel motif finding algorithm based on a population-based stochastic optimization technique called Particle Swarm Optimization (PSO), which has been shown to be effective in optimizing difficult multidimensional problems in continuous domains. We propose a modification of the standard PSO algorithm to handle discrete values, such as characters in DNA sequences. Our algorithm also provides several unique features. First, we use both consensus and position-specific weight matrix representations in our algorithm, taking advantage of the efficiency of the former and the accuracy of the latter. Furthermore, many real motifs contain gaps, but existing methods usually ignore them or assume that the user knows their exact locations and lengths, which is usually impractical for real applications. In comparison, our method models gaps explicitly and provides an easy way to find gapped motifs without any detailed knowledge of the gaps. Our method also allows some input sequences to contain zero or multiple binding sites. Experimental results on synthetic challenge problems as well as real biological sequences show that our method is both more efficient and more accurate than several existing algorithms, especially when gaps are present in the motifs.
KW - DNA motif
KW - Evolutionary algorithm
KW - Optimization
KW - PSO
UR - http://www.scopus.com/inward/record.url?scp=77952304352&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=77952304352&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-12211-8_5
DO - 10.1007/978-3-642-12211-8_5
M3 - Conference contribution
AN - SCOPUS:77952304352
SN - 3642122108
SN - 9783642122101
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 50
EP - 61
BT - Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics - 8th European Conference, EvoBIO 2010, Proceedings
PB - Springer Verlag
T2 - 8th European Conference on Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics, EvoBIO 2010
Y2 - 7 April 2010 through 9 April 2010
ER -