TY - GEN
T1 - A particle swarm optimization algorithm for finding DNA sequence motifs
AU - Lei, Chengwei
AU - Ruan, Jianhua
PY - 2008
Y1 - 2008
N2 - Discovering short DNA motifs from a set of co-regulated genes is an important step towards deciphering the complex gene regulatory networks and understanding gene functions. Despite significant improvement in the last decade, it still remains one of the most challenging problems in both computer science and molecular biology. In this work, we propose a novel motif finding algorithm based on a population-based stochastic optimization technique called Particle Swarm Optimization (PSO), which has been shown to be effective in optimizing difficult multidimensional problems in many fields. However, PSO has mainly been applied to problems in continuous domains. The motif finding problem, which is essentially a multiple local alignment problem, is discrete, as a slight shift in one sequence completely changes the alignment. Therefore, we propose to use a word dissimilarity graph to remap the neighborhood structure of the solution space, which transforms the motif finding problem into a contiguous integer domain, and propose a modification of the naive PSO algorithm to accommodate integer variables. In order to improve efficiency, we also propose several strategies for escaping from local optima, and determining the termination criteria automatically. Experimental results on simulated challenge problems show that our method is both more efficient and more accurate than several existing algorithms. Applications to several sets of real promoter sequences also show that our approach is able to detect known transcription factor binding sites, and outperforms two of the most successful existing algorithms.
AB - Discovering short DNA motifs from a set of co-regulated genes is an important step towards deciphering the complex gene regulatory networks and understanding gene functions. Despite significant improvement in the last decade, it still remains one of the most challenging problems in both computer science and molecular biology. In this work, we propose a novel motif finding algorithm based on a population-based stochastic optimization technique called Particle Swarm Optimization (PSO), which has been shown to be effective in optimizing difficult multidimensional problems in many fields. However, PSO has mainly been applied to problems in continuous domains. The motif finding problem, which is essentially a multiple local alignment problem, is discrete, as a slight shift in one sequence completely changes the alignment. Therefore, we propose to use a word dissimilarity graph to remap the neighborhood structure of the solution space, which transforms the motif finding problem into a contiguous integer domain, and propose a modification of the naive PSO algorithm to accommodate integer variables. In order to improve efficiency, we also propose several strategies for escaping from local optima, and determining the termination criteria automatically. Experimental results on simulated challenge problems show that our method is both more efficient and more accurate than several existing algorithms. Applications to several sets of real promoter sequences also show that our approach is able to detect known transcription factor binding sites, and outperforms two of the most successful existing algorithms.
UR - http://www.scopus.com/inward/record.url?scp=58049164266&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=58049164266&partnerID=8YFLogxK
U2 - 10.1109/BIBMW.2008.4686231
DO - 10.1109/BIBMW.2008.4686231
M3 - Conference contribution
AN - SCOPUS:58049164266
SN - 9781424428908
T3 - Proceedings - 2008 IEEE International Conference on Bioinformatics and Biomedicine Workshops, BIBMW
SP - 166
EP - 173
BT - Proceedings - 2008 IEEE International Conference on Bioinformatics and Biomedicine Workshops, BIBMW
T2 - 2008 IEEE International Conference on Bioinformatics and Biomedicine Workshops, BIBMW
Y2 - 3 November 2008 through 5 November 2008
ER -