Methods for optimizing the prediction of Escherichia coli RNA polymerase promoter sequences by neural networks are presented. A neural network was trained on a set of 80 known promoter sequences combined with different numbers of random sequences. The conserved -10 region and -35 region of the promoter sequences and a combination of these regions were used in three independent training sets. The prediction accuracy of the resulting weight matrix was tested against a separate set of 30 known promoter sequences and 1500 random sequences. The effects of the network's topology, the extent of training, the number of random sequences in the training set and the effects of different data representations were examined and optimized. Accuracies of 100% on the promoter test set and 98.4% on the random test set were achieved with the optimal parameters.
ASJC Scopus subject areas