A wealth of computational methods has been developed to address problems in systems biology, such as modeling gene expression. However, to objectively evaluate and compare such methods is notoriously difficult. The DREAM (Dialogue on Reverse Engineering Assessments and Methods) project is a community-wide effort to assess the relative strengths and weaknesses of different computational methods for a set of core problems in systems biology. This article presents a top-performing algorithm for one of the challenge problems in the third annual DREAM (DREAM3), namely the gene expression prediction challenge. In this challenge, participants are asked to predict the expression levels of a small set of genes in a yeast deletion strain, given the expression levels of all other genes in the same strain and complete gene expression data for several other yeast strains. I propose a simple k-nearest-neighbor (KNN) method to solve this problem. Despite its simplicity, this method works well for this challenge, sharing the "top performer" honor with a much more sophisticated method. I also describe several alternative, simple strategies, including a modified KNN algorithm that further improves the performance of the standard KNN method. The success of these methods suggests that complex methods attempting to integrate multiple data sets do not necessarily lead to better performance than simple yet robust methods. Furthermore, none of these top-performing methods, including the one by a different team, are based on gene regulatory networks, which seems to suggest that accurately modeling gene expression using gene regulatory networks is unfortunately still a difficult task.
ASJC Scopus subject areas
- Biochemistry, Genetics and Molecular Biology(all)
- Agricultural and Biological Sciences(all)