The enrichment of duplicate genes, and therefore of paralogs (proteins encoded by duplicate genes), in multicellular relative to unicellular organisms enhances genomic functional innovation. This study quantitatively examined the relationships among paralog enrichment, expression-pattern diversification and multicellularity, aiming to better understand the genomic basis of multicellularity. Paralog abundance in specific cells was compared with that in unicellular proteomes and in the whole proteomes of multicellular organisms. The budding yeast Saccharomyces cerevisiae and the nematode Caenorhabditis elegans, for which the gene sets expressed in specific cells are available, were used as unicellular and multicellular models, respectively. Paralog count (k) distributions, P(k), follow a power law, $P(k) \propto k^{-\alpha}$, in the whole proteomes of both species. The value of the exponent α can be used as a gauge of paralog abundance: the higher the value, the lower the paralog abundance. The α-value is indeed lower in the whole proteome of C. elegans (1.74) than in that of S. cerevisiae (2.34), quantifying the enrichment of paralogs in the multicellular species. We also found that the power-law relationship applies to the proteomes of specific C. elegans cells. Strikingly, the values of α in specific cells are higher than in the whole C. elegans proteome and comparable to that in S. cerevisiae. Thus, paralog abundance in specific cells is lower than in the whole proteome and comparable to that in unicellular species. Furthermore, the degree to which a gene's expression level fluctuates across different C. elegans cells correlates positively with its paralog count, a relationship further confirmed by human gene-expression patterns across different tissues. Taken together, these results quantitatively and mechanistically establish the enrichment of paralogs with diversifying expression patterns as the genomic and evolutionary basis of multicellularity.
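To make the use of α as a gauge of paralog abundance concrete, the sketch below builds the empirical distribution P(k) from a list of per-protein paralog counts and estimates α. The abstract does not say how α was fitted, so this example swaps in one standard technique, the approximate discrete maximum-likelihood estimator of Clauset, Shalizi and Newman (2009), $\hat{\alpha} \approx 1 + n \left[\sum_i \ln\frac{k_i}{k_{\min} - 1/2}\right]^{-1}$. It is not necessarily the authors' method. It assumes each paralog count k is a positive integer (for example, counting the protein itself so that singletons have k = 1); the function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def paralog_count_distribution(counts):
    """Empirical P(k): fraction of proteins whose paralog count equals k.

    Assumes (hypothetically) one positive-integer count per protein.
    """
    total = len(counts)
    tally = Counter(counts)
    return {k: n / total for k, n in sorted(tally.items())}


def estimate_alpha(counts, k_min=1):
    """Approximate discrete power-law MLE of alpha in P(k) ~ k^-alpha.

    Swapped-in technique (not necessarily the study's fitting method):
    alpha ~= 1 + n / sum(ln(k_i / (k_min - 0.5))) over all k_i >= k_min.
    A higher alpha means a steeper tail, i.e. fewer large paralog families.
    """
    if k_min < 1:
        raise ValueError("k_min must be a positive integer")
    tail = [k for k in counts if k >= k_min]
    if not tail:
        raise ValueError("no counts at or above k_min")
    log_sum = sum(math.log(k / (k_min - 0.5)) for k in tail)
    return 1.0 + len(tail) / log_sum


if __name__ == "__main__":
    # Hypothetical toy proteomes, not data from the study.
    compact = [1] * 80 + [2] * 15 + [3] * 5              # few large families
    expanded = [1] * 60 + [2] * 20 + [4] * 12 + [8] * 8  # heavier tail
    print(paralog_count_distribution(compact))
    print("alpha (compact): ", estimate_alpha(compact))
    print("alpha (expanded):", estimate_alpha(expanded))
```

With these toy inputs, the proteome containing more large families gets the lower α, mirroring the abstract's reading that a lower α means greater paralog abundance. The continuity-corrected approximation is rough when k_min is very small, so a full analysis would typically also choose k_min from the data and check goodness of fit.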