Background: Advances in next-generation sequencing technology enable gene expression to be mapped at the single-cell level, and single-cell RNA sequencing (scRNA-seq) can track cell heterogeneity and determine cell subpopulations. Unlike conventional RNA-seq, where differential expression analysis is the integral component, the most important goal of scRNA-seq is to identify highly variable genes across a population of cells, accounting for the discrete nature of single-cell gene expression and the unique library preparation protocols used in single-cell sequencing. However, a generic expression variation model applicable to different scRNA-seq data sets is lacking. Hence, the objective of this study is to develop a gene expression variation model (GEVM) that uses the relationship between the coefficient of variation (CV) and the average expression level to address the over-dispersion of single-cell data, and that uses the corresponding statistical significance to quantify the variably expressed genes (VEGs).
Results: We built a simulation framework that generates scRNA-seq data with different numbers of cells, model parameters, and variation levels. We implemented the GEVM and demonstrated its robustness on simulated scRNA-seq data under different conditions. We evaluated the regression robustness using the root-mean-square error (RMSE) and assessed the parameter estimation process by varying initial model parameters that deviated from a homogeneous cell population. We also applied the GEVM to real scRNA-seq data to test its performance in distinct cases.
Conclusions: In this paper, we propose a gene expression variation model that can be used to determine significant variably expressed genes. Applying the model to simulated single-cell data, we observed robust parameter estimation under different conditions with minimal RMSE. We also examined the model on two distinct scRNA-seq data sets generated with different single-cell protocols and determined the VEGs. Obtaining the VEGs allowed us to observe possible subpopulations, providing further evidence of cell heterogeneity. With the GEVM, significant variably expressed genes can be readily identified in different scRNA-seq data sets.
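The abstract does not state the functional form of the GEVM, so the following is only a minimal sketch of the general idea, not the authors' model. It assumes a plain negative binomial count model with variance mu + phi*mu^2, for which CV^2 = 1/mu + phi, simulates a homogeneous cell population, and fits the CV-versus-mean curve by ordinary least squares. The function names `simulate_nb_counts` and `fit_cv_mean`, the dispersion value, and the gene and cell counts are all hypothetical choices made for illustration.

```python
import numpy as np


def simulate_nb_counts(mu, phi, n_cells, rng):
    """Draw a genes x cells count matrix with mean mu[g] and variance mu + phi*mu^2.

    Assumption: a plain negative binomial model, not the paper's simulation framework.
    """
    size = 1.0 / phi                 # NB "number of successes" parameter
    p = size / (size + mu)           # NumPy's success probability, one per gene
    return rng.negative_binomial(size, p[:, None], size=(mu.size, n_cells))


def fit_cv_mean(counts):
    """Fit CV^2 = a / mean + phi across genes by ordinary least squares.

    Returns (a, phi, rmse), where rmse is computed on log10(CV).
    """
    mean = counts.mean(axis=1)
    var = counts.var(axis=1, ddof=1)
    keep = (mean > 0) & (var > 0)    # drop genes with no usable variation
    mean, var = mean[keep], var[keep]
    cv2 = var / mean**2
    design = np.column_stack([1.0 / mean, np.ones_like(mean)])
    (a, phi), *_ = np.linalg.lstsq(design, cv2, rcond=None)
    fitted = a / mean + phi
    # log10(CV) = 0.5 * log10(CV^2)
    rmse = np.sqrt(np.mean((0.5 * np.log10(cv2) - 0.5 * np.log10(fitted)) ** 2))
    return a, phi, rmse


rng = np.random.default_rng(0)
true_mu = 10 ** rng.uniform(0, 3, size=2000)   # gene means from 1 to 1000
counts = simulate_nb_counts(true_mu, phi=0.2, n_cells=500, rng=rng)
a_hat, phi_hat, rmse = fit_cv_mean(counts)
print(f"a = {a_hat:.3f}, phi = {phi_hat:.3f}, RMSE(log10 CV) = {rmse:.4f}")
```

Under these assumptions, genes whose observed CV lies well above the fitted curve at their expression level would be the candidates for variably expressed genes. The significance test used to call them is part of the paper's method and is not reproduced here.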
- Cell heterogeneity
- Gene expression variation model
- Negative binomial distribution
- Single-cell RNA-Seq
- Variably expressed genes
ASJC Scopus subject areas