Differential expression analysis with global network adjustment

Jonathan A Gelfond, Joseph G. Ibrahim, Mayetri Gupta, Ming Hui Chen, Jannine D Cody

Research output: Contribution to journalArticle

Abstract

Background: Large-scale chromosomal deletions or other non-specific perturbations of the transcriptome can alter the expression of hundreds or thousands of genes, and it is of biological interest to understand which genes are most profoundly affected. We present a method for predicting a gene's expression as a function of other genes thereby accounting for the effect of transcriptional regulation that confounds the identification of genes differentially expressed relative to a regulatory network. The challenge in constructing such models is that the number of possible regulator transcripts within a global network is on the order of thousands, and the number of biological samples is typically on the order of 10. Nevertheless, there are large gene expression databases that can be used to construct networks that could be helpful in modeling transcriptional regulation in smaller experiments.Results: We demonstrate a type of penalized regression model that can be estimated from large gene expression databases, and then applied to smaller experiments. The ridge parameter is selected by minimizing the cross-validation error of the predictions in the independent out-sample. This tends to increase the model stability and leads to a much greater degree of parameter shrinkage, but the resulting biased estimation is mitigated by a second round of regression. Nevertheless, the proposed computationally efficient " over-shrinkage" method outperforms previously used LASSO-based techniques. In two independent datasets, we find that the median proportion of explained variability in expression is approximately 25%, and this results in a substantial increase in the signal-to-noise ratio allowing more powerful inferences on differential gene expression leading to biologically intuitive findings. We also show that a large proportion of gene dependencies are conditional on the biological state, which would be impossible with standard differential expression methods.Conclusions: By adjusting for the effects of the global network on individual genes, both the sensitivity and reliability of differential expression measures are greatly improved.

Original languageEnglish (US)
Article number258
JournalBMC Bioinformatics
Volume14
Issue number1
DOIs
StatePublished - Aug 23 2013

Fingerprint

Differential Expression
Adjustment
Genes
Gene
Gene expression
Gene Expression
Transcriptional Regulation
Shrinkage
Proportion
Biased Estimation
Databases
Penalized Regression
Regulatory Networks
Signal-To-Noise Ratio
Ridge
Transcriptome
Cross-validation
Regulator
Deletion
Experiment

ASJC Scopus subject areas

  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Applied Mathematics
  • Structural Biology

Cite this

Differential expression analysis with global network adjustment. / Gelfond, Jonathan A; Ibrahim, Joseph G.; Gupta, Mayetri; Chen, Ming Hui; Cody, Jannine D.

In: BMC Bioinformatics, Vol. 14, No. 1, 258, 23.08.2013.

Research output: Contribution to journalArticle

Gelfond, Jonathan A ; Ibrahim, Joseph G. ; Gupta, Mayetri ; Chen, Ming Hui ; Cody, Jannine D. / Differential expression analysis with global network adjustment. In: BMC Bioinformatics. 2013 ; Vol. 14, No. 1.
@article{0eb0da05238e418980ba90b67a788769,
title = "Differential expression analysis with global network adjustment",
abstract = "Background: Large-scale chromosomal deletions or other non-specific perturbations of the transcriptome can alter the expression of hundreds or thousands of genes, and it is of biological interest to understand which genes are most profoundly affected. We present a method for predicting a gene's expression as a function of other genes thereby accounting for the effect of transcriptional regulation that confounds the identification of genes differentially expressed relative to a regulatory network. The challenge in constructing such models is that the number of possible regulator transcripts within a global network is on the order of thousands, and the number of biological samples is typically on the order of 10. Nevertheless, there are large gene expression databases that can be used to construct networks that could be helpful in modeling transcriptional regulation in smaller experiments.Results: We demonstrate a type of penalized regression model that can be estimated from large gene expression databases, and then applied to smaller experiments. The ridge parameter is selected by minimizing the cross-validation error of the predictions in the independent out-sample. This tends to increase the model stability and leads to a much greater degree of parameter shrinkage, but the resulting biased estimation is mitigated by a second round of regression. Nevertheless, the proposed computationally efficient {"} over-shrinkage{"} method outperforms previously used LASSO-based techniques. In two independent datasets, we find that the median proportion of explained variability in expression is approximately 25{\%}, and this results in a substantial increase in the signal-to-noise ratio allowing more powerful inferences on differential gene expression leading to biologically intuitive findings. We also show that a large proportion of gene dependencies are conditional on the biological state, which would be impossible with standard differential expression methods.Conclusions: By adjusting for the effects of the global network on individual genes, both the sensitivity and reliability of differential expression measures are greatly improved.",
author = "Gelfond, {Jonathan A} and Ibrahim, {Joseph G.} and Mayetri Gupta and Chen, {Ming Hui} and Cody, {Jannine D}",
year = "2013",
month = "8",
day = "23",
doi = "10.1186/1471-2105-14-258",
language = "English (US)",
volume = "14",
journal = "BMC Bioinformatics",
issn = "1471-2105",
publisher = "BioMed Central",
number = "1",

}

TY - JOUR

T1 - Differential expression analysis with global network adjustment

AU - Gelfond, Jonathan A

AU - Ibrahim, Joseph G.

AU - Gupta, Mayetri

AU - Chen, Ming Hui

AU - Cody, Jannine D

PY - 2013/8/23

Y1 - 2013/8/23

N2 - Background: Large-scale chromosomal deletions or other non-specific perturbations of the transcriptome can alter the expression of hundreds or thousands of genes, and it is of biological interest to understand which genes are most profoundly affected. We present a method for predicting a gene's expression as a function of other genes thereby accounting for the effect of transcriptional regulation that confounds the identification of genes differentially expressed relative to a regulatory network. The challenge in constructing such models is that the number of possible regulator transcripts within a global network is on the order of thousands, and the number of biological samples is typically on the order of 10. Nevertheless, there are large gene expression databases that can be used to construct networks that could be helpful in modeling transcriptional regulation in smaller experiments.Results: We demonstrate a type of penalized regression model that can be estimated from large gene expression databases, and then applied to smaller experiments. The ridge parameter is selected by minimizing the cross-validation error of the predictions in the independent out-sample. This tends to increase the model stability and leads to a much greater degree of parameter shrinkage, but the resulting biased estimation is mitigated by a second round of regression. Nevertheless, the proposed computationally efficient " over-shrinkage" method outperforms previously used LASSO-based techniques. In two independent datasets, we find that the median proportion of explained variability in expression is approximately 25%, and this results in a substantial increase in the signal-to-noise ratio allowing more powerful inferences on differential gene expression leading to biologically intuitive findings. We also show that a large proportion of gene dependencies are conditional on the biological state, which would be impossible with standard differential expression methods.Conclusions: By adjusting for the effects of the global network on individual genes, both the sensitivity and reliability of differential expression measures are greatly improved.

AB - Background: Large-scale chromosomal deletions or other non-specific perturbations of the transcriptome can alter the expression of hundreds or thousands of genes, and it is of biological interest to understand which genes are most profoundly affected. We present a method for predicting a gene's expression as a function of other genes thereby accounting for the effect of transcriptional regulation that confounds the identification of genes differentially expressed relative to a regulatory network. The challenge in constructing such models is that the number of possible regulator transcripts within a global network is on the order of thousands, and the number of biological samples is typically on the order of 10. Nevertheless, there are large gene expression databases that can be used to construct networks that could be helpful in modeling transcriptional regulation in smaller experiments.Results: We demonstrate a type of penalized regression model that can be estimated from large gene expression databases, and then applied to smaller experiments. The ridge parameter is selected by minimizing the cross-validation error of the predictions in the independent out-sample. This tends to increase the model stability and leads to a much greater degree of parameter shrinkage, but the resulting biased estimation is mitigated by a second round of regression. Nevertheless, the proposed computationally efficient " over-shrinkage" method outperforms previously used LASSO-based techniques. In two independent datasets, we find that the median proportion of explained variability in expression is approximately 25%, and this results in a substantial increase in the signal-to-noise ratio allowing more powerful inferences on differential gene expression leading to biologically intuitive findings. We also show that a large proportion of gene dependencies are conditional on the biological state, which would be impossible with standard differential expression methods.Conclusions: By adjusting for the effects of the global network on individual genes, both the sensitivity and reliability of differential expression measures are greatly improved.

UR - http://www.scopus.com/inward/record.url?scp=84882492534&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84882492534&partnerID=8YFLogxK

U2 - 10.1186/1471-2105-14-258

DO - 10.1186/1471-2105-14-258

M3 - Article

C2 - 23968143

AN - SCOPUS:84882492534

VL - 14

JO - BMC Bioinformatics

JF - BMC Bioinformatics

SN - 1471-2105

IS - 1

M1 - 258

ER -