Reproducibility, Sources of Variability, Pooling, and Sample Size: Important Considerations for the Design of High-Density Oligonucleotide Array Experiments

Eun Soo Han, Yimin Wu, Roger McCarter, James F Nelson, Arlan Richardson, Susan G. Hilsenbeck

Research output: Contribution to journal › Article

57 Citations (Scopus)

Abstract

We have undertaken a series of experiments to examine several issues that directly affect design of gene expression studies using Affymetrix GeneChip arrays: probe-level analysis, need for technical replication, relative contribution of various sources of variability, and utility of pooling RNA from different samples. Probe-level data were analyzed by Affymetrix MAS 5.0, and three model-based methods, PM-MM and PM-only models by dChip, and the RMA model by Bioconductor, with the latter two providing the best performance. We found that replicate chips of the same RNA have limited value in reducing total variability, and for relatively highly expressed genes in this biologically homogeneous animal model of aging, about 11% of total variation is due to day effects and the remainder is approximately equally split between sample and residual sources. We also found that pooling samples is neither advantageous nor detrimental. Finally we suggest a strategy for sample size calculations using formulas appropriate when coefficients of variation are known, target effects are expressed as fold changes, and data can be assumed to be approximately lognormally distributed.
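The sample size strategy mentioned at the end of the abstract can be illustrated with a minimal sketch. The abstract does not give the authors' formulas, so the code below is an assumption: it uses the common normal-approximation formula for comparing two groups on the log scale, where a coefficient of variation CV on the raw scale corresponds to a log-scale variance of ln(1 + CV^2) under lognormality, and the target effect is ln(fold change). The function name, parameter names, and defaults (two-sided alpha of 0.05, power of 0.80) are hypothetical choices for illustration only.

```python
import math
from statistics import NormalDist


def samples_per_group(cv, fold_change, alpha=0.05, power=0.80):
    """Approximate samples per group for a two-group comparison of
    lognormally distributed expression values.

    Assumed textbook normal approximation (not taken from the paper):
        n = 2 * (z_{1-alpha/2} + z_{power})^2 * ln(1 + cv^2) / ln(fold_change)^2
    """
    if cv <= 0:
        raise ValueError("cv must be positive")
    if fold_change <= 0 or fold_change == 1:
        raise ValueError("fold_change must be positive and different from 1")
    if not (0 < alpha < 1 and 0 < power < 1):
        raise ValueError("alpha and power must lie strictly between 0 and 1")

    # Variance of log-transformed data implied by the raw-scale CV.
    log_variance = math.log(1.0 + cv ** 2)
    # Target difference between group means on the natural-log scale.
    log_effect = math.log(fold_change)

    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # two-sided test
    z_power = NormalDist().inv_cdf(power)

    n = 2.0 * (z_alpha + z_power) ** 2 * log_variance / log_effect ** 2
    return math.ceil(n)


if __name__ == "__main__":
    # Hypothetical inputs: a CV of 50% and a 1.5-fold target change.
    print(samples_per_group(cv=0.5, fold_change=1.5))
```

Because the effect enters only through the squared log fold change, a 2-fold increase and a 2-fold decrease require the same number of samples, and the required number grows as the CV rises or the target fold change approaches 1. For very small group sizes the normal approximation is optimistic, and a t-based calculation would be more appropriate.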

Original language: English (US)
Pages (from-to): 306-315
Number of pages: 10
Journal: Journals of Gerontology - Series A Biological Sciences and Medical Sciences
Volume: 59
Issue number: 4
State: Published - Apr 2004

Fingerprint

Oligonucleotide Array Sequence Analysis
Sample Size
RNA
Animal Models
Gene Expression
Genes

ASJC Scopus subject areas

  • Aging

Cite this

Reproducibility, Sources of Variability, Pooling, and Sample Size: Important Considerations for the Design of High-Density Oligonucleotide Array Experiments. / Han, Eun Soo; Wu, Yimin; McCarter, Roger; Nelson, James F; Richardson, Arlan; Hilsenbeck, Susan G.

In: Journals of Gerontology - Series A Biological Sciences and Medical Sciences, Vol. 59, No. 4, 04.2004, p. 306-315.

@article{8aacdf17efb4469e881a4266d2d72d0e,
title = "Reproducibility, Sources of Variability, Pooling, and Sample Size: Important Considerations for the Design of High-Density Oligonucleotide Array Experiments",
abstract = "We have undertaken a series of experiments to examine several issues that directly affect design of gene expression studies using Affymetrix GeneChip arrays: probe-level analysis, need for technical replication, relative contribution of various sources of variability, and utility of pooling RNA from different samples. Probe-level data were analyzed by Affymetrix MAS 5.0, and three model-based methods, PM-MM and PM-only models by dChip, and the RMA model by Bioconductor, with the latter two providing the best performance. We found that replicate chips of the same RNA have limited value in reducing total variability, and for relatively highly expressed genes in this biologically homogeneous animal model of aging, about 11{\%} of total variation is due to day effects and the remainder is approximately equally split between sample and residual sources. We also found that pooling samples is neither advantageous nor detrimental. Finally we suggest a strategy for sample size calculations using formulas appropriate when coefficients of variation are known, target effects are expressed as fold changes, and data can be assumed to be approximately lognormally distributed.",
author = "Han, Eun Soo and Wu, Yimin and McCarter, Roger and Nelson, James F and Richardson, Arlan and Hilsenbeck, Susan G.",
year = "2004",
month = "4",
language = "English (US)",
volume = "59",
pages = "306--315",
journal = "Journals of Gerontology - Series A Biological Sciences and Medical Sciences",
issn = "1079-5006",
publisher = "Oxford University Press",
number = "4"
}

TY - JOUR

T1 - Reproducibility, Sources of Variability, Pooling, and Sample Size

T2 - Important Considerations for the Design of High-Density Oligonucleotide Array Experiments

AU - Han, Eun Soo

AU - Wu, Yimin

AU - McCarter, Roger

AU - Nelson, James F

AU - Richardson, Arlan

AU - Hilsenbeck, Susan G.

PY - 2004/4

Y1 - 2004/4

N2 - We have undertaken a series of experiments to examine several issues that directly affect design of gene expression studies using Affymetrix GeneChip arrays: probe-level analysis, need for technical replication, relative contribution of various sources of variability, and utility of pooling RNA from different samples. Probe-level data were analyzed by Affymetrix MAS 5.0, and three model-based methods, PM-MM and PM-only models by dChip, and the RMA model by Bioconductor, with the latter two providing the best performance. We found that replicate chips of the same RNA have limited value in reducing total variability, and for relatively highly expressed genes in this biologically homogeneous animal model of aging, about 11% of total variation is due to day effects and the remainder is approximately equally split between sample and residual sources. We also found that pooling samples is neither advantageous nor detrimental. Finally we suggest a strategy for sample size calculations using formulas appropriate when coefficients of variation are known, target effects are expressed as fold changes, and data can be assumed to be approximately lognormally distributed.

AB - We have undertaken a series of experiments to examine several issues that directly affect design of gene expression studies using Affymetrix GeneChip arrays: probe-level analysis, need for technical replication, relative contribution of various sources of variability, and utility of pooling RNA from different samples. Probe-level data were analyzed by Affymetrix MAS 5.0, and three model-based methods, PM-MM and PM-only models by dChip, and the RMA model by Bioconductor, with the latter two providing the best performance. We found that replicate chips of the same RNA have limited value in reducing total variability, and for relatively highly expressed genes in this biologically homogeneous animal model of aging, about 11% of total variation is due to day effects and the remainder is approximately equally split between sample and residual sources. We also found that pooling samples is neither advantageous nor detrimental. Finally we suggest a strategy for sample size calculations using formulas appropriate when coefficients of variation are known, target effects are expressed as fold changes, and data can be assumed to be approximately lognormally distributed.

UR - http://www.scopus.com/inward/record.url?scp=1842680967&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=1842680967&partnerID=8YFLogxK

M3 - Article

C2 - 15071073

AN - SCOPUS:1842680967

VL - 59

SP - 306

EP - 315

JO - Journals of Gerontology - Series A Biological Sciences and Medical Sciences

JF - Journals of Gerontology - Series A Biological Sciences and Medical Sciences

SN - 1079-5006

IS - 4

ER -