Large upward bias in estimation of locus-specific effects from genomewide scans

H. H. H. Göring, J. D. Terwilliger, J. Blangero

Research output: Contribution to journal › Article

372 Citations (Scopus)

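Abstract

The primary goal of a genomewide scan is to estimate the genomic locations of genes influencing a trait of interest. It is sometimes said that a secondary goal is to estimate the phenotypic effects of each identified locus. Here, it is shown that these two objectives cannot be met reliably by use of a single data set of a currently realistic size. Simulation and analytical results, based on variance-components linkage analysis as an example, demonstrate that estimates of locus-specific effect size at genomewide LOD score peaks tend to be grossly inflated and can even be virtually independent of the true effect size, even for studies on large samples when the true effect size is small. However, the bias diminishes asymptotically. The explanation for the bias is that the LOD score is a function of the locus-specific effect-size estimate, such that there is a high correlation between the observed statistical significance and the effect-size estimate. When the LOD score is maximized over the many pointwise tests being conducted throughout the genome, the locus-specific effect-size estimate is therefore effectively maximized as well. We argue that attempts at bias correction give unsatisfactory results, and that pointwise estimation in an independent data set may be the only way of obtaining reliable estimates of locus-specific effect - and then only if one does not condition on statistical significance being obtained. We further show that the same factors causing this bias are responsible for frequent failures to replicate initial claims of linkage or association for complex traits, even when the initial localization is, in fact, correct. The findings of this study have wide-ranging implications, as they apply to all statistical methods of gene localization. It is hoped that, by keeping this bias in mind, we will more realistically interpret and extrapolate from the results of genomewide scans.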

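The mechanism the abstract describes (maximizing the LOD score over many pointwise tests also maximizes the effect-size estimate) can be illustrated with a toy simulation. The sketch below is not the authors' variance-components analysis. It assumes each locus yields an independent, normally distributed effect estimate with a common standard error, truncated at zero, and that the LOD score is a monotone function of that estimate. The names `simulate_scan` and `run`, and all parameter values, are hypothetical choices made for illustration.

```python
import math
import random
import statistics


def simulate_scan(true_effect, se, n_loci, rng):
    """One toy genome scan. Locus 0 carries the true effect; all others are null.

    Assumption: each pointwise estimate is Normal(effect, se), truncated at 0.
    Returns (estimate at the true locus, estimate at the peak, peak LOD).
    """
    estimates = [rng.gauss(true_effect, se)]
    estimates += [rng.gauss(0.0, se) for _ in range(n_loci - 1)]
    # Variance-type effect sizes cannot be negative, so truncate at zero.
    estimates = [max(e, 0.0) for e in estimates]
    # The test statistic grows with the estimate, so the locus with the
    # largest estimate is also the locus with the largest LOD score.
    peak = max(estimates)
    lod = (peak / se) ** 2 / (2 * math.log(10))  # chi-square statistic -> LOD
    return estimates[0], peak, lod


def run(true_effect=0.05, se=0.05, n_loci=400, n_reps=1000, seed=1):
    """Average the estimates over many replicate scans.

    Returns (mean estimate at the true locus, mean estimate at the peak).
    """
    rng = random.Random(seed)
    scans = [simulate_scan(true_effect, se, n_loci, rng) for _ in range(n_reps)]
    mean_at_true = statistics.mean(s[0] for s in scans)
    mean_at_peak = statistics.mean(s[1] for s in scans)
    return mean_at_true, mean_at_peak


if __name__ == "__main__":
    for effect in (0.0, 0.05):
        at_true, at_peak = run(true_effect=effect)
        print(f"true effect {effect:.2f}: "
              f"mean at true locus {at_true:.3f}, mean at peak {at_peak:.3f}")
```

Under these assumptions the estimate taken at the fixed true locus stays near the true value, while the estimate read off at the genomewide peak is several times larger and barely changes when the true effect is set to zero. That is the qualitative pattern the abstract reports, in a much simpler model.
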
Original language: English
Pages (from-to): 1357-1369
Number of pages: 13
Journal: American Journal of Human Genetics
Volume: 69
Issue number: 6
DOI: 10.1086/324471
State: Published - 2001
Externally published: Yes

Fingerprint

Genes
Genome
Datasets

ASJC Scopus subject areas

  • Genetics

Cite this

Large upward bias in estimation of locus-specific effects from genomewide scans. / Göring, H. H. H.; Terwilliger, J. D.; Blangero, J.

In: American Journal of Human Genetics, Vol. 69, No. 6, 2001, p. 1357-1369.

@article{34e0e72b09c04bc99d958307f911dfa2,
title = "Large upward bias in estimation of locus-specific effects from genomewide scans",
abstract = "The primary goal of a genomewide scan is to estimate the genomic locations of genes influencing a trait of interest. It is sometimes said that a secondary goal is to estimate the phenotypic effects of each identified locus. Here, it is shown that these two objectives cannot be met reliably by use of a single data set of a currently realistic size. Simulation and analytical results, based on variance-components linkage analysis as an example, demonstrate that estimates of locus-specific effect size at genomewide LOD score peaks tend to be grossly inflated and can even be virtually independent of the true effect size, even for studies on large samples when the true effect size is small. However, the bias diminishes asymptotically. The explanation for the bias is that the LOD score is a function of the locus-specific effect-size estimate, such that there is a high correlation between the observed statistical significance and the effect-size estimate. When the LOD score is maximized over the many pointwise tests being conducted throughout the genome, the locus-specific effect-size estimate is therefore effectively maximized as well. We argue that attempts at bias correction give unsatisfactory results, and that pointwise estimation in an independent data set may be the only way of obtaining reliable estimates of locus-specific effect - and then only if one does not condition on statistical significance being obtained. We further show that the same factors causing this bias are responsible for frequent failures to replicate initial claims of linkage or association for complex traits, even when the initial localization is, in fact, correct. The findings of this study have wide-ranging implications, as they apply to all statistical methods of gene localization. It is hoped that, by keeping this bias in mind, we will more realistically interpret and extrapolate from the results of genomewide scans.",
author = "G{\"o}ring, {H. H H} and Terwilliger, {J. D.} and J. Blangero",
year = "2001",
doi = "10.1086/324471",
language = "English",
volume = "69",
pages = "1357--1369",
journal = "American Journal of Human Genetics",
issn = "0002-9297",
publisher = "Cell Press",
number = "6",

}

TY - JOUR
T1 - Large upward bias in estimation of locus-specific effects from genomewide scans
AU - Göring, H. H H
AU - Terwilliger, J. D.
AU - Blangero, J.
PY - 2001
Y1 - 2001
AB - The primary goal of a genomewide scan is to estimate the genomic locations of genes influencing a trait of interest. It is sometimes said that a secondary goal is to estimate the phenotypic effects of each identified locus. Here, it is shown that these two objectives cannot be met reliably by use of a single data set of a currently realistic size. Simulation and analytical results, based on variance-components linkage analysis as an example, demonstrate that estimates of locus-specific effect size at genomewide LOD score peaks tend to be grossly inflated and can even be virtually independent of the true effect size, even for studies on large samples when the true effect size is small. However, the bias diminishes asymptotically. The explanation for the bias is that the LOD score is a function of the locus-specific effect-size estimate, such that there is a high correlation between the observed statistical significance and the effect-size estimate. When the LOD score is maximized over the many pointwise tests being conducted throughout the genome, the locus-specific effect-size estimate is therefore effectively maximized as well. We argue that attempts at bias correction give unsatisfactory results, and that pointwise estimation in an independent data set may be the only way of obtaining reliable estimates of locus-specific effect - and then only if one does not condition on statistical significance being obtained. We further show that the same factors causing this bias are responsible for frequent failures to replicate initial claims of linkage or association for complex traits, even when the initial localization is, in fact, correct. The findings of this study have wide-ranging implications, as they apply to all statistical methods of gene localization. It is hoped that, by keeping this bias in mind, we will more realistically interpret and extrapolate from the results of genomewide scans.
UR - http://www.scopus.com/inward/record.url?scp=0035209177&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0035209177&partnerID=8YFLogxK
U2 - 10.1086/324471
DO - 10.1086/324471
M3 - Article
VL - 69
SP - 1357
EP - 1369
JO - American Journal of Human Genetics
JF - American Journal of Human Genetics
SN - 0002-9297
IS - 6
ER -