Independent test assessment using the extreme value distribution theory

Marcio Almeida, Lucy Blondell, Juan M. Peralta, Jack W. Kent, Goo Jun, Tanya M. Teslovich, Christian Fuchsberger, Andrew R. Wood, Alisa K. Manning, Timothy M. Frayling, Pablo E. Cingolani, Robert Sladek, Thomas D. Dyer, Goncalo Abecasis, Ravindranath Duggirala, John Blangero

Research output: Contribution to journal › Article

Abstract

The new generation of whole genome sequencing platforms offers great possibilities and challenges for dissecting the genetic basis of complex traits. With a very large number of sequence variants, a naïve multiple-testing threshold correction hinders the identification of reliable associations by excessively reducing statistical power. In this report, we examine 2 alternative approaches to improve the statistical power of a whole genome association study to detect reliable genetic associations. The approaches were tested using the Genetic Analysis Workshop 19 (GAW19) whole genome sequencing data. The first method estimates the effective number of independent tests actually performed in a whole genome association project by using an extreme value distribution and a set of phenotype simulations. Given the familial nature of the GAW19 data and the finite number of pedigree founders in the sample, genotypes are more strongly correlated than in a sample of unrelated individuals. Using our procedure, we estimate that the effective number of independent tests is only 15% of the total number of tests performed. However, even with this corrected significance threshold, no genome-wide significant association could be detected for the systolic and diastolic blood pressure traits. The second approach implements biological relevance-driven hypothesis testing that exploits prior computational predictions of the effects of nonsynonymous genetic variants detected in a whole genome sequencing association study. This guided testing approach identified 2 promising single-nucleotide polymorphisms (SNPs), 1 for each trait, in biologically relevant genes that could help shed light on the genesis of human hypertension.
The first gene, PFH14, associated with systolic blood pressure, interacts directly with genes involved in calcium-channel formation, and the second gene, MAP4, encodes a microtubule-associated protein and had previously been detected in genome-wide association studies conducted in an Asian population. Our results highlight the need to develop alternative approaches that improve the efficiency of detecting plausible candidate associations in whole genome sequencing studies.
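The abstract describes the first approach only at a high level. As a hedged illustration, the Python sketch below shows one common way extreme value reasoning can be used to estimate an effective number of independent tests, M_eff: simulate phenotypes under the null, record the smallest p-value from each genome-wide scan, and fit the distribution of that minimum. It assumes the minimum of M_eff independent uniform p-values, which follows Beta(1, M_eff) exactly; as M_eff grows, M_eff times the minimum tends to an exponential, an extreme value (Weibull-type) limit. The function names, the block-correlated toy null model, and all parameter values are hypothetical and not taken from the paper, which may use a different extreme value fit. The estimate then replaces M in the Bonferroni threshold, giving α/M_eff.

```python
import math
import random


def estimate_effective_tests(min_pvalues):
    """Maximum-likelihood estimate of M_eff from null minimum p-values.

    Assumes each value is the smallest of M_eff independent Uniform(0, 1)
    p-values, so it follows Beta(1, M_eff) with CDF 1 - (1 - x) ** M_eff.
    """
    if not min_pvalues:
        raise ValueError("need at least one simulated minimum p-value")
    # Log-likelihood: n * log(M) + (M - 1) * sum(log(1 - x_i)).
    # Setting its derivative to zero gives M = -n / sum(log(1 - x_i)).
    total = sum(math.log1p(-p) for p in min_pvalues)
    return -len(min_pvalues) / total


def simulate_null_min_pvalues(n_tests, block_size, n_sims, seed=1):
    """Hypothetical toy null: tests come in blocks of block_size perfectly
    correlated tests, so only n_tests // block_size are independent.
    Returns the genome-wide minimum p-value of each simulated scan."""
    rng = random.Random(seed)
    n_blocks = n_tests // block_size
    # Tests within a block share one p-value, so the scan-wide minimum
    # is the minimum over the independent block-level p-values.
    return [min(rng.random() for _ in range(n_blocks)) for _ in range(n_sims)]


# Usage: 5,000 nominal tests in blocks of 5, i.e. 1,000 independent ones.
n_tests = 5000
alpha = 0.05
mins = simulate_null_min_pvalues(n_tests, block_size=5, n_sims=1000)
m_eff = estimate_effective_tests(mins)
print(f"estimated M_eff: {m_eff:.0f} ({m_eff / n_tests:.0%} of {n_tests} tests)")
print(f"Bonferroni threshold using M:     {alpha / n_tests:.2e}")
print(f"Bonferroni threshold using M_eff: {alpha / m_eff:.2e}")
```

In a real study the minimum p-values would come from full association scans of simulated phenotypes on the observed pedigree genotypes, not from the toy block model. Because the threshold scales as 1/M_eff, an M_eff equal to 15% of M, the fraction reported in the abstract, relaxes the per-test threshold by a factor of 1/0.15 ≈ 6.7.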

Original language: English (US)
Article number: 63
Journal: BMC Proceedings
Volume: 10
DOI: https://doi.org/10.1186/s12919-016-0038-5
State: Published - 2016
Externally published: Yes

Fingerprint

Genes
Genome
Genome-Wide Association Study
Blood Pressure
Education
Microtubule-Associated Proteins
Calcium Channels
Pedigree
Blood pressure
Single Nucleotide Polymorphism
Genotype
Hypertension
Phenotype
Population
Polymorphism
Nucleotides
Testing

ASJC Scopus subject areas

  • Medicine(all)
  • Biochemistry, Genetics and Molecular Biology(all)

Cite this

Almeida, M., Blondell, L., Peralta, J. M., Kent, J. W., Jun, G., Teslovich, T. M., ... Blangero, J. (2016). Independent test assessment using the extreme value distribution theory. BMC Proceedings, 10, [63]. https://doi.org/10.1186/s12919-016-0038-5

@article{0befbcea778c49e389d7ee010e6bf938,
title = "Independent test assessment using the extreme value distribution theory",
author = "Marcio Almeida and Lucy Blondell and Peralta, {Juan M.} and Kent, {Jack W.} and Goo Jun and Teslovich, {Tanya M.} and Christian Fuchsberger and Wood, {Andrew R.} and Manning, {Alisa K.} and Frayling, {Timothy M.} and Cingolani, {Pablo E.} and Robert Sladek and Dyer, {Thomas D.} and Goncalo Abecasis and Ravindranath Duggirala and John Blangero",
year = "2016",
doi = "10.1186/s12919-016-0038-5",
language = "English (US)",
volume = "10",
journal = "BMC Proceedings",
issn = "1753-6561",
publisher = "BioMed Central",

}

TY - JOUR

T1 - Independent test assessment using the extreme value distribution theory

AU - Almeida, Marcio

AU - Blondell, Lucy

AU - Peralta, Juan M.

AU - Kent, Jack W.

AU - Jun, Goo

AU - Teslovich, Tanya M.

AU - Fuchsberger, Christian

AU - Wood, Andrew R.

AU - Manning, Alisa K.

AU - Frayling, Timothy M.

AU - Cingolani, Pablo E.

AU - Sladek, Robert

AU - Dyer, Thomas D.

AU - Abecasis, Goncalo

AU - Duggirala, Ravindranath

AU - Blangero, John

PY - 2016

Y1 - 2016

UR - http://www.scopus.com/inward/record.url?scp=85016042147&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85016042147&partnerID=8YFLogxK

U2 - 10.1186/s12919-016-0038-5

DO - 10.1186/s12919-016-0038-5

M3 - Article

VL - 10

JO - BMC Proceedings

JF - BMC Proceedings

SN - 1753-6561

M1 - 63

ER -