TY - JOUR
T1 - Fast Genome-Wide QTL Association Mapping on Pedigree and Population Data
AU - Zhou, Hua
AU - Blangero, John
AU - Dyer, Thomas D.
AU  - Chan, Kei Hang K.
AU - Lange, Kenneth
AU - Sobel, Eric M.
N1 - Funding Information:
The authors gratefully acknowledge the NIH grants GM053275 (E.M.S. and K.L.), HG006139 (H.Z., E.M.S., and K.L.), MH059490 (J.B., T.D.D., E.M.S., and K.L.), and GM105785 (H.Z.) and NSF grant DMS1310319 (H.Z.) supporting this research. K.K.C. also gratefully acknowledges the fellowship support from the Burroughs Wellcome Fund Inter-school Training Program in Metabolic Diseases.
Publisher Copyright:
© 2016 WILEY PERIODICALS, INC.
PY - 2017/4/1
Y1 - 2017/4/1
N2  - Since most analysis software for genome-wide association studies (GWAS) currently exploits only unrelated individuals, there is a need for efficient applications that can handle general pedigree data or mixtures of both population and pedigree data. Even datasets thought to consist of only unrelated individuals may include cryptic relationships that can lead to false positives if not discovered and controlled for. In addition, family designs possess compelling advantages. They are better equipped to detect rare variants, control for population stratification, and facilitate the study of parent-of-origin effects. Pedigrees selected for extreme trait values often segregate a single gene with strong effect. Finally, many pedigrees are available as an important legacy from the era of linkage analysis. Unfortunately, pedigree likelihoods are notoriously hard to compute. In this paper, we reexamine the computational bottlenecks and implement ultra-fast pedigree-based GWAS analysis. Kinship coefficients can either be based on explicitly provided pedigrees or automatically estimated from dense markers. Our strategy (a) works for random sample data, pedigree data, or a mix of both; (b) entails no loss of power; (c) allows for any number of covariate adjustments, including correction for population stratification; (d) allows for testing SNPs under additive, dominant, and recessive models; and (e) accommodates both univariate and multivariate quantitative traits. On a typical personal computer (six CPU cores at 2.67 GHz), analyzing a univariate HDL (high-density lipoprotein) trait from the San Antonio Family Heart Study (935,392 SNPs on 1,388 individuals in 124 pedigrees) takes less than 2 min and 1.5 GB of memory. Complete multivariate QTL analysis of the three time-points of the longitudinal HDL multivariate trait takes less than 5 min and 1.5 GB of memory.
The algorithm is implemented as the Ped-GWAS Analysis (Option 29) in the Mendel statistical genetics package, which is freely available for Macintosh, Linux, and Windows platforms from http://genetics.ucla.edu/software/mendel.
KW - fixed-effects models
KW - genome-wide association study
KW - kinship
KW - multivariate traits
KW - pedigree
KW - score test
UR - http://www.scopus.com/inward/record.url?scp=85014561188&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85014561188&partnerID=8YFLogxK
U2 - 10.1002/gepi.21988
DO - 10.1002/gepi.21988
M3 - Article
C2 - 27943406
AN - SCOPUS:85014561188
SN - 0741-0395
VL - 41
SP - 174
EP - 186
JO - Genetic Epidemiology
JF - Genetic Epidemiology
IS - 3
ER -