TY - JOUR
T1 - Benchmarking relatedness inference methods with genome-wide data from thousands of relatives
AU - Ramstetter, Monica D.
AU - Dyer, Thomas D.
AU - Lehman, Donna M.
AU - Curran, Joanne E.
AU - Duggirala, Ravindranath
AU - Blangero, John
AU - Mezey, Jason G.
AU - Williams, Amy L.
N1 - Funding Information:
We thank the San Antonio Mexican American Family Study participants that made this analysis possible. We also thank Shai Carmi for helpful comments. This work was supported by a National Science Foundation Graduate Research Fellowship grant number DGE-1144153 to M.D.R.; Qatar National Research Fund grant NPRP 7-1425-3-370 to J.G.M.; and an Alfred P. Sloan Research Fellowship and a seed grant from Nancy and Peter Meinig to A.L.W. The SAMAFS are supported by NIH grants R01 HL0113323, P01 HL045222, R01 DK047482, and R01 DK053889.
Publisher Copyright:
© 2017 Ramstetter et al.
PY - 2017
Y1 - 2017
N2 - Inferring relatedness from genomic data is an essential component of genetic association studies, population genetics, forensics, and genealogy. While numerous methods exist for inferring relatedness, thorough evaluation of these approaches in real data has been lacking. Here, we report an assessment of 12 state-of-the-art pairwise relatedness inference methods using a data set with 2485 individuals contained in several large pedigrees that span up to six generations. We find that all methods have high accuracy (92–99%) when detecting first- and second-degree relationships, but their accuracy dwindles to <43% for seventh-degree relationships. However, most identical by descent (IBD) segment-based methods inferred seventh-degree relatives correct to within one relatedness degree for >76% of relative pairs. Overall, the most accurate methods are Estimation of Recent Shared Ancestry (ERSA) and approaches that compute total IBD sharing using the output from GERMLINE and Refined IBD to infer relatedness. Combining information from the most accurate methods provides little accuracy improvement, indicating that novel approaches, such as new methods that leverage relatedness signals from multiple samples, are needed to achieve a sizeable jump in performance.
KW - Admixture
KW - Identical by descent
KW - Relatedness estimation
UR - http://www.scopus.com/inward/record.url?scp=85028958051&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85028958051&partnerID=8YFLogxK
U2 - 10.1534/genetics.117.1122
DO - 10.1534/genetics.117.1122
M3 - Article
C2 - 28739658
AN - SCOPUS:85028958051
SN - 0016-6731
VL - 207
SP - 75
EP - 82
JO - Genetics
JF - Genetics
IS - 1
ER -