TY - JOUR
T1 - Genotype phasing in pedigrees using whole-genome sequence data
AU - Blackburn, August N.
AU - Blondell, Lucy
AU - Kos, Mark Z.
AU - Blackburn, Nicholas B.
AU - Peralta, Juan M.
AU - Stevens, Peter T.
AU - Lehman, Donna M.
AU - Blangero, John C.
AU - Göring, Harald H.H.
N1 - Funding Information:
Acknowledgements: This work was funded in part by NIH grant R01 DK099051 (to HHHG) and conducted in part in facilities constructed with the support of NIH grant C06 RR020547. The SAMAFS whole-genome sequence data were obtained as part of the T2D-GENES Consortium, supported by NIH grants U01 DK085524, U01 DK085584, U01 DK085501, U01 DK085526, and U01 DK085545. Pedigree data were provided by the San Antonio Family Heart Study and San Antonio Family Diabetes/Gallbladder Study, supported by NIH grants R01 HL0113323, P01 HL045222, R01 DK047482, and R01 DK053889. We are grateful to the anonymous reviewers for their constructive reviews.
Publisher Copyright:
© 2020, The Author(s), under exclusive licence to European Society of Human Genetics.
PY - 2020/6/1
Y1 - 2020/6/1
N2 - Phasing is the process of inferring haplotypes from genotype data. Efficient algorithms and associated software for accurate phasing in pedigrees are needed, especially for populations lacking reference panels of sequenced individuals. We present a novel method for phasing genotypes from whole-genome sequence data in pedigrees, called PULSAR (Phasing Using Lineage Specific Alleles/Rare variants). The method is based on the property that alleles specific to a single founding chromosome within a pedigree are highly informative for identifying haplotypes that are shared identical by descent. Simulation studies are used to assess the performance of PULSAR with various pedigree sizes and structures, and the effect of genotyping errors and the presence of nonsequenced individuals is investigated. In pedigrees with complete sequencing and realistic genotyping error rates, PULSAR correctly phases >99.9% of heterozygous genotypes, excluding sites at which all individuals are heterozygous, and does so with a switch error rate frequently below 10⁻⁴. PULSAR is highly accurate, capable of genotype error correction and imputation, and computationally competitive with alternative phasing software applicable to pedigrees. Our method has the significant advantage of not requiring reference panels that are essential for other population-based phasing algorithms. A software implementation of PULSAR is freely available.
AB - Phasing is the process of inferring haplotypes from genotype data. Efficient algorithms and associated software for accurate phasing in pedigrees are needed, especially for populations lacking reference panels of sequenced individuals. We present a novel method for phasing genotypes from whole-genome sequence data in pedigrees, called PULSAR (Phasing Using Lineage Specific Alleles/Rare variants). The method is based on the property that alleles specific to a single founding chromosome within a pedigree are highly informative for identifying haplotypes that are shared identical by descent. Simulation studies are used to assess the performance of PULSAR with various pedigree sizes and structures, and the effect of genotyping errors and the presence of nonsequenced individuals is investigated. In pedigrees with complete sequencing and realistic genotyping error rates, PULSAR correctly phases >99.9% of heterozygous genotypes, excluding sites at which all individuals are heterozygous, and does so with a switch error rate frequently below 10⁻⁴. PULSAR is highly accurate, capable of genotype error correction and imputation, and computationally competitive with alternative phasing software applicable to pedigrees. Our method has the significant advantage of not requiring reference panels that are essential for other population-based phasing algorithms. A software implementation of PULSAR is freely available.
UR - http://www.scopus.com/inward/record.url?scp=85078675004&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85078675004&partnerID=8YFLogxK
U2 - 10.1038/s41431-020-0574-3
DO - 10.1038/s41431-020-0574-3
M3 - Article
C2 - 31996801
AN - SCOPUS:85078675004
VL - 28
SP - 790
EP - 803
JO - European Journal of Human Genetics
JF - European Journal of Human Genetics
SN - 1018-4813
IS - 6
ER -