Imputation of missing genotypes from sparse to high density using long-range phasing

Daetwyler, Hans D., Wiggans, George R., Hayes, Ben J., Woolliams, John A. and Goddard, Mike E. (2011) Imputation of missing genotypes from sparse to high density using long-range phasing. Genetics, 189 (1): 317-327. doi:10.1534/genetics.111.128082

Author Daetwyler, Hans D.
Wiggans, George R.
Hayes, Ben J.
Woolliams, John A.
Goddard, Mike E.
Title Imputation of missing genotypes from sparse to high density using long-range phasing
Journal name Genetics
ISSN 0016-6731
Publication date 2011-09-01
Sub-type Article (original research)
DOI 10.1534/genetics.111.128082
Open Access Status Not yet assessed
Volume 189
Issue 1
Start page 317
End page 327
Total pages 11
Place of publication Bethesda, MD, United States
Publisher Genetics Society of America
Language eng
Abstract Related individuals share potentially long chromosome segments that trace to a common ancestor. We describe a phasing algorithm (ChromoPhase) that utilizes this characteristic of finite populations to phase large sections of a chromosome. In addition to phasing, our method imputes missing genotypes in individuals genotyped at lower marker density when more densely genotyped relatives are available. ChromoPhase uses a pedigree to collect an individual's (the proband) surrogate parents and offspring and uses genotypic similarity to identify its genomic surrogates. The algorithm then cycles through the relatives and genomic surrogates one at a time to find shared chromosome segments. Once a segment has been identified, any missing information in the proband is filled in with information from the relative. We tested ChromoPhase in a simulated population consisting of 400 individuals at a marker density of 1500/M, which is approximately equivalent to a 50K bovine single nucleotide polymorphism chip. In simulated data, 99.9% of loci were correctly phased and, when imputing from 100 to 1500 markers, more than 87% of missing genotypes were correctly imputed. Performance increased when the number of generations available in the pedigree increased, but was reduced when the sparse genotype contained fewer loci. However, in simulated data, ChromoPhase correctly imputed at least 12% more genotypes than fastPHASE, depending on sparse marker density. We also tested the algorithm in a real Holstein cattle data set to impute 50K genotypes in animals with a sparse 3K genotype. In these data 92% of genotypes were correctly imputed in animals with a genotyped sire. We evaluated the accuracy of genomic predictions with the dense, sparse, and imputed simulated data sets and show that the reduction in genomic evaluation accuracy is modest even with imperfectly imputed genotype data. Our results demonstrate that imputation of missing genotypes, and potentially full genome sequence, using long-range phasing is feasible.
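The abstract describes ChromoPhase only at a high level: cycle through relatives, find a chromosome segment the proband shares with one of them, then fill the proband's missing information from that relative. The sketch below is a toy illustration of that "find a shared segment, then fill in" step, written under our own assumptions rather than the published method. We assume genotypes are coded as allele counts 0/1/2 with a hypothetical MISSING code for loci absent from the sparse panel, that relatives are already phased into 0/1 haplotypes, and that a "shared segment" is a run in which the relative's haplotype never contradicts a proband homozygote and which spans at least `min_observed` genotyped loci. The names `allele_fits`, `shared_segments` and `impute_haplotype` are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical coding: genotypes are allele-1 counts (0, 1, 2); loci that the
# sparse panel did not assay are MISSING. Haplotype alleles are 0 or 1.
MISSING = -1


def allele_fits(genotype: int, allele: int) -> bool:
    """True if a haplotype allele could be one of the two copies in `genotype`."""
    if genotype == MISSING or genotype == 1:
        return True  # missing or heterozygous: either allele is possible
    # Homozygotes: genotype 0 carries only allele 0, genotype 2 only allele 1.
    return allele == genotype // 2


def shared_segments(proband: Sequence[int], rel_hap: Sequence[int],
                    min_observed: int) -> List[Tuple[int, int]]:
    """Maximal runs [start, end) in which `rel_hap` never conflicts with the
    proband's genotypes, kept only if they cover at least `min_observed`
    genotyped proband loci (a crude guard against chance matches)."""
    segments: List[Tuple[int, int]] = []
    start, observed = 0, 0
    for i in range(len(proband) + 1):
        if i < len(proband) and allele_fits(proband[i], rel_hap[i]):
            if proband[i] != MISSING:
                observed += 1
            continue
        # A conflicting locus, or the end of the chromosome, closes the run.
        if observed >= min_observed:
            segments.append((start, i))
        start, observed = i + 1, 0
    return segments


def impute_haplotype(proband: Sequence[int],
                     relative_haps: Sequence[Sequence[int]],
                     min_observed: int = 3) -> List[Optional[int]]:
    """Phase and impute one proband haplotype from phased relatives."""
    # Homozygous loci fix the allele; heterozygous and missing loci start unknown.
    hap: List[Optional[int]] = [g // 2 if g in (0, 2) else None for g in proband]
    for rel_hap in relative_haps:  # visit relatives one at a time
        for start, end in shared_segments(proband, rel_hap, min_observed):
            for i in range(start, end):
                if hap[i] is None:  # only fill what is still unknown
                    hap[i] = rel_hap[i]
    return hap


if __name__ == "__main__":
    proband = [0, MISSING, 1, 2, 0, MISSING, 1, 2]
    relative = [0, 1, 0, 1, 1, 0, 1, 1]
    print(shared_segments(proband, relative, 3))  # [(0, 4)]
    print(impute_haplotype(proband, [relative]))  # [0, 1, 0, 1, 0, None, None, 1]
```

In the usage example the relative's haplotype conflicts with the proband at locus 4 (proband homozygous 0, relative carries allele 1), so only loci 0 to 3 are treated as shared; within that run the heterozygous locus 2 is phased and the missing locus 1 is imputed from the relative. At genotyped loci the proband's other haplotype then follows as genotype minus this allele. A real long-range phasing tool also has to cope with genotyping errors and with relatives that disagree, which this toy ignores by letting the first matching relative win.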
Q-Index Code C1
Q-Index Status Provisional Code
Institutional Status Non-UQ

Document type: Journal Article
Collection: Queensland Alliance for Agriculture and Food Innovation
Citation counts: Cited 32 times in Thomson Reuters Web of Science
Cited 37 times in Scopus
Created: Fri, 05 Aug 2016, 09:58:07 EST by System User on behalf of Learning and Research Services (UQ Library)