Strategies and utility of imputed SNP genotypes for genomic analysis in dairy cattle

Khatkar, Mehar S., Moser, Gerhard, Hayes, Ben J. and Raadsma, Herman W. (2012) Strategies and utility of imputed SNP genotypes for genomic analysis in dairy cattle. BMC Genomics, 13 1-12. doi:10.1186/1471-2164-13-538

Author Khatkar, Mehar S.
Moser, Gerhard
Hayes, Ben J.
Raadsma, Herman W.
Title Strategies and utility of imputed SNP genotypes for genomic analysis in dairy cattle
Journal name BMC Genomics
ISSN 1471-2164
Publication date 2012-10-08
Year available 2012
Sub-type Article (original research)
DOI 10.1186/1471-2164-13-538
Open Access Status DOI
Volume 13
Start page 1
End page 12
Total pages 12
Place of publication London, United Kingdom
Publisher BioMed Central
Language eng
Formatted abstract
Background: We investigated strategies and factors affecting the accuracy of imputing genotypes from lower-density SNP panels (Illumina 3K, 7K, Affymetrix 15K and 25K, and evenly spaced subsets) up to one medium-density (Illumina 50K) and one high-density (Illumina 800K) SNP panel. We also evaluated how imputed genotypes affect the accuracy of genomic selection, using Australian Holstein-Friesian cattle data from 2727 and 845 animals genotyped with the 50K and 800K SNP chips, respectively. To evaluate the accuracy of imputation, animals were divided into reference and test sets (genotyped with higher- and lower-density SNP panels, respectively). To evaluate the accuracy of genomic selection, direct genetic values (DGV) were compared by dividing the data into training and validation sets under a range of imputation scenarios.
Results: Of the three imputation methods compared, IMPUTE2 outperformed Beagle and fastPhase in almost all scenarios. Higher SNP densities in the test animals, larger reference sets and higher relatedness between test and reference animals increased the accuracy of imputation. 50K-specific genotypes were imputed with moderate allelic error rates from 15K (2.85%) and 25K (2.75%) genotypes. Using IMPUTE2, SNP genotypes up to 800K were imputed with a low allelic error rate (0.79% genome-wide) from 50K genotypes, and with moderate error rates from 3K (4.78%) and 7K (2.00%) genotypes. The error rate of imputing up to 800K from 3K or 7K was further reduced when an additional middle tier of 50K genotypes was incorporated in a 3-tiered framework. Accuracies of DGV for five production traits using imputed 50K genotypes were close to those obtained with the actual 50K genotypes and higher than those obtained using 3K or 7K genotypes. The loss in accuracy of DGV was small when most of the training animals also had imputed (50K) genotypes. Additional gains in DGV accuracies were small when SNP densities increased from 50K to imputed 800K.
Conclusion: Population-based genotype imputation can be used to predict and combine genotypes from different low, medium and high-density SNP chips with a high level of accuracy. Imputing genotypes from low-density SNP panels to at least 50K SNP density increases the accuracy of genomic selection.
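
The allelic error rates quoted in the Results can be illustrated with a minimal sketch. It assumes genotypes are coded as the count (0, 1 or 2) of one allele and that the allelic error rate is the fraction of alleles imputed incorrectly; this record does not give the paper's exact definition, so both the coding and the formula are assumptions, and the function name and sample data are hypothetical.

```python
def allelic_error_rate(true_genotypes, imputed_genotypes):
    """Fraction of alleles imputed incorrectly.

    Assumption: each genotype is coded as 0, 1 or 2 copies of one allele,
    so |true - imputed| counts the wrong alleles at that genotype.
    """
    if len(true_genotypes) != len(imputed_genotypes):
        raise ValueError("genotype lists must have the same length")
    if not true_genotypes:
        raise ValueError("at least one genotype is required")
    wrong_alleles = sum(abs(t - i) for t, i in zip(true_genotypes, imputed_genotypes))
    # Each diploid genotype carries two alleles.
    return wrong_alleles / (2 * len(true_genotypes))


# Hypothetical masked genotypes for one test animal (illustrative data only).
true_calls = [0, 1, 2, 2, 1, 0, 1, 2]
imputed_calls = [0, 1, 2, 1, 1, 0, 2, 2]

rate = allelic_error_rate(true_calls, imputed_calls)
print(f"allelic error rate: {rate:.2%}")
```

In this toy case two of the sixteen alleles are wrong. In practice the rate would be computed over the markers masked in the test set and averaged across animals.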

Keyword Imputation
High-density SNP
Dairy cattle
Q-Index Code C1
Q-Index Status Provisional Code
Institutional Status Non-UQ

Document type: Journal Article
Collections: Non HERDC
UQ Diamantina Institute Publications
Citation counts: Cited 38 times in Thomson Reuters Web of Science
Cited 37 times in Scopus
Created: Sat, 16 Feb 2013, 06:01:08 EST by System User on behalf of UQ Diamantina Institute