Multi-breed genomic prediction using Bayes R with sequence data and dropping variants with a small effect

Van Den Berg, Irene, Bowman, Phil J, MacLeod, Iona M., Hayes, Ben J., Wang, Tingting, Bolormaa, Sunduimijid and Goddard, Mike E. (2017) Multi-breed genomic prediction using Bayes R with sequence data and dropping variants with a small effect. Genetics Selection Evolution, 49(1): 70. doi:10.1186/s12711-017-0347-9


Author Van Den Berg, Irene
Bowman, Phil J
MacLeod, Iona M.
Hayes, Ben J.
Wang, Tingting
Bolormaa, Sunduimijid
Goddard, Mike E.
Title Multi-breed genomic prediction using Bayes R with sequence data and dropping variants with a small effect
Journal name Genetics Selection Evolution
ISSN 1297-9686
0999-193X
Publication date 2017-09-21
Year available 2017
Sub-type Article (original research)
DOI 10.1186/s12711-017-0347-9
Open Access Status DOI
Volume 49
Issue 1
Start page 70
Total pages 15
Place of publication London, United Kingdom
Publisher BioMed Central
Language eng
Subject 1105 Ecology, Evolution, Behavior and Systematics
1103 Animal Science and Zoology
1311 Genetics
Abstract Background: The increasing availability of whole-genome sequence data is expected to increase the accuracy of genomic prediction. However, results from simulation studies and analysis of real data do not always show an increase in accuracy from sequence data compared to high-density (HD) single nucleotide polymorphism (SNP) chip genotypes. In addition, the sheer number of variants makes analysis of all variants and accurate estimation of all effects computationally challenging. Our objective was to find a strategy to approximate the analysis of whole-sequence data with a Bayesian variable selection model. Using a simulated dataset, we applied a Bayes R hybrid model to analyse whole-sequence data, test the effect of dropping a proportion of variants during the analysis, and test how the analysis can be split into separate analyses per chromosome to reduce the elapsed computing time. We also investigated the effect of imputation errors on prediction accuracy. Subsequently, we applied the approach to a dataset that contained imputed sequences and records for production and fertility traits for 38,492 Holstein, Jersey, Australian Red and crossbred bulls and cows.
Results: With the simulated dataset, we found that, for a breed that was not represented in the training population, prediction accuracy was substantially higher with sequence data than with HD SNP data. Either dropping part of the variants during the analysis or splitting the analysis into separate analyses per chromosome decreased accuracy compared to analysing whole-sequence data. However, first dropping variants from each chromosome and then reanalysing the retained variants together resulted in an accuracy similar to that obtained when analysing whole-sequence data. Adding imputation errors decreased prediction accuracy, especially for errors in the validation population. With real data, using sequence variants resulted in accuracies that were similar to those obtained with the HD SNPs.
Conclusions: We present an efficient approach to approximate analysis of whole-sequence data with a Bayesian variable selection model. The lack of increase in prediction accuracy when applied to real data could be due to imputation errors, which demonstrates the importance of developing more accurate methods of imputation or directly genotyping sequence variants that have a major effect in the prediction equation.
Q-Index Code C1
Q-Index Status Provisional Code
Institutional Status UQ

Document type: Journal Article
Sub-type: Article (original research)
Collections: HERDC Pre-Audit
Queensland Alliance for Agriculture and Food Innovation
 
Created: Tue, 03 Oct 2017, 00:05:20 EST by Web Cron on behalf of School of Political Science & Internat'l Studies