Inferring demography from runs of homozygosity in whole-genome sequence, with correction for sequence errors

MacLeod, Iona M., Larkin, Denis M., Lewin, Harris A., Hayes, Ben J. and Goddard, Mike E. (2013) Inferring demography from runs of homozygosity in whole-genome sequence, with correction for sequence errors. Molecular Biology and Evolution, 30(9): 2209-2223. doi:10.1093/molbev/mst125


Author MacLeod, Iona M.; Larkin, Denis M.; Lewin, Harris A.; Hayes, Ben J.; Goddard, Mike E.
Title Inferring demography from runs of homozygosity in whole-genome sequence, with correction for sequence errors
Journal name Molecular Biology and Evolution
ISSN 0737-4038; 1537-1719
Publication date 2013-09
Sub-type Article (original research)
DOI 10.1093/molbev/mst125
Open Access Status Not yet assessed
Volume 30
Issue 9
Start page 2209
End page 2223
Total pages 15
Place of publication Cary, NC, United States
Publisher Oxford University Press
Language eng
Formatted abstract
Whole-genome sequence is potentially the richest source of genetic data for inferring ancestral demography. However, full sequence also presents significant challenges to fully utilize such large data sets and to ensure that sequencing errors do not introduce bias into the inferred demography. Using whole-genome sequence data from two Holstein cattle, we demonstrate a new method to correct for bias caused by hidden errors and then infer stepwise changes in ancestral demography up to present. There was a strong upward bias in estimates of recent effective population size (Ne) if the correction method was not applied to the data, both for our method and the Li and Durbin (Inference of human population history from individual whole-genome sequences. Nature 475:493-496) pairwise sequentially Markovian coalescent method. To infer demography, we use an analytical predictor of multiloci linkage disequilibrium (LD) based on a simple coalescent model that allows for changes in Ne. The LD statistic summarizes the distribution of runs of homozygosity for any given demography. We infer a best fit demography as one that predicts a match with the observed distribution of runs of homozygosity in the corrected sequence data. We use multiloci LD because it potentially holds more information about ancestral demography than pairwise LD. The inferred demography indicates a strong reduction in the Ne around 170,000 years ago, possibly related to the divergence of African and European Bos taurus cattle. This is followed by a further reduction coinciding with the period of cattle domestication, with Ne of between 3,500 and 6,000. The most recent reduction of Ne to approximately 100 in the Holstein breed agrees well with estimates from pedigrees. Our approach can be applied to whole-genome sequence from any diploid species and can be scaled up to use sequence from multiple individuals.
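The abstract's central observable is the distribution of runs of homozygosity in a single diploid genome. As a minimal sketch of that idea only, the Python below measures homozygous stretches between heterozygous sites and bins their lengths. The function names, the half-open coordinate convention, and the fixed-width binning are assumptions made for illustration. The record does not give the paper's exact run definition, its error-correction step, or its LD-based predictor, and none of those are reproduced here.

```python
from collections import Counter


def roh_lengths(het_positions, start, end):
    """Return lengths of homozygous stretches in the half-open region [start, end).

    het_positions: positions of heterozygous sites in one diploid individual
    (hypothetical input; how sites are called is outside this sketch).
    """
    lengths = []
    prev = start - 1  # sentinel: the position just before the region
    for pos in sorted(set(het_positions)):
        if not start <= pos < end:
            raise ValueError(f"position {pos} is outside [{start}, {end})")
        run = pos - prev - 1  # homozygous sites strictly between prev and pos
        if run > 0:
            lengths.append(run)
        prev = pos
    tail = end - prev - 1  # stretch after the last heterozygous site
    if tail > 0:
        lengths.append(tail)
    return lengths


def roh_histogram(lengths, bin_size):
    """Count runs per length bin; bin index k covers [k*bin_size, (k+1)*bin_size)."""
    if bin_size <= 0:
        raise ValueError("bin_size must be positive")
    return Counter(length // bin_size for length in lengths)


if __name__ == "__main__":
    runs = roh_lengths([3, 4, 10], start=0, end=20)
    print(runs)
    print(dict(roh_histogram(runs, bin_size=5)))
```

A demographic model would then be judged by how closely its predicted run-length distribution matches a histogram like this one. A single miscalled heterozygous site splits a long run in two, which is one way hidden sequencing errors could distort the observed distribution.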
Keyword Effective population size; Haplotype homozygosity; Linkage disequilibrium; Next generation sequencing; PSMC
Q-Index Code C1
Q-Index Status Provisional Code
Institutional Status Non-UQ

Document type: Journal Article
Collection: Queensland Alliance for Agriculture and Food Innovation