Estimation of allele frequency and association mapping using next-generation sequencing data

Kim, Su Yeon, Lohmueller, Kirk E., Albrechtsen, Anders, Li, Yingrui, Korneliussen, Thorfinn, Tian, Geng, Grarup, Niels, Jiang, Tao, Andersen, Gitte, Witte, Daniel, Jorgensen, Torben, Hansen, Torben, Pedersen, Oluf, Wang, Jun and Nielsen, Rasmus (2011) Estimation of allele frequency and association mapping using next-generation sequencing data. BMC Bioinformatics, 12. doi:10.1186/1471-2105-12-231

Author Kim, Su Yeon
Lohmueller, Kirk E.
Albrechtsen, Anders
Li, Yingrui
Korneliussen, Thorfinn
Tian, Geng
Grarup, Niels
Jiang, Tao
Andersen, Gitte
Witte, Daniel
Jorgensen, Torben
Hansen, Torben
Pedersen, Oluf
Wang, Jun
Nielsen, Rasmus
Title Estimation of allele frequency and association mapping using next-generation sequencing data
Journal name BMC Bioinformatics
ISSN 1471-2105
Publication date 2011-06-11
Year available 2011
Sub-type Article (original research)
DOI 10.1186/1471-2105-12-231
Open Access Status DOI
Volume 12
Total pages 16
Place of publication London, United Kingdom
Publisher BioMed Central
Language eng
Formatted abstract
Background: Estimation of allele frequency is of fundamental importance in population genetic analyses and in association mapping. In most studies using next-generation sequencing, a cost-effective approach is to use medium- or low-coverage data (e.g., < 15X). However, SNP calling and allele frequency estimation in such studies are associated with substantial statistical uncertainty because of varying coverage and high error rates.

Results: We evaluate a new maximum likelihood method for estimating allele frequencies in low- and medium-coverage next-generation sequencing data. The method is based on integrating over uncertainty in the data for each individual rather than first calling genotypes. This method can be applied to directly test for associations in case/control studies. We use simulations to compare the likelihood method to methods based on genotype calling, and show that the likelihood method outperforms the genotype calling methods in terms of (1) accuracy of allele frequency estimation, (2) accuracy of the estimation of the distribution of allele frequencies across neutrally evolving sites, and (3) statistical power in association mapping studies. Using real re-sequencing data from 200 individuals obtained from an exon-capture experiment, we show that the patterns observed in the simulations are also found in real data.

Conclusions: Overall, our results suggest that association mapping and estimation of allele frequencies should not be based on genotype calling in low- to medium-coverage data. Furthermore, if genotype calling methods are used, it is usually better not to filter genotypes based on the call confidence score.
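
Illustrative sketch (not part of the published abstract): the Results paragraph describes estimating allele frequency by integrating over each individual's genotype uncertainty instead of calling genotypes first, and using the same likelihood to test for case/control association. The Python below is a minimal sketch of that idea under assumptions that the abstract does not state: a biallelic site, Hardy-Weinberg proportions, per-individual genotype likelihoods that have already been computed from the reads, an EM iteration as the optimizer, and a 1-degree-of-freedom likelihood ratio test. The function names are hypothetical and are not taken from the authors' software.

```python
import math

EPS = 1e-9  # keeps the frequency strictly inside (0, 1) so logs stay finite


def hwe_priors(f):
    """Genotype probabilities for 0, 1, 2 copies of the allele (assumes HWE)."""
    return ((1.0 - f) ** 2, 2.0 * f * (1.0 - f), f ** 2)


def em_allele_frequency(genotype_likelihoods, tol=1e-8, max_iter=500):
    """Estimate allele frequency from genotype likelihoods without calling genotypes.

    genotype_likelihoods: list of (L0, L1, L2), where Lg = P(reads | g allele copies).
    """
    f = 0.2  # arbitrary starting value (assumption)
    n = len(genotype_likelihoods)
    for _ in range(max_iter):
        priors = hwe_priors(f)
        expected_copies = 0.0
        for gl in genotype_likelihoods:
            # Posterior weight of each genotype given the reads and the current f
            w = [gl[g] * priors[g] for g in range(3)]
            expected_copies += (w[1] + 2.0 * w[2]) / sum(w)
        new_f = min(max(expected_copies / (2.0 * n), EPS), 1.0 - EPS)
        converged = abs(new_f - f) < tol
        f = new_f
        if converged:
            break
    return f


def log_likelihood(genotype_likelihoods, f):
    """Log-likelihood of f, summing over the three genotypes for every individual."""
    priors = hwe_priors(f)
    return sum(
        math.log(sum(gl[g] * priors[g] for g in range(3)))
        for gl in genotype_likelihoods
    )


def association_lrt(cases, controls):
    """Likelihood ratio test: one shared frequency versus separate frequencies."""
    pooled = cases + controls
    ll_null = log_likelihood(pooled, em_allele_frequency(pooled))
    ll_alt = (log_likelihood(cases, em_allele_frequency(cases))
              + log_likelihood(controls, em_allele_frequency(controls)))
    stat = max(2.0 * (ll_alt - ll_null), 0.0)
    # Upper tail of a chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value
```

When every individual's genotype is known with certainty, the estimate reduces to simple allele counting. With low-coverage data the likelihoods are spread across genotypes, and each individual contributes a fractional expected allele count.
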
Q-Index Code C1
Q-Index Status Provisional Code
Institutional Status Non-UQ

Document type: Journal Article
Collection: School of Chemistry and Molecular Biosciences
Citation counts: Cited 19 times in Thomson Reuters Web of Science
Cited 55 times in Scopus
Created: Thu, 03 Sep 2015, 15:31:58 EST by Mr Mathew Carter on behalf of Scholarly Communication and Digitisation Service