Fine-scale estimation of location of birth from genome-wide single-nucleotide polymorphism data

Hoggart, Clive J., O'Reilly, Paul F., Kaakinen, Marika, Zhang, Weihua, Chambers, John C., Kooner, Jaspal S., Coin, Lachlan J. M. and Jarvelin, Marjo-Riitta (2012) Fine-scale estimation of location of birth from genome-wide single-nucleotide polymorphism data. Genetics, 190 2: 669-677. doi:10.1534/genetics.111.135657

Author Hoggart, Clive J.
O'Reilly, Paul F.
Kaakinen, Marika
Zhang, Weihua
Chambers, John C.
Kooner, Jaspal S.
Coin, Lachlan J. M.
Jarvelin, Marjo-Riitta
Title Fine-scale estimation of location of birth from genome-wide single-nucleotide polymorphism data
Journal name Genetics
ISSN 0016-6731
Publication date 2012
Sub-type Article (original research)
DOI 10.1534/genetics.111.135657
Volume 190
Issue 2
Start page 669
End page 677
Total pages 9
Place of publication Bethesda, MD, United States
Publisher Genetics Society of America
Language eng
Subject 1311 Genetics
Abstract Systematic nonrandom mating in populations results in genetic stratification and is predominantly caused by geographic separation, providing the opportunity to infer individuals' birthplace from genetic data. Such inference has been demonstrated for individuals' country of birth, but here we use data from the Northern Finland Birth Cohort 1966 (NFBC1966) to investigate the characteristics of genetic structure within a population and subsequently develop a method for inferring location to a finer scale. Principal component analysis (PCA) shows that while the first PCs are particularly informative for location, there is also location information in the higher-order PCs, but it cannot be captured by a linear model. We introduce a new method, pcLOCATE, which is able to exploit this information to improve the accuracy of location inference. pcLOCATE uses individuals' PC values to estimate the probability of birth in each town and then averages over all towns to give an estimated longitude and latitude of birth using a fully Bayesian model. We apply pcLOCATE to the NFBC1966 data to estimate parental birthplace, testing with successively more PCs and finding the model with the top 23 PCs most accurate, with a median distance of 23 km between the estimated and the true location. pcLOCATE predicts the most recent residence of NFBC1966 individuals to a median distance of 47 km. We also apply pcLOCATE to Indian individuals from the London Life Sciences Prospective Population Study (LOLIPOP) data, and find that birthplace is predicted to a median distance of 54 km from the true location. A method with such accuracy is potentially valuable in population genetics and forensics.
Q-Index Code C1
Q-Index Status Provisional Code
Institutional Status Non-UQ
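
Illustrative note: the abstract says that pcLOCATE estimates a probability of birth in each town, averages over all towns to obtain a longitude and latitude, and is evaluated by the distance between estimated and true locations. The Python sketch below illustrates only that averaging step and a great-circle distance. It is not the authors' implementation: the fully Bayesian model that produces the town probabilities is not reproduced, and the function names, town labels, coordinates, and probabilities are all hypothetical. It also assumes a region small enough that an arithmetic mean of latitude and longitude is a reasonable approximation.

```python
import math


def expected_location(town_probs, town_coords):
    """Probability-weighted mean of town coordinates.

    town_probs:  dict mapping town -> probability of birth there
                 (hypothetical input; need not sum exactly to 1).
    town_coords: dict mapping town -> (latitude_deg, longitude_deg).
    """
    total = sum(town_probs.values())
    if total <= 0:
        raise ValueError("town probabilities must have a positive sum")
    lat = sum(p * town_coords[t][0] for t, p in town_probs.items()) / total
    lon = sum(p * town_coords[t][1] for t, p in town_probs.items()) / total
    return lat, lon


def haversine_km(a, b, radius_km=6371.0):
    """Great-circle distance in km between two (lat_deg, lon_deg) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(h))


# Made-up example: two hypothetical towns with equal probability.
coords = {"town_a": (64.0, 25.0), "town_b": (66.0, 27.0)}
probs = {"town_a": 0.5, "town_b": 0.5}
estimate = expected_location(probs, coords)
error_km = haversine_km(estimate, coords["town_a"])
```

Taking the median of such per-individual distances over a test set would give a summary statistic of the kind the abstract reports.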

Document type: Journal Article
Collection: Institute for Molecular Bioscience - Publications
Citation counts: Thomson Reuters Web of Science: cited 4 times; Scopus: cited 4 times
Created: Fri, 24 Jan 2014, 18:57:28 EST by System User on behalf of Institute for Molecular Bioscience