Genetic and nongenetic variation revealed for the principal components of human gene expression

Goldinger, Anita, Henders, Anjali K., McRae, Allan F., Martin, Nicholas G., Gibson, Greg, Montgomery, Grant W., Visscher, Peter M. and Powell, Joseph E. (2013) Genetic and nongenetic variation revealed for the principal components of human gene expression. Genetics, 195(3): 1117-1128. doi:10.1534/genetics.113.153221


Author Goldinger, Anita
Henders, Anjali K.
McRae, Allan F.
Martin, Nicholas G.
Gibson, Greg
Montgomery, Grant W.
Visscher, Peter M.
Powell, Joseph E.
Title Genetic and nongenetic variation revealed for the principal components of human gene expression
Journal name Genetics
ISSN 0016-6731
Publication date 2013-01-01
Year available 2013
Sub-type Article (original research)
DOI 10.1534/genetics.113.153221
Open Access Status DOI
Volume 195
Issue 3
Start page 1117
End page 1128
Total pages 12
Place of publication Bethesda, MD, United States
Publisher Genetics Society of America
Language eng
Subject 1311 Genetics
Abstract Principal components analysis has been employed in gene expression studies to correct for population substructure and batch and environmental effects. This method typically involves the removal of variation contained in as many as 50 principal components (PCs), which can constitute a large proportion of total variation present in the data. Each PC, however, can detect many sources of variation, including gene expression networks and genetic variation influencing transcript levels. We demonstrate that PCs generated from gene expression data can simultaneously contain both genetic and nongenetic factors. From heritability estimates we show that all PCs contain a considerable portion of genetic variation while nongenetic artifacts such as batch effects were associated to varying degrees with the first 60 PCs. These PCs demonstrate an enrichment of biological pathways, including core immune function and metabolic pathways. The use of PC correction in two independent data sets resulted in a reduction in the number of cis- and trans-expression QTL detected. Comparisons of PC and linear model correction revealed that PC correction was not as efficient at removing known batch effects and had a higher penalty on genetic variation. Therefore, this study highlights the danger of eliminating biologically relevant data when employing PC correction in gene expression data.
Keyword Genetics & Heredity
Q-Index Code C1
Q-Index Status Confirmed Code
Grant ID 389892
GM057091
Institutional Status UQ
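
The PC correction described in the abstract (removing the variation captured by the leading principal components of an expression matrix) can be sketched as below. This is a minimal illustration on simulated data, not the authors' pipeline: the function name `remove_top_pcs`, the two-batch design, the single simulated genotype, and all effect sizes are assumptions made for the example.

```python
import numpy as np

def remove_top_pcs(expr, k):
    """Remove the first k principal components from expr (samples x genes).

    Returns the corrected matrix and the fraction of total variance removed.
    Hypothetical helper for illustration only.
    """
    centered = expr - expr.mean(axis=0)  # center each gene across samples
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    # Part of the data explained by the first k PCs (scores times loadings).
    top = (u[:, :k] * s[:k]) @ vt[:k, :]
    removed_fraction = float((s[:k] ** 2).sum() / (s ** 2).sum())
    return centered - top, removed_fraction

# Simulated data: every value below is an assumption, not taken from the study.
rng = np.random.default_rng(0)
n_samples, n_genes = 40, 200
batch = np.repeat([0.0, 1.0], n_samples // 2)                 # two processing batches
genotype = rng.integers(0, 3, size=n_samples).astype(float)   # allele dosage 0/1/2
expr = rng.normal(size=(n_samples, n_genes))
expr += np.outer(batch, rng.normal(scale=2.0, size=n_genes))     # nongenetic batch effect
expr += np.outer(genotype, rng.normal(scale=1.0, size=n_genes))  # genetic effect

corrected, frac = remove_top_pcs(expr, k=2)
print(f"fraction of total variance removed by 2 PCs: {frac:.2f}")
```

Because the leading PCs are defined only by how much variance they capture, nothing in this procedure separates the simulated batch effect from the simulated genetic effect; both can be absorbed into the removed components, which is the concern the abstract raises.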

Document type: Journal Article
Collections: Queensland Brain Institute Publications
Official 2014 Collection
UQ Diamantina Institute Publications
 
Citation counts: Thomson Reuters Web of Science: cited 11 times
Scopus: cited 11 times
Created: Fri, 29 Nov 2013, 07:23:43 EST by System User on behalf of UQ Diamantina Institute