Characterization of the genetic and environmental factors driving gene expression variability

Goldinger, Anita (2017). Characterization of the genetic and environmental factors driving gene expression variability. PhD Thesis, Faculty of Medicine, The University of Queensland. doi:10.14264/uql.2017.463

Attached Files
Name Description MIMEType Size Downloads
AG_final_thesis_R1_submission.pdf Thesis (open access) application/pdf 13.73MB 0
Author Goldinger, Anita
Thesis Title Characterization of the genetic and environmental factors driving gene expression variability
School, Centre or Institute Faculty of Medicine
Institution The University of Queensland
DOI 10.14264/uql.2017.463
Publication date 2017-03-27
Thesis type PhD Thesis
Supervisor Joseph E. Powell
Peter M. Visscher
Total pages 217
Total colour pages 15
Total black and white pages 202
Language eng
Subjects 010402 Biostatistics
060405 Gene Expression (incl. Microarray and other genome-wide approaches)
060412 Quantitative Genetics (incl. Disease and Trait Mapping Genetics)
Formatted abstract
Gene expression variation is a quantitative trait that drives phenotypic diversity across populations. On a cellular level, gene expression is an intermediate phenotype between stored genetic information and the functional utilization of this information within the cell. Through Genome-Wide Association Studies (GWAS), thousands of genetic polymorphisms associated with numerous diseases have been identified. These have provided many novel insights into the disrupted biological processes that drive the etiology of various health conditions. Expression Quantitative Trait Loci (eQTL) provide an additional layer of biological information about the physiological impact of common genetic variants. Therefore, the study of the genetic regulation of gene expression (eQTL studies) has been useful in both the validation and the functional characterisation of GWAS polymorphisms. This has contributed to a better understanding of the precise molecular processes that contribute to the development of disease.

Global transcriptomic analyses have provided greater insight into the level of complexity that drives biological systems. Transcriptomic data often comprise gene regulatory and co-expression networks, an emergent property of transcriptomic and other –omic data. The networks within each –omics field interact with each other, adding further layers of complexity to biological systems. Variation contained within gene expression datasets can, therefore, provide detail on the flow of information through these biological systems and how it can be influenced by genetic polymorphisms.

Transcriptomic variation is highly influenced by genetic and environmental factors. Genetic regulation of gene expression represents, with some exceptions, fixed regulatory points that strictly control the expression of genes. Variance attributed to environmental effects, on the other hand, often reflects biological responses to specific stimuli. The dissection of the genetic and environmental influences on expression levels will help to form a baseline upon which network models can be built to dissect the biological flow of information in healthy, latent or disease groups.

This thesis details both methods to clean data and statistical approaches to analyze the complexities found within the variance of transcriptomic data. The focus of this thesis is the dissection of three major influences on gene expression variability: technical artifacts, environmental variation and genetic variation. Using statistical and quantitative genetic techniques on array-based genotype and gene expression datasets, this thesis examines:

1. The use of Principal Components Analysis (PCA) to identify and correct for known batch effects
2. Seasonal variation as a pervasive environmental contributor to gene expression variation
3. The genetic contribution driving robust gene co-expression modules

The Brisbane Systems Genetics Study (BSGS) comprises both unrelated and related individuals and was used throughout these three studies. The Center for Health Discovery and Well Being (CHDWB) cohort was used as a replication study, and the Multiple Tissue Human Expression Resource (MuTHER) cohort was employed to examine tissue-specific effects.

The first chapter provides a technical, methodological analysis of PCA as a batch effect correction technique. Batch effects have a large impact on gene expression variability, often creating artificial systematic trends. By decomposing the data into Principal Components (PCs), we were able to quantify the degree and distribution of technical artifacts within gene expression datasets and determine the effectiveness of this correction method.
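As a rough illustration of the idea in the first chapter, the sketch below decomposes a simulated samples-by-probes expression matrix into PCs, reports the share of variance each PC carries, and subtracts the leading PC. The data, the function names (`pca_decompose`, `remove_components`) and the choice to remove a PC outright are assumptions made for this example; the abstract does not state how the thesis applies the correction.

```python
import numpy as np

def pca_decompose(expr):
    """Center a samples x probes matrix and decompose it with SVD.

    Returns the centered data, the SVD factors, and the fraction of
    total variance carried by each principal component.
    """
    centered = expr - expr.mean(axis=0)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    var_explained = s**2 / np.sum(s**2)
    return centered, u, s, vt, var_explained

def remove_components(centered, u, s, vt, k):
    """Subtract the rank-k reconstruction built from the first k PCs."""
    low_rank = (u[:, :k] * s[:k]) @ vt[:k, :]
    return centered - low_rank

# Simulated data (hypothetical): two batches of 20 samples, 200 probes,
# with an additive batch shift applied to the second batch.
rng = np.random.default_rng(0)
n_samples, n_probes = 40, 200
batch = np.repeat([0.0, 1.0], n_samples // 2)
loadings = rng.normal(size=n_probes)
expr = rng.normal(size=(n_samples, n_probes)) + 3.0 * np.outer(batch, loadings)

centered, u, s, vt, var_explained = pca_decompose(expr)

# How strongly does the first PC track the known batch label?
pc1_batch_corr = abs(np.corrcoef(u[:, 0], batch)[0, 1])

# Remove the first PC and keep the residual as the "corrected" data.
corrected = remove_components(centered, u, s, vt, k=1)
```

Inspecting `var_explained` alongside the correlation between each PC and a known batch label is one simple way to judge how much of the variance is technical before deciding which PCs, if any, to remove.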

The second chapter examines the influence of pervasive macro-environmental factors on gene expression datasets and provides a statistical framework to identify seasonal variation. Since datasets are often collected over time, samples may contain seasonal trends in gene expression that are environmentally driven and are not regarded as technical artifacts. By using loess decomposition and cosinor regression, 74 transcripts with a significant seasonal trend were identified independently of seasonal variation in blood cell count.
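The cosinor regression mentioned above can be sketched as an ordinary least-squares fit. A single-component cosinor model y = M + A cos(2πt/T + φ) is linear in cos(2πt/T) and sin(2πt/T), so the mean level M, amplitude A and phase φ can be recovered from three regression coefficients. The function name `fit_cosinor`, the 365.25-day period and the simulated series are assumptions for illustration only; the loess decomposition step and any covariates such as blood cell counts are not shown.

```python
import numpy as np

def fit_cosinor(t_days, y, period=365.25):
    """Fit y = M + A*cos(2*pi*t/period + phi) by linear least squares.

    The model is rewritten as M + b_cos*cos(wt) + b_sin*sin(wt), where
    b_cos = A*cos(phi) and b_sin = -A*sin(phi).
    """
    omega = 2.0 * np.pi / period
    design = np.column_stack([
        np.ones_like(t_days),
        np.cos(omega * t_days),
        np.sin(omega * t_days),
    ])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    mesor, b_cos, b_sin = coef
    amplitude = np.hypot(b_cos, b_sin)
    phase = np.arctan2(-b_sin, b_cos)
    return mesor, amplitude, phase

# Simulated, noise-free weekly measurements over two years (hypothetical).
t = np.arange(0.0, 730.0, 7.0)
true_mesor, true_amp, true_phase = 8.0, 0.5, 1.0
y = true_mesor + true_amp * np.cos(2.0 * np.pi * t / 365.25 + true_phase)

mesor, amplitude, phase = fit_cosinor(t, y)
```

In a real analysis the significance of the seasonal term would be assessed, for example by comparing this model against an intercept-only fit, and corrected for the number of transcripts tested.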

Chapter three examines the genetic contribution to gene expression covariance between Blood Informative Transcripts (BITs), a set of transcripts comprising nine modules that have been previously identified and validated. Using quantitative genetic techniques, the genetic and environmental components driving phenotypic correlations for BITs were quantified. When compared to 10,000 bootstrap permutations of random probes, the BITs demonstrate significant genetic correlation (average 0.63 across all BITs) and an average genetic contribution to phenotypic correlations of 0.42. The high degree of genetic correlation demonstrates a strong genetic framework regulating the expression of BIT modules. This chapter also examines the presence of these replicated modules in three separate tissue types, identifying several tissue-specific co-expression modules.
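One common way to express a genetic contribution to a phenotypic correlation is the standard bivariate identity r_P = sqrt(h2_x h2_y) r_G + sqrt((1 - h2_x)(1 - h2_y)) r_E, where h2 is the heritability of each transcript and r_G and r_E are the genetic and environmental correlations. The sketch below applies that identity and adds a simple empirical p-value against a null distribution such as values from random probe sets. Whether the thesis uses exactly this definition is an assumption; the function names and all input numbers are hypothetical.

```python
import math

def phenotypic_correlation_parts(h2_x, h2_y, r_g, r_e):
    """Split a phenotypic correlation into genetic and environmental parts.

    Uses r_P = sqrt(h2_x*h2_y)*r_G + sqrt((1-h2_x)*(1-h2_y))*r_E.
    Returns (genetic part, environmental part, r_P).
    """
    genetic = math.sqrt(h2_x * h2_y) * r_g
    environmental = math.sqrt((1.0 - h2_x) * (1.0 - h2_y)) * r_e
    return genetic, environmental, genetic + environmental

def empirical_p_value(observed, null_values):
    """One-sided empirical p-value with the usual +1 correction."""
    exceed = sum(1 for v in null_values if v >= observed)
    return (exceed + 1) / (len(null_values) + 1)

# Illustrative inputs only, not values from the thesis.
genetic, environmental, r_p = phenotypic_correlation_parts(
    h2_x=0.5, h2_y=0.5, r_g=0.6, r_e=0.2
)

# Toy null distribution standing in for statistics from random probe sets.
p_value = empirical_p_value(0.9, [0.1, 0.2, 0.3])
```

With 10,000 null replicates, the smallest p-value this estimator can return is 1/10,001, which is why the +1 correction is usually preferred over reporting a p-value of zero.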

Overall, this thesis explores the landscape of gene expression variability. A methodological framework for the identification of technical artifacts and seasonal effects is investigated and the genetic architecture driving transcriptomic co-expression is characterized.
Keyword Genetics
Statistics
Gene expression
Seasonal variation
Genetic covariance
Genetic variation
Quantitative trait loci
Additional Notes Pages 61-62, 64, 67, 69, 84, 86-87, 89-90, 92, 113, 115, 122, 123 have colour images on them and should be printed in colour.

Document type: Thesis
Collections: UQ Theses (RHD) - Official
UQ Theses (RHD) - Open Access
 
Created: Sun, 19 Mar 2017, 01:31:03 EST by Anita Goldinger on behalf of Learning and Research Services (UQ Library)