Detection of Differentially Expressed Genes via Mixture Models and Cluster Analysis

Leesa Wockner (2011). Detection of Differentially Expressed Genes via Mixture Models and Cluster Analysis. PhD Thesis, School of Mathematics & Physics, The University of Queensland.

Attached Files
Name Description MIMEType Size Downloads
s4077516_phd_finalthesis.pdf PhD Final Thesis application/pdf 12.53MB 0
Author Leesa Wockner
Thesis Title Detection of Differentially Expressed Genes via Mixture Models and Cluster Analysis
School, Centre or Institute School of Mathematics & Physics
Institution The University of Queensland
Publication date 2011-10
Thesis type PhD Thesis
Supervisor Professor Geoff J. McLachlan; Dr Ian Wood
Total pages 176
Total colour pages 16
Total black and white pages 160
Language eng
Subjects 01 Mathematical Sciences
Abstract/Summary The advent of new technologies in fields such as genomics and proteomics has seen an exponential increase in biological data. Until recently, statisticians have considered, in practice and theory, problems where the number of replications, n, is large relative to the 'unknowns'. As such, classical methods rely on some implicit understanding that there must be, at least in theory, many observations per parameter. Fisher-Neyman-Pearson theory of hypothesis testing, for example, was mathematically elegant and widely applicable. However, control of the family-wise error rate (FWER) in multiple hypothesis testing is no longer a feasible approach. In this thesis, we consider the multiple hypothesis testing problem, in which a large number N of hypotheses are to be tested simultaneously. The problem is considered in the context of detecting differentially expressed genes on the basis of data arising from a series of n microarray experiments that provide measurements (gene expression data) on a very large number N of genes. Throughout this thesis we investigate the application of the two-component normal mixture model to the distribution of transformed test statistics, the resulting false discovery rate (FDR), and the posterior probability of differential expression. Furthermore, we respond to the problems caused by excessive correlation in microarrays and develop a clustering method whereby the initial aim is to cluster the gene profiles into a number of tight clusters in order to share information about the genes. We use a mixture of linear models to effect this clustering, from which we are able to assess the probability of no differential expression for each gene and the local false discovery rate. We demonstrate this approach on simulated and real data sets and compare our methods and results to others in the field.
Keyword false discovery rate, two group model, EM algorithm, local false discovery rate, non-null z-values, microarray, differential gene detection, linear mixed effects models.
Additional Notes Colour pages: 30-31,67,70-73,76-78,80,84,127,133-135
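
Illustrative sketch (not taken from the thesis): the abstract refers to fitting a two-component normal mixture to transformed test statistics and reading off the local false discovery rate. The Python below shows one simple way such a fit could look. It assumes a theoretical N(0, 1) null component and a single normal non-null component, fitted by a basic EM loop. The function names, starting values, and simulated data are hypothetical choices made for this example only.

```python
import numpy as np


def normal_pdf(z, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2) evaluated at z."""
    return np.exp(-0.5 * ((z - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))


def fit_two_group_model(z, n_iter=500, tol=1e-9):
    """EM fit of f(z) = pi0 * N(0, 1) + (1 - pi0) * N(mu1, sigma1^2).

    Assumption: the null component is fixed at the theoretical N(0, 1).
    Returns (pi0, mu1, sigma1, tau0), where tau0[i] is the estimated
    posterior probability that z[i] is null (the local false discovery rate).
    """
    z = np.asarray(z, dtype=float)
    # Hypothetical starting values: mostly null, non-null centred on extreme z.
    pi0 = 0.9
    extreme = np.abs(z) >= np.quantile(np.abs(z), 0.9)
    mu1, sigma1 = float(z[extreme].mean()), 1.0
    prev_loglik = -np.inf
    for _ in range(n_iter):
        # E-step: posterior probability of the null component for each z.
        f0 = pi0 * normal_pdf(z)
        f1 = (1.0 - pi0) * normal_pdf(z, mu1, sigma1)
        mix = f0 + f1
        tau0 = f0 / mix
        # Stop when the log-likelihood no longer improves.
        loglik = float(np.sum(np.log(mix)))
        if abs(loglik - prev_loglik) < tol:
            break
        prev_loglik = loglik
        # M-step: update the mixing proportion and non-null parameters.
        w = 1.0 - tau0
        pi0 = float(tau0.mean())
        mu1 = float(np.sum(w * z) / np.sum(w))
        sigma1 = max(float(np.sqrt(np.sum(w * (z - mu1) ** 2) / np.sum(w))), 1e-3)
    # Recompute posteriors at the final parameter values.
    f0 = pi0 * normal_pdf(z)
    tau0 = f0 / (f0 + (1.0 - pi0) * normal_pdf(z, mu1, sigma1))
    return pi0, mu1, sigma1, tau0


# Simulated z-scores (hypothetical): 9000 null genes and 1000 non-null genes.
rng = np.random.default_rng(0)
z = np.concatenate([rng.normal(0.0, 1.0, 9000), rng.normal(3.0, 1.0, 1000)])
pi0_hat, mu1_hat, sigma1_hat, local_fdr = fit_two_group_model(z)
print(f"pi0={pi0_hat:.3f}, mu1={mu1_hat:.3f}, sigma1={sigma1_hat:.3f}")
print("genes with local fdr < 0.2:", int(np.sum(local_fdr < 0.2)))
```

In this sketch, a gene would be flagged as differentially expressed when its estimated local false discovery rate falls below a chosen cut-off (0.2 here, an arbitrary value for illustration).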

Created: Thu, 02 Feb 2012, 19:58:24 EST by Miss Leesa Wockner on behalf of Library - Information Access Service