Finite Mixture Modelling using Multivariate Skew Distributions

Leemaqz, Sharon X. (2014). Finite Mixture Modelling using Multivariate Skew Distributions. PhD Thesis, School of Mathematics and Physics, The University of Queensland. doi:10.14264/uql.2014.143

Attached Files
Name Description MIMEType Size Downloads
s4225458_phd_final.pdf Thesis (fulltext) application/pdf 4.27MB 8

Author Leemaqz, Sharon X.
Thesis Title Finite Mixture Modelling using Multivariate Skew Distributions
School, Centre or Institute School of Mathematics and Physics
Institution The University of Queensland
DOI 10.14264/uql.2014.143
Publication date 2014
Thesis type PhD Thesis
Supervisor Geoffrey J. McLachlan
Ian Wood
Total pages 215
Language eng
Subjects 010401 Applied Statistics
010405 Statistical Theory
Formatted abstract
Finite mixture models, in particular normal mixture models, have received extensive attention in both theoretical and applied statistics, as witnessed by a diverse range of applications including bioinformatics, biomedicine, biometrics, genetics, finance, image analysis, and psychometrics. However, the widespread assumption of normality is often not satisfied in practice, especially in the multivariate case. Further complications emerge when the data involve clusters that are highly asymmetric in shape. Besides skewness, their distributions can display other non-normal features, such as heavy tails. Thus, there is a strong need to develop more flexible methods that can effectively deal with these situations.

This thesis addresses these concerns by developing new finite mixture models with skew distributions that are highly flexible in shape, providing a capable statistical model for robust modelling of complex non-normal data. More specifically, the original contributions of this research are centred on three main areas. Firstly, a classification scheme is proposed to categorize multivariate skew distributions into four forms based on their theoretical characterizations and properties, which greatly aids our understanding of the links between them. These families of distributions are natural extensions of the normal distribution, with additional parameters to accommodate a range of non-normal features, and are thus suitable for modelling asymmetric behaviour in the data.

The second and central part of this thesis is dedicated to the development of new statistical methodology based on finite mixtures of multivariate skew distributions, in particular the unrestricted and canonical fundamental skew t-mixture models. By adopting these flexible parametric distributions as component densities of mixture models, multivariate skew mixture models are shown to be highly effective in capturing complex features of multivariate data, including multimodality, skewness, and heavy tails. We also present a novel parameter estimation algorithm for our canonical fundamental skew t-mixture model via an exact implementation of the EM algorithm, which turns out to be an elegant unification of existing proposals.

Last but not least, a new computational tool for automated analysis of high-throughput flow cytometric data is introduced and implemented. The superiority and effectiveness of this methodology are demonstrated on a number of real clinical datasets, where our model-based approach provides significant improvements in the accuracy of cell population identification. Going a step further, the flexibility and usefulness of skew mixture models are exemplified through a series of new applications to a range of scientific datasets, where they are shown to perform favourably compared with existing non-normal mixture models. Undoubtedly, this powerful methodology will also be widely applicable to the analysis of data arising in many other important scientific areas.
Keyword Multivariate skew distribution
Skew t component distributions
Skew normal distributions
Mixture models
EM algorithm
Flow cytometry
Image segmentation

Created: Fri, 13 Jun 2014, 13:40:15 EST by Miss Sharon Lee on behalf of Scholarly Communication and Digitisation Service