Finite mixture models, in particular normal mixture models, have received extensive attention in both theoretical and applied statistics, as witnessed by a diverse range of applications including bioinformatics, biomedicine, biometrics, genetics, finance, image analysis, and psychometrics. However, the common assumption of normality is often violated in practice, especially in the multivariate case. Further complications arise when the data contain clusters that are highly asymmetric in shape. Besides skewness, their distributions can display other non-normal features, such as heavy tails. Thus, there is a strong need for more flexible methods that can deal effectively with these situations.
This thesis addresses these concerns by developing new finite mixture models with skew component distributions that are highly flexible in shape, providing a framework for the robust modelling of complex non-normal data. More specifically, the original contributions of this research centre on three main areas. First, a classification scheme is proposed that categorizes multivariate skew distributions into four forms according to their theoretical characterizations and properties, which clarifies the links between them. These families of distributions are natural extensions of the normal distribution, with additional parameters that accommodate a range of non-normal features, and are therefore well suited to modelling asymmetric behaviour in the data.
The second and central part of this thesis is dedicated to the development of new statistical methodology based on finite mixtures of multivariate skew distributions, in particular the unrestricted and canonical fundamental skew t-mixture models. With these flexible parametric distributions as component densities, the resulting multivariate skew mixture models are shown to be highly effective in capturing complex features of multivariate data, including multimodality, skewness, and heavy tails. We also present a novel algorithm for estimating the parameters of the canonical fundamental skew t-mixture model, based on an exact implementation of the EM algorithm; this algorithm also unifies several existing proposals.
Finally, a new computational tool for the automated analysis of high-throughput flow cytometric data is introduced and implemented. Its effectiveness is demonstrated on a number of real clinical datasets, where the model-based approach substantially improves the accuracy of cell population identification. Beyond flow cytometry, the flexibility and usefulness of skew mixture models are illustrated through a series of new applications to datasets from a range of scientific fields, where they are shown to compare favourably with existing non-normal mixture models. This methodology is expected to be widely applicable to the analysis of data arising in many other scientific areas.