A mixture model-based approach to the clustering of microarray expression data

McLachlan, GJ, Bean, RW and Peel, D (2002) A mixture model-based approach to the clustering of microarray expression data. Bioinformatics, 18(3): 413-422. doi:10.1093/bioinformatics/18.3.413

Author McLachlan, GJ
Bean, RW
Peel, D
Title A mixture model-based approach to the clustering of microarray expression data
Journal name Bioinformatics
ISSN 1367-4803
Publication date 2002-01-01
Year available 2002
Sub-type Article (original research)
DOI 10.1093/bioinformatics/18.3.413
Open Access Status Not yet assessed
Volume 18
Issue 3
Start page 413
End page 422
Total pages 10
Editor C. Sander
Place of publication United Kingdom
Publisher Oxford University Press
Language eng
Subject C1
230204 Applied Statistics
780101 Mathematical sciences
Abstract Motivation: This paper introduces the software EMMIX-GENE, developed specifically for a model-based approach to the clustering of microarray expression data, in particular the clustering of tissue samples on a very large number of genes. The latter is a nonstandard problem in parametric cluster analysis because the dimension of the feature space (the number of genes) is typically much greater than the number of tissues. A feasible approach is provided by first selecting a subset of the genes relevant for the clustering of the tissue samples: mixtures of t distributions are fitted to each gene, and the genes are ranked in order of increasing size of the likelihood ratio statistic for the test of one versus two components in the mixture model. The imposition of a threshold on the likelihood ratio statistic, used in conjunction with a threshold on the size of a cluster, allows the selection of a relevant set of genes. However, even this reduced set of genes will usually be too large for a normal mixture model to be fitted directly to the tissues, and so mixtures of factor analyzers are used to effectively reduce the dimension of the feature space of genes.
Results: The usefulness of the EMMIX-GENE approach for the clustering of tissue samples is demonstrated on two well-known data sets on colon and leukaemia tissues. For both data sets, relevant subsets of the genes can be selected that reveal interesting clusterings of the tissues, consistent either with the external classification of the tissues or with background and biological knowledge of these data sets.
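The gene-selection step described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the EMMIX-GENE implementation: each gene's expression values across the tissues are fitted with one- and two-component univariate t mixtures by EM with the degrees of freedom held fixed (nu = 4 here, an assumption), the likelihood ratio statistic -2 log(lambda) is taken as twice the difference in maximised log-likelihoods, and a gene is retained only if that statistic exceeds a threshold and the smaller of the two implied clusters contains at least a minimum number of tissues. The function names (`t_logpdf`, `fit_t_mixture`, `select_genes`), the sorted-block initialisation, the toy data, and both threshold values of 8 are hypothetical choices for illustration. The later step of fitting mixtures of factor analyzers to the selected genes is not shown.

```python
import math

import numpy as np


def t_logpdf(x, mu, sigma2, nu):
    """Log density of a univariate t distribution (location mu, squared scale sigma2, nu df)."""
    d2 = (x - mu) ** 2 / sigma2
    return (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
            - 0.5 * math.log(nu * math.pi * sigma2)
            - (nu + 1) / 2 * np.log1p(d2 / nu))


def fit_t_mixture(x, g, nu=4.0, n_iter=500, tol=1e-8):
    """EM for a g-component univariate t mixture with nu held fixed (an assumption).

    Returns the log-likelihood and the posterior probabilities tau (g x n)
    evaluated at the same (final) parameter values.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    # Hypothetical start: split the sorted values into g contiguous blocks.
    blocks = np.array_split(np.sort(x), g)
    pi = np.array([len(b) / n for b in blocks])
    mu = np.array([b.mean() for b in blocks])
    sigma2 = np.array([max(b.var(), 1e-6) for b in blocks])
    prev = -np.inf
    for _ in range(n_iter):
        # E-step: log(pi_k * f_k(x_j)), normalised with log-sum-exp for stability.
        logp = np.stack([math.log(max(pi[k], 1e-300)) + t_logpdf(x, mu[k], sigma2[k], nu)
                         for k in range(g)])
        top = logp.max(axis=0)
        loglik = float(np.sum(top + np.log(np.exp(logp - top).sum(axis=0))))
        tau = np.exp(logp - top)
        tau /= tau.sum(axis=0)
        if loglik - prev < tol:
            break
        prev = loglik
        # M-step: weights u = (nu + 1) / (nu + d^2) downweight outlying tissues.
        for k in range(g):
            u = (nu + 1) / (nu + (x - mu[k]) ** 2 / sigma2[k])
            w = tau[k] * u
            nk = max(tau[k].sum(), 1e-12)
            pi[k] = nk / n
            mu[k] = np.sum(w * x) / max(np.sum(w), 1e-12)
            sigma2[k] = max(np.sum(w * (x - mu[k]) ** 2) / nk, 1e-6)
    return loglik, tau


def select_genes(X, lr_threshold=8.0, min_size=8, nu=4.0):
    """Screen genes (rows of X, columns are tissues) by -2 log(lambda) for 1 vs 2 components.

    lr_threshold and min_size are hypothetical illustrative values.
    Returns the retained gene indices (largest statistic first) and all statistics.
    """
    X = np.asarray(X, dtype=float)
    lr = np.empty(X.shape[0])
    passed = []
    for i, row in enumerate(X):
        ll1, _ = fit_t_mixture(row, 1, nu)
        ll2, tau = fit_t_mixture(row, 2, nu)
        lr[i] = 2.0 * (ll2 - ll1)
        # Cluster sizes from assigning each tissue to its most probable component.
        sizes = np.bincount(tau.argmax(axis=0), minlength=2)
        if lr[i] > lr_threshold and sizes.min() >= min_size:
            passed.append(i)
    keep = sorted(passed, key=lambda i: lr[i], reverse=True)
    return keep, lr


# Hypothetical toy data: 20 genes x 60 tissues; genes 0-4 split the tissues 30/30.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 60))
X[:5, 30:] += 8.0
keep, lr = select_genes(X)
print("retained genes, largest -2 log(lambda) first:", keep)
```

Screening one gene at a time keeps every fit univariate, which sidesteps the problem of having far more genes than tissues; the cost is that genes are judged in isolation, which is why the approach then models the tissues on the reduced gene set with a multivariate model.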
Keyword Mathematics, Interdisciplinary Applications
Biochemical Research Methods
Biotechnology & Applied Microbiology
Computer Science, Interdisciplinary Applications
Statistics & Probability
EM Algorithm
Mathematical & Computational Biology
Q-Index Code C1
Institutional Status UQ

Document type: Journal Article
Collections: Excellence in Research Australia (ERA) - Collection
School of Physical Sciences Publications
Citation counts: Cited 320 times in Thomson Reuters Web of Science
Cited 357 times in Scopus
Created: Wed, 15 Aug 2007, 03:04:18 EST