Model-based clustering in gene expression microarrays: An application to breast cancer data

Mar, J. C. and McLachlan, G. J. (2003) Model-based clustering in gene expression microarrays: An application to breast cancer data. International Journal of Software Engineering And Knowledge Engineering, 13(6): 579-592. doi:10.1142/S0218194003001482


Author Mar, J. C.
McLachlan, G. J.
Title Model-based clustering in gene expression microarrays: An application to breast cancer data
Journal name International Journal of Software Engineering And Knowledge Engineering
ISSN 0218-1940
Publication date 2003-12-01
Sub-type Article (original research)
DOI 10.1142/S0218194003001482
Volume 13
Issue 6
Start page 579
End page 592
Total pages 14
Editor S.K. Chang
Place of publication Singapore
Publisher World Scientific Publishing Company
Language eng
Subject C1
230204 Applied Statistics
780101 Mathematical sciences
010401 Applied Statistics
Abstract In microarray studies, clustering techniques are often applied to derive meaningful insights into the data. In the past, hierarchical methods have been the primary clustering tool employed to perform this task. The hierarchical algorithms have been mainly applied heuristically to these cluster analysis problems. Further, a major limitation of these methods is their inability to determine the number of clusters. Thus there is a need for a model-based approach to these clustering problems. To this end, McLachlan et al. [7] developed a mixture model-based algorithm (EMMIX-GENE) for the clustering of tissue samples. To further investigate the EMMIX-GENE procedure as a model-based approach, we present a case study involving the application of EMMIX-GENE to the breast cancer data as studied recently in van 't Veer et al. [10]. Our analysis considers the problem of clustering the tissue samples on the basis of the genes, which is a non-standard problem because the number of genes greatly exceeds the number of tissue samples. We demonstrate how EMMIX-GENE can be useful in reducing the initial set of genes down to a more computationally manageable size. The results from this analysis also emphasise the difficulty associated with the task of separating two tissue groups on the basis of a particular subset of genes. These results also shed light on why supervised methods have such a high misallocation error rate for the breast cancer data.
Keyword Computer Science, Artificial Intelligence
Computer Science, Software Engineering
Engineering, Electrical & Electronic
Microarray
Mixture Modelling
Cluster Analysis
Q-Index Code C1

 
Citation counts Thomson Reuters Web of Science: cited 4 times
Scopus: cited 5 times
Created: Wed, 15 Aug 2007, 05:24:48 EST