Application of gene shaving and mixture models to cluster microarray gene expression data

Do, K. A., McLachlan, G. J., Bean, R. W. and Wen, S. (2007) Application of gene shaving and mixture models to cluster microarray gene expression data. Cancer Informatics, 5, 25-43.

Author Do, K. A.
McLachlan, G. J.
Bean, R. W.
Wen, S.
Title Application of gene shaving and mixture models to cluster microarray gene expression data
Journal name Cancer Informatics
ISSN 1176-9351
Publication date 2007
Sub-type Article (original research)
Open Access Status File (Publisher version)
Volume 5
Start page 25
End page 43
Total pages 19
Place of publication Auckland, New Zealand
Publisher Libertas Academica
Collection year 2008
Language eng
Subject 230204 Applied Statistics
780101 Mathematical sciences
010401 Applied Statistics
060405 Gene Expression (incl. Microarray and other genome-wide approaches)
Abstract Researchers are frequently faced with the analysis of microarray data of a relatively large number of genes using a small number of tissue samples. We examine the application of two statistical methods for clustering such microarray expression data: EMMIX-GENE and GeneClust. EMMIX-GENE is a mixture-model-based clustering approach, designed primarily to cluster tissue samples on the basis of the genes. GeneClust is an implementation of the gene shaving methodology, motivated by research to identify distinct sets of genes for which variation in expression could be related to a biological property of the tissue samples. We illustrate the use of these two methods in the analysis of Affymetrix oligonucleotide arrays of well-known data sets from colon tissue samples with and without tumors, and of tumor tissue samples from patients with leukemia. Although the two approaches have been developed from different perspectives, the results demonstrate a clear correspondence between gene clusters produced by GeneClust and EMMIX-GENE for the colon tissue data. It is demonstrated, for the case of ribosomal proteins and smooth muscle genes in the colon data set, that both methods can classify genes into co-regulated families. It is further demonstrated that tissue types (tumor and normal) can be separated on the basis of subtle distributed patterns of genes. Application to the leukemia tissue data produces a division of tissues corresponding closely to the external classification, acute myeloid leukemia (AML) and acute lymphoblastic leukemia (ALL), for both methods. We also identify genes specific for the subgroup of ALL-Tcell samples.
Overall, we find that the gene shaving method produces gene clusters at great speed; allows variable cluster sizes and can incorporate partial or full supervision; and finds clusters of genes in which the gene expression varies greatly over the tissue samples while maintaining a high level of coherence between the gene expression profiles. The intent of the EMMIX-GENE method is to cluster the tissue samples. It performs a filtering step that results in a subset of relevant genes, followed by gene clustering, and then tissue clustering, and is favorable in its accuracy of ranking the clusters produced.
Q-Index Code C1
Q-Index Status Confirmed Code
Institutional Status UQ

Created: Wed, 09 Apr 2008, 14:10:46 EST by Marie Grove on behalf of School of Mathematics & Physics