A mixture model with random-effects components for clustering correlated gene-expression profiles

Ng, SK, McLachlan, GJ, Wang, K, Jones, LBT and Ng, SW (2006) A mixture model with random-effects components for clustering correlated gene-expression profiles. Bioinformatics, 22(14): 1745-1752. doi:10.1093/bioinformatics/btl165

Author Ng, SK
McLachlan, GJ
Wang, K
Jones, LBT
Ng, SW
Title A mixture model with random-effects components for clustering correlated gene-expression profiles
Journal name Bioinformatics
ISSN 1367-4803
Publication date 2006
Sub-type Article (original research)
DOI 10.1093/bioinformatics/btl165
Volume 22
Issue 14
Start page 1745
End page 1752
Total pages 8
Editor A Bateman
A Valencia
Place of publication Oxford
Publisher Oxford Univ Press
Collection year 2006
Language eng
Subject C1
270201 Gene Expression
321011 Medical Genetics
780105 Biological sciences
730305 Diagnostic methods
230204 Applied Statistics
780102 Physical sciences
Abstract Motivation: The clustering of gene profiles across some experimental conditions of interest contributes significantly to the elucidation of unknown gene function, the validation of gene discoveries and the interpretation of biological processes. However, this clustering problem is not straightforward as the profiles of the genes are not all independently distributed and the expression levels may have been obtained from an experimental design involving replicated arrays. Ignoring the dependence between the gene profiles and the structure of the replicated data can result in important sources of variability in the experiments being overlooked in the analysis, with the consequent possibility of misleading inferences being made. We propose a random-effects model that provides a unified approach to the clustering of genes with correlated expression levels measured in a wide variety of experimental situations. Our model is an extension of the normal mixture model to account for the correlations between the gene profiles and to enable covariate information to be incorporated into the clustering process. Hence the model is applicable to longitudinal studies with or without replication, for example, time-course experiments by using time as a covariate, and to cross-sectional experiments by using categorical covariates to represent the different experimental classes.

Results: We show that our random-effects model can be fitted by maximum likelihood via the EM algorithm for which the E (expectation) and M (maximization) steps can be implemented in closed form. Hence our model can be fitted deterministically without the need for time-consuming Monte Carlo approximations. The effectiveness of our model-based procedure for the clustering of correlated gene profiles is demonstrated on three real datasets, representing typical microarray experimental designs, covering time-course, repeated-measurement and cross-sectional data. In these examples, relevant clusters of the genes are obtained, which are supported by existing gene-function annotation. A synthetic dataset is considered too.
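
The abstract states that the E and M steps can be implemented in closed form. As a rough illustration of what closed-form steps look like, the sketch below runs EM on an ordinary normal mixture with diagonal covariances. It is not the authors' random-effects model: it omits the random effects, the covariates and the correlation between gene profiles, all of which the paper adds on top of the normal mixture. The function name `em_normal_mixture` and the parameters `mu_init`, `n_iter` and `var_floor` are hypothetical names chosen for this example.

```python
import numpy as np

def em_normal_mixture(y, mu_init, n_iter=50, var_floor=1e-6):
    """EM for a plain normal mixture with diagonal covariances.

    Assumption: this is the basic normal mixture only, not the
    random-effects extension described in the abstract.
    y       : (n, p) array, one row per gene profile
    mu_init : (g, p) array of starting component means
    """
    y = np.asarray(y, dtype=float)
    mu = np.array(mu_init, dtype=float)
    n, p = y.shape
    g = mu.shape[0]
    pi = np.full(g, 1.0 / g)                         # mixing proportions
    var = np.tile(y.var(axis=0) + var_floor, (g, 1))  # per-component variances
    loglik = []
    for _ in range(n_iter):
        # E-step: posterior probability that profile i belongs to component k
        diff = y[:, None, :] - mu[None, :, :]
        log_dens = -0.5 * (np.log(2.0 * np.pi * var)[None] + diff**2 / var[None]).sum(axis=2)
        log_w = np.log(pi)[None, :] + log_dens
        m = log_w.max(axis=1, keepdims=True)          # log-sum-exp for stability
        log_norm = m + np.log(np.exp(log_w - m).sum(axis=1, keepdims=True))
        tau = np.exp(log_w - log_norm)
        loglik.append(float(log_norm.sum()))          # log-likelihood before this update
        # M-step: closed-form weighted updates, no numerical optimisation needed
        nk = tau.sum(axis=0)
        pi = nk / n
        mu = (tau.T @ y) / nk[:, None]
        var = np.maximum((tau.T @ y**2) / nk[:, None] - mu**2, var_floor)
    return pi, mu, var, tau, loglik
```

Each profile can then be assigned to the component with the largest posterior probability, for example `tau.argmax(axis=1)`. Because both steps are explicit formulas, the fit is deterministic for a given starting point, which is the property the abstract contrasts with Monte Carlo approximations.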
Keyword Mathematics, Interdisciplinary Applications
Biochemical Research Methods
Biotechnology & Applied Microbiology
Computer Science, Interdisciplinary Applications
Statistics & Probability
Microarray Experiments
Mathematical & Computational Biology
Q-Index Code C1

Citation counts: Thomson Reuters Web of Science: cited 72 times
Scopus: cited 78 times
Created: Wed, 15 Aug 2007, 08:19:27 EST