Modeling of inter-sample variation in flow cytometric data with the joint clustering and matching procedure

Lee, Sharon X., McLachlan, Geoffrey J. and Pyne, Saumyadipta (2016) Modeling of inter-sample variation in flow cytometric data with the joint clustering and matching procedure. Cytometry Part A, 89(1): 30-43. doi:10.1002/cyto.a.22789

Author Lee, Sharon X.
McLachlan, Geoffrey J.
Pyne, Saumyadipta
Title Modeling of inter-sample variation in flow cytometric data with the joint clustering and matching procedure
Journal name Cytometry Part A
ISSN 1552-4930
Publication date 2016-01-01
Year available 2015
Sub-type Article (original research)
DOI 10.1002/cyto.a.22789
Open Access Status Not Open Access
Volume 89
Issue 1
Start page 30
End page 43
Total pages 14
Place of publication Hoboken NJ, United States
Publisher John Wiley & Sons
Language eng
Abstract We present an algorithm for modeling flow cytometry data in the presence of large inter-sample variation. Large-scale cytometry datasets often exhibit within-class variation due to technical effects, such as instrumental differences and variations in data acquisition, as well as subtle biological heterogeneity within the class of samples. Failure to account for such variation in the model may lead to inaccurate matching of populations across a batch of samples and poor performance in classification of unlabeled samples. In this paper, we describe the Joint Clustering and Matching (JCM) procedure for simultaneous segmentation and alignment of cell populations across multiple samples. Under the JCM framework, a multivariate mixture distribution is used to model the distribution of the expressions of a fixed set of markers for each cell in a sample, such that the components of the mixture model may correspond to the various populations of cells with similar marker expressions (that is, clusters) that make up the sample. For each class of samples, an overall class template is formed by adopting random-effects terms to model the inter-sample variation within the class. The construction of a parametric template for each class allows direct quantification of the differences between the template and each sample, and also between each pair of samples, both within and between classes. A new unclassified sample is then classified by assigning it to the class whose template density is closest to the sample's fitted mixture density. For illustration, we use a symmetric form of the Kullback-Leibler divergence as the distance measure between two densities, but other distance measures can also be applied.
We demonstrate on four real datasets how the JCM procedure can be used to carry out automated clustering and alignment of cell populations, and supervised classification of samples.
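The classification rule in the abstract (assign a sample to the class whose template density is nearest under a symmetric Kullback-Leibler divergence) can be illustrated with a small sketch. This is not the authors' implementation. It makes several assumptions: plain Gaussian mixtures stand in for the skew mixture models with random effects used by JCM, the symmetric divergence is taken as KL(p||q) + KL(q||p), the divergence is estimated by Monte Carlo sampling, and all names (`symmetric_kl`, `classify`, `class_A`, `class_B`) and parameter values are hypothetical.

```python
import numpy as np

# A mixture is represented as a tuple (weights, means, covs).
# Assumption: Gaussian components stand in for JCM's skew components.

def log_gauss(x, mean, cov):
    """Log density of a multivariate normal at each row of x."""
    d = mean.shape[0]
    chol = np.linalg.cholesky(cov)
    z = np.linalg.solve(chol, (x - mean).T)  # whitened residuals, shape (d, n)
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))
    return -0.5 * (d * np.log(2.0 * np.pi) + log_det + np.sum(z ** 2, axis=0))

def log_mixture(x, weights, means, covs):
    """Log density of the mixture at each row of x (log-sum-exp over components)."""
    comp = np.stack([np.log(w) + log_gauss(x, m, c)
                     for w, m, c in zip(weights, means, covs)])
    top = comp.max(axis=0)
    return top + np.log(np.exp(comp - top).sum(axis=0))

def sample_mixture(rng, n, weights, means, covs):
    """Draw n points: pick a component per point, then sample from it."""
    labels = rng.choice(len(weights), size=n, p=weights)
    out = np.empty((n, means[0].shape[0]))
    for k in range(len(weights)):
        idx = labels == k
        out[idx] = rng.multivariate_normal(means[k], covs[k], size=int(idx.sum()))
    return out

def symmetric_kl(p, q, n=20000, seed=0):
    """Monte Carlo estimate of KL(p||q) + KL(q||p) (one symmetric form)."""
    rng = np.random.default_rng(seed)
    xp = sample_mixture(rng, n, *p)
    xq = sample_mixture(rng, n, *q)
    kl_pq = np.mean(log_mixture(xp, *p) - log_mixture(xp, *q))
    kl_qp = np.mean(log_mixture(xq, *q) - log_mixture(xq, *p))
    return kl_pq + kl_qp

def classify(sample, templates):
    """Return the label of the nearest class template and all distances."""
    distances = {name: symmetric_kl(sample, t) for name, t in templates.items()}
    return min(distances, key=distances.get), distances

# Hypothetical two-marker class templates and one fitted sample density.
eye = np.eye(2)
templates = {
    "class_A": (np.array([0.6, 0.4]),
                [np.array([0.0, 0.0]), np.array([4.0, 4.0])], [eye, eye]),
    "class_B": (np.array([0.5, 0.5]),
                [np.array([0.0, 5.0]), np.array([5.0, 0.0])], [eye, eye]),
}
new_sample = (np.array([0.55, 0.45]),
              [np.array([0.3, -0.2]), np.array([3.8, 4.1])], [eye, 1.2 * eye])

label, distances = classify(new_sample, templates)
print(label, distances)
```

In this toy setup the new sample's components sit close to those of `class_A`, so its divergence to that template is the smaller of the two. The Monte Carlo estimate is used because the Kullback-Leibler divergence between two mixtures has no closed form in general.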
Keyword Flow cytometry
Class template
Inter-sample variation
Skew mixture models
EM algorithm
Q-Index Code C1
Q-Index Status Provisional Code
Institutional Status UQ

Document type: Journal Article
Sub-type: Article (original research)
Collections: School of Mathematics and Physics
Official 2016 Collection
Citation counts: Thomson Reuters Web of Science: cited 4 times
Scopus: cited 8 times
Created: Sun, 21 Feb 2016, 10:25:56 EST by System User on behalf of School of Mathematics & Physics