Weighted Gibbs sampling for mixture modelling of massive datasets via coresets

McGrory, Clare Anne, Ahfock, Daniel C., Horsley, Joshua A. and Alston, Clair L. (2014) Weighted Gibbs sampling for mixture modelling of massive datasets via coresets. Stat, 3(1): 291-299. doi:10.1002/sta4.62

Author McGrory, Clare Anne
Ahfock, Daniel C.
Horsley, Joshua A.
Alston, Clair L.
Title Weighted Gibbs sampling for mixture modelling of massive datasets via coresets
Journal name Stat
ISSN 2049-1573
Publication date 2014
Year available 2014
Sub-type Article (original research)
DOI 10.1002/sta4.62
Volume 3
Issue 1
Start page 291
End page 299
Total pages 9
Place of publication Oxford, United Kingdom
Publisher John Wiley & Sons
Collection year 2015
Language eng
Abstract Massive datasets are increasingly encountered in modern research applications, and this presents tremendous new challenges for statisticians. In settings where the aim is to classify or cluster data via finite mixture modelling, such as in satellite image analysis, the large number of data points to be analysed can make fitting such models either infeasible, or simply too time-consuming to be of practical use. It has been shown that using a representative weighted subsample of the complete dataset to estimate mixture model parameters can lead to much more time-efficient and yet still reasonable inference. These representative subsamples are called coresets. Naturally, these coresets have to be constructed carefully, as the naive approach of performing simple uniform sampling from the dataset could lead to smaller clusters of points within the dataset being severely undersampled, and this would in turn result in very unreliable inference. It has previously been shown that an adaptive sampling approach can be used to obtain a representative coreset of data points together with a corresponding set of coreset weights. In this article, we explore how this idea can be incorporated into a Gibbs sampling algorithm for mixture modelling of image data via coresets within a Bayesian framework. We call the resulting algorithm a Weighted Gibbs Sampler. We illustrate the proposed approach through an application to remote sensing of land use from satellite imagery.
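The abstract's point about simple uniform sampling can be illustrated with a small calculation. The sketch below is not taken from the article: the function names and the dataset sizes are hypothetical, chosen only to show how easily a small cluster can be missed when a subsample is drawn uniformly without replacement.

```python
def expected_cluster_count(n_total, n_cluster, m):
    """Expected number of points from a cluster of size n_cluster
    in a uniform sample of size m drawn from n_total points."""
    return m * n_cluster / n_total


def prob_cluster_missed(n_total, n_cluster, m):
    """Probability that a uniform sample of size m, drawn without
    replacement, contains no point at all from the cluster
    (the hypergeometric probability of zero successes)."""
    p = 1.0
    for i in range(m):
        # Chance that the (i+1)-th draw also avoids the cluster,
        # given that all earlier draws avoided it.
        p *= (n_total - n_cluster - i) / (n_total - i)
    return p


# Hypothetical sizes: one million points, a small cluster of 500 points,
# and a uniform subsample of 1000 points.
n_total, n_cluster, m = 1_000_000, 500, 1_000
print(expected_cluster_count(n_total, n_cluster, m))
print(prob_cluster_missed(n_total, n_cluster, m))
```

With these assumed sizes the subsample is expected to contain only half a point from the small cluster, and it misses the cluster entirely roughly 60% of the time. This is the kind of undersampling that a carefully constructed, weighted coreset is meant to avoid.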
Keyword Bayesian statistics
Mixture modelling
Satellite image analysis
Weighted Gibbs sampler
Q-Index Code C1
Q-Index Status Confirmed Code
Institutional Status UQ

Document type: Journal Article
Collections: School of Mathematics and Physics
Official 2015 Collection
Citation counts: Cited 1 time in Scopus
Created: Fri, 26 Sep 2014, 14:09:55 EST by Ms Dulcie Stewart on behalf of School of Mathematics & Physics