ID: 1505.03339

Bayesian cluster analysis: Point estimation and credible balls

May 13, 2015

Sara Wade, Zoubin Ghahramani
Statistics
Methodology

Clustering is widely studied in statistics and machine learning, with applications in a variety of fields. As opposed to classical algorithms, which return a single clustering solution, Bayesian nonparametric models provide a posterior over the entire space of partitions, allowing one to assess statistical properties such as uncertainty on the number of clusters. However, an important problem is how to summarize the posterior; the huge dimension of partition space and the difficulty of visualizing it compound this problem. In a Bayesian analysis, the posterior of a real-valued parameter of interest is often summarized by reporting a point estimate, such as the posterior mean, along with a 95% credible interval to characterize uncertainty. In this paper, we extend these ideas to develop appropriate point estimates and credible sets to summarize the posterior of the clustering structure based on decision- and information-theoretic techniques.
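The decision-theoretic approach summarized above chooses a point estimate by minimizing a posterior expected loss over partitions; the Variation of Information (VI) is one information-theoretic loss used in this literature. A minimal sketch, assuming posterior MCMC samples of partitions are available as equal-length label lists and restricting the search to the sampled partitions (function names are illustrative, not the paper's implementation):

```python
import math
from collections import Counter

def variation_of_information(a, b):
    """VI between two partitions of the same n items, given as label lists:
    VI(a, b) = H(a) + H(b) - 2 I(a, b), computed from empirical frequencies."""
    n = len(a)
    pa, pb = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    h_a = -sum((c / n) * math.log(c / n) for c in pa.values())
    h_b = -sum((c / n) * math.log(c / n) for c in pb.values())
    mi = sum((c / n) * math.log((c / n) / ((pa[x] / n) * (pb[y] / n)))
             for (x, y), c in joint.items())
    return h_a + h_b - 2 * mi

def vi_point_estimate(samples):
    """Approximate the VI-optimal point estimate by the sampled partition
    with the smallest average VI distance to all other samples."""
    return min(samples,
               key=lambda s: sum(variation_of_information(s, t) for t in samples))
```

Restricting the minimization to sampled partitions is a common practical shortcut; the full decision problem searches over all partitions.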

Similar papers

Discussion of the article "Bayesian cluster analysis: point estimation and credible balls" by Wade and Ghahramani

April 6, 2018

93% Match
Nial Friel, Riccardo Rastelli
Methodology

We present a discussion of the paper "Bayesian cluster analysis: point estimation and credible balls" by Wade and Ghahramani. We believe that this paper contributes substantially to the literature on Bayesian clustering by filling in an important methodological gap, by providing a means to assess the uncertainty around a point estimate of the optimal clustering solution based on a given loss function. In our discussion we reflect on the characterisation of uncertainty around ...


Discussion on Bayesian Cluster Analysis: Point Estimation and Credible Balls by Sara Wade and Zoubin Ghahramani

March 13, 2018

93% Match
William Weimin Yoo
Methodology

I begin my discussion by giving an overview of the main results. Then I proceed to touch upon issues about whether the credible ball constructed can be interpreted as a confidence ball, suggestions on reducing computational costs, and posterior consistency or contraction rates.


A review on Bayesian model-based clustering

March 30, 2023

92% Match
Clara Grazian
Methodology

Clustering is an important task in many areas of knowledge: medicine and epidemiology, genomics, environmental science, economics, visual sciences, among others. Methodologies to perform inference on the number of clusters have often been proved to be inconsistent, and introducing a dependence structure among the clusters implies additional difficulties in the estimation process. In a Bayesian setting, clustering is performed by considering the unknown partition as a random o...


Revisiting k-means: New Algorithms via Bayesian Nonparametrics

November 2, 2011

89% Match
Brian Kulis, Michael I. Jordan
Machine Learning

Bayesian models offer great flexibility for clustering applications---Bayesian nonparametrics can be used for modeling infinite mixtures, and hierarchical Bayesian models can be utilized for sharing clusters across multiple data sets. For the most part, such flexibility is lacking in classical clustering methods such as k-means. In this paper, we revisit the k-means clustering algorithm from a Bayesian nonparametric viewpoint. Inspired by the asymptotic connection between k-m...
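The small-variance asymptotics alluded to in this abstract yield a hard-clustering algorithm commonly known as DP-means: k-means-style updates, except that a new cluster is opened whenever a point lies farther than a penalty threshold from every existing centroid. A minimal sketch, assuming Euclidean data and a user-chosen penalty `lam` (this is an illustration of the idea, not the authors' code):

```python
def dp_means(points, lam, max_iter=100):
    """DP-means sketch: assign each point to its nearest centroid, but spawn a
    new cluster at the point if the squared distance to every centroid > lam."""
    centroids = [list(points[0])]
    assign = [0] * len(points)
    for _ in range(max_iter):
        changed = False
        # assignment step
        for i, p in enumerate(points):
            d2 = [sum((pi - ci) ** 2 for pi, ci in zip(p, c)) for c in centroids]
            j = min(range(len(d2)), key=d2.__getitem__)
            if d2[j] > lam:                 # too far from all clusters: open a new one
                centroids.append(list(p))
                j = len(centroids) - 1
            if assign[i] != j:
                assign[i] = j
                changed = True
        # update step: recompute each centroid as the mean of its members
        for j in range(len(centroids)):
            members = [points[i] for i in range(len(points)) if assign[i] == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
        if not changed:
            break
    return assign, centroids
```

Unlike k-means, the number of clusters is not fixed in advance; it is controlled indirectly by the penalty `lam`.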


Bayesian Cluster Enumeration Criterion for Unsupervised Learning

October 22, 2017

89% Match
Freweyni K. Teklehaymanot, Michael Muma, Abdelhak M. Zoubir
Statistics Theory
Machine Learning

We derive a new Bayesian Information Criterion (BIC) by formulating the problem of estimating the number of clusters in an observed data set as maximization of the posterior probability of the candidate models. Given that some mild assumptions are satisfied, we provide a general BIC expression for a broad class of data distributions. This serves as a starting point when deriving the BIC for specific distributions. Along this line, we provide a closed-form BIC expression for m...
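The closed-form expressions derived in that paper depend on the assumed data distribution, but the general recipe can be illustrated with the familiar BIC formula, BIC = k ln n − 2 ln L̂, where choosing the candidate with the smallest BIC approximates maximizing the posterior model probability. A generic sketch with hypothetical log-likelihood values (the specific numbers below are made up for illustration):

```python
import math

def bic(log_likelihood, num_params, n):
    """Standard BIC: penalizes model complexity by num_params * ln(n).
    Smaller values indicate a better trade-off of fit and complexity."""
    return num_params * math.log(n) - 2.0 * log_likelihood

def select_num_clusters(candidates, n):
    """candidates: list of (K, max_log_likelihood, num_params) tuples.
    Return the number of clusters K whose model minimizes BIC."""
    return min(candidates, key=lambda c: bic(c[1], c[2], n))[0]
```

For example, with n = 100 observations, fits whose likelihood gain from K = 2 to K = 3 is small relative to the ln(n) penalty will select K = 2.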


Entropy regularization in probabilistic clustering

July 19, 2023

89% Match
Beatrice Franzolini, Giovanni Rebaudo
Methodology
Computation
Machine Learning

Bayesian nonparametric mixture models are widely used to cluster observations. However, one major drawback of the approach is that the estimated partition often presents unbalanced cluster frequencies, with only a few dominating clusters and a large number of sparsely populated ones. This feature translates into results that are often uninterpretable unless we agree to ignore a relevant number of observations and clusters. Interpreting the posterior distribution as penalize...


Optimal Bayesian estimators for latent variable cluster models

July 8, 2016

89% Match
Riccardo Rastelli, Nial Friel
Methodology

In cluster analysis interest lies in probabilistically capturing partitions of individuals, items or observations into groups, such that those belonging to the same group share similar attributes or relational profiles. Bayesian posterior samples for the latent allocation variables can be effectively obtained in a wide range of clustering models, including finite mixtures, infinite mixtures, hidden Markov models and block models for networks. However, due to the categorical n...


A generalized Bayes framework for probabilistic clustering

June 9, 2020

88% Match
Tommaso Rigon, Amy H. Herring, David B. Dunson
Methodology
Machine Learning

Loss-based clustering methods, such as k-means and its variants, are standard tools for finding groups in data. However, the lack of quantification of uncertainty in the estimated clusters is a disadvantage. Model-based clustering based on mixture models provides an alternative, but such methods face computational problems and large sensitivity to the choice of kernel. This article proposes a generalized Bayes framework that bridges between these two paradigms through the use...


Information theoretic model validation for clustering

June 2, 2010

88% Match
Joachim M. Buhmann
Information Theory
Machine Learning

Model selection in clustering requires (i) to specify a suitable clustering principle and (ii) to control the model order complexity by choosing an appropriate number of clusters depending on the noise level in the data. We advocate an information theoretic perspective where the uncertainty in the measurements quantizes the set of data partitionings and, thereby, induces uncertainty in the solution space of clusterings. A clustering model, which can tolerate a higher level of...


A Random Finite Set Model for Data Clustering

March 14, 2017

88% Match
Dinh Phung, Ba-Ngu Vo
Machine Learning

The goal of data clustering is to partition data points into groups so as to minimize a given objective function. While most existing clustering algorithms treat each data point as a vector, in many applications each datum is not a vector but a point pattern or a set of points. Moreover, many existing clustering methods require the user to specify the number of clusters, which is often not available in advance. This paper proposes a new class of models for data clustering that addresses se...
