ID: 2303.17182

A review on Bayesian model-based clustering

March 30, 2023

Clara Grazian
Statistics
Methodology

Clustering is an important task in many areas of knowledge: medicine and epidemiology, genomics, environmental science, economics, visual sciences, among others. Methodologies to perform inference on the number of clusters have often been shown to be inconsistent, and introducing a dependence structure among the clusters implies additional difficulties in the estimation process. In a Bayesian setting, clustering is performed by treating the unknown partition as a random object and defining a prior distribution on it. This prior distribution may be induced by models on the observations, or defined directly on the partition. Several recent results, however, have shown the difficulties in consistently estimating the number of clusters, and, therefore, the partition. The problem of summarising the posterior distribution on the partition itself remains open, given the large dimension of the partition space. This work reviews the Bayesian approaches to clustering available in the literature, presenting the advantages and disadvantages of each in order to suggest future lines of research.
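
To make the idea of a prior placed directly on the partition concrete, the sketch below samples partitions from a Chinese restaurant process, one standard choice of such a prior, and inspects the induced prior on the number of clusters. It is a minimal illustration; the function name and the concentration value are illustrative and not taken from the paper.

```python
import numpy as np

def sample_crp_partition(n, alpha, rng):
    """Sequential allocation of n observations under a Chinese restaurant
    process: one standard way of placing a prior directly on the partition."""
    labels = np.zeros(n, dtype=int)
    counts = [1]                      # the first observation opens the first cluster
    for i in range(1, n):
        probs = np.array(counts + [alpha], dtype=float)
        probs /= probs.sum()          # existing clusters ~ size, new cluster ~ alpha
        k = rng.choice(len(probs), p=probs)
        if k == len(counts):
            counts.append(1)          # open a new cluster
        else:
            counts[k] += 1
        labels[i] = k
    return labels

rng = np.random.default_rng(0)
sizes = [len(np.unique(sample_crp_partition(200, 1.0, rng))) for _ in range(500)]
print(np.bincount(sizes))  # induced prior on the number of clusters for n = 200
```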

Similar papers

Bayesian cluster analysis: Point estimation and credible balls

May 13, 2015

92% Match
Sara Wade, Zoubin Ghahramani
Methodology

Clustering is widely studied in statistics and machine learning, with applications in a variety of fields. As opposed to classical algorithms which return a single clustering solution, Bayesian nonparametric models provide a posterior over the entire space of partitions, allowing one to assess statistical properties, such as uncertainty on the number of clusters. However, an important problem is how to summarize the posterior; the huge dimension of partition space and difficu...
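
This paper summarises the posterior over partitions with point estimates and credible balls built from a loss function on partitions; the variation of information is one such distance. The helper below is a generic, hedged illustration of that metric, not code from the paper.

```python
import numpy as np
from sklearn.metrics import mutual_info_score

def partition_entropy(labels):
    # entropy (in nats) of the empirical cluster frequencies of a partition
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log(p))

def variation_of_information(a, b):
    # VI(a, b) = H(a) + H(b) - 2 I(a, b): a metric on the space of partitions,
    # usable both for point estimation and for building credible balls
    return partition_entropy(a) + partition_entropy(b) - 2.0 * mutual_info_score(a, b)
```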


Revisiting k-means: New Algorithms via Bayesian Nonparametrics

November 2, 2011

91% Match
Brian Kulis, Michael I. Jordan
Machine Learning

Bayesian models offer great flexibility for clustering applications: Bayesian nonparametrics can be used for modeling infinite mixtures, and hierarchical Bayesian models can be utilized for sharing clusters across multiple data sets. For the most part, such flexibility is lacking in classical clustering methods such as k-means. In this paper, we revisit the k-means clustering algorithm from a Bayesian nonparametric viewpoint. Inspired by the asymptotic connection between k-m...
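
The small-variance asymptotic argument in this paper leads to the DP-means procedure: behave like k-means, but open a new cluster whenever a point is farther than a threshold lambda from every existing centre. The sketch below is a minimal, unoptimised rendering of that idea; the function name, stopping rule, and defaults are mine.

```python
import numpy as np

def dp_means(X, lam, max_iter=100):
    """k-means-like hard clustering: a point whose squared Euclidean distance
    to every centre exceeds lam starts a new cluster."""
    centers = [X.mean(axis=0)]
    z = np.zeros(len(X), dtype=int)
    for _ in range(max_iter):
        changed = False
        # assignment step, with cluster creation
        for i, x in enumerate(X):
            d2 = np.array([np.sum((x - c) ** 2) for c in centers])
            j = int(np.argmin(d2))
            if d2[j] > lam:
                centers.append(x.copy())
                j = len(centers) - 1
            if z[i] != j:
                z[i] = j
                changed = True
        # update step: recompute the mean of every non-empty cluster
        for k in range(len(centers)):
            if np.any(z == k):
                centers[k] = X[z == k].mean(axis=0)
        if not changed:
            break
    return z, np.vstack(centers)
```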


Model-based Clustering

July 5, 2018

90% Match
Bettina Grün
Methodology

Mixture models extend the toolbox of clustering methods available to the data analyst. They allow for an explicit definition of the cluster shapes and structure within a probabilistic framework and exploit estimation and inference techniques available for statistical models in general. In this chapter an introduction to cluster analysis is provided, model-based clustering is related to standard heuristic clustering methods and an overview of different ways to specify the clus...
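
In practice, the workflow described here amounts to fitting mixture models under different cluster-shape constraints and numbers of components, then selecting among them with an information criterion. A hedged sketch using scikit-learn's Gaussian mixture, whose covariance_type options play the role of the shape constraints (the grid of values is arbitrary):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def model_based_clustering(X, max_k=8):
    # fit Gaussian mixtures under several shape constraints and component
    # counts, then keep the model with the lowest BIC
    best, best_bic = None, np.inf
    for cov_type in ("full", "tied", "diag", "spherical"):
        for k in range(1, max_k + 1):
            gm = GaussianMixture(n_components=k, covariance_type=cov_type,
                                 random_state=0).fit(X)
            bic = gm.bic(X)
            if bic < best_bic:
                best, best_bic = gm, bic
    return best, best.predict(X)
```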


Bayesian approach to clustering real value, categorical and network data: solution via variational methods

May 17, 2008

90% Match
Alexei Vazquez (Institute for Advanced Study)
Data Analysis, Statistics an...

Data clustering, including problems such as finding network communities, can be put into a systematic framework by means of a Bayesian approach. The application of Bayesian approaches to real problems can be, however, quite challenging. In most cases the solution is explored via Monte Carlo sampling or variational methods. Here we work further on the application of variational methods to clustering problems. We introduce generative models based on a hidden group structure and...


Model-Based Hierarchical Clustering

January 16, 2013

90% Match
Shivakumar Vaithyanathan, Byron E Dom
Machine Learning
Artificial Intelligence

We present an approach to model-based hierarchical clustering by formulating an objective function based on a Bayesian analysis. This model organizes the data into a cluster hierarchy while specifying a complex feature-set partitioning that is a key component of our model. Features can have either a unique distribution in every cluster or a common distribution over some (or even all) of the clusters. The cluster subsets over which these features have such a common distributio...


Dirichlet Process Parsimonious Mixtures for clustering

January 14, 2015

90% Match
Faicel Chamroukhi, Marius Bartcus, Hervé Glotin
Machine Learning
Methodology

The parsimonious Gaussian mixture models, which exploit an eigenvalue decomposition of the group covariance matrices of the Gaussian mixture, have shown their success in particular in cluster analysis. Their estimation is in general performed by maximum likelihood estimation and has also been considered from a parametric Bayesian perspective. We propose new Dirichlet Process Parsimonious mixtures (DPPM) which represent a Bayesian nonparametric formulation of these parsimoniou...
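
The parsimonious family referred to here is built on the standard eigenvalue decomposition of each component covariance matrix; constraining its three factors to be equal or free across components generates the usual catalogue of models. In the notation commonly used for this decomposition (not necessarily the paper's own symbols):

```latex
\Sigma_k \;=\; \lambda_k \, D_k \, A_k \, D_k^{\top},
\qquad
\lambda_k = |\Sigma_k|^{1/p} \ \text{(volume)}, \quad
D_k \ \text{orthogonal (orientation)}, \quad
A_k \ \text{diagonal with } |A_k| = 1 \ \text{(shape)}.
```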


Entropy regularization in probabilistic clustering

July 19, 2023

90% Match
Beatrice Franzolini, Giovanni Rebaudo
Methodology
Computation
Machine Learning

Bayesian nonparametric mixture models are widely used to cluster observations. However, one major drawback of the approach is that the estimated partition often presents unbalanced cluster frequencies, with only a few dominating clusters and a large number of sparsely populated ones. This feature translates into results that are often uninterpretable unless we accept ignoring a relevant number of observations and clusters. Interpreting the posterior distribution as penalize...
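
The imbalance this paper targets can be measured by the entropy of the cluster frequencies, which is the quantity an entropy penalty acts on. The helper below only computes that diagnostic; it is not the paper's regularised estimator.

```python
import numpy as np

def cluster_frequency_entropy(labels):
    # entropy of the empirical cluster frequencies of a partition:
    # low values signal a few dominating clusters plus many tiny ones;
    # the log of the number of clusters is the balanced upper bound
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log(p)))
```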


Bayesian model-based clustering for populations of network data

July 7, 2021

90% Match
Anastasia Mantziou, Simon Lunagomez, Robin Mitra
Applications

There is increasing appetite for analysing populations of network data due to the fast-growing body of applications demanding such methods. While methods exist to provide readily interpretable summaries of heterogeneous network populations, these are often descriptive or ad hoc, lacking any formal justification. In contrast, principled analysis methods often provide results difficult to relate back to the applied problem of interest. Motivated by two complementary applied exa...


Optimal Bayesian estimators for latent variable cluster models

July 8, 2016

90% Match
Riccardo Rastelli, Nial Friel
Methodology

In cluster analysis interest lies in probabilistically capturing partitions of individuals, items or observations into groups, such that those belonging to the same group share similar attributes or relational profiles. Bayesian posterior samples for the latent allocation variables can be effectively obtained in a wide range of clustering models, including finite mixtures, infinite mixtures, hidden Markov models and block models for networks. However, due to the categorical n...
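
The decision-theoretic setting of this paper starts from posterior samples of the allocation variables and a loss on partitions. As a generic illustration (not the estimator proposed in the paper), the sketch below computes the posterior similarity matrix and returns the sampled partition with the smallest posterior expected Binder loss, which depends on the samples only through the pairwise co-clustering probabilities.

```python
import numpy as np

def posterior_similarity(samples):
    # samples: (S, n) array of allocation vectors from S MCMC iterations
    S, n = samples.shape
    psm = np.zeros((n, n))
    for z in samples:
        psm += (z[:, None] == z[None, :])
    return psm / S

def binder_point_estimate(samples):
    # posterior expected Binder loss of a candidate partition is proportional to
    # sum_{i<j} | 1[z_i = z_j] - p_ij |, with p_ij the co-clustering probability
    psm = posterior_similarity(samples)
    losses = [np.abs((z[:, None] == z[None, :]).astype(float) - psm).sum()
              for z in samples]
    return samples[int(np.argmin(losses))]
```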


Optimal Bayesian clustering using non-negative matrix factorization

September 20, 2018

90% Match
Ketong Wang, Michael D. Porter
Methodology

Bayesian model-based clustering is a widely applied procedure for discovering groups of related observations in a dataset. These approaches use Bayesian mixture models, estimated with MCMC, which provide posterior samples of the model parameters and clustering partition. While inference on model parameters is well established, inference on the clustering partition is less developed. A new method is developed for estimating the optimal partition from the pairwise posterior sim...
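
One way to read the approach described here: treat the pairwise posterior similarity matrix as a non-negative matrix, factorise it, and assign each observation to its dominant factor. The sketch below is only an illustration of that reading with scikit-learn's NMF, not the paper's algorithm (which, in particular, also addresses choosing the number of clusters); posterior_similarity refers to the helper sketched earlier.

```python
from sklearn.decomposition import NMF

def nmf_partition(psm, n_clusters):
    # factorise the n x n posterior similarity matrix as psm ~ W H and read
    # each observation's cluster label off the dominant column of W
    model = NMF(n_components=n_clusters, init="nndsvda", max_iter=500, random_state=0)
    W = model.fit_transform(psm)
    return W.argmax(axis=1)

# usage: labels = nmf_partition(posterior_similarity(samples), n_clusters=4)
```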
