ID: 2303.17182

A review on Bayesian model-based clustering

March 30, 2023

Similar papers 2

Dependent Modeling of Temporal Sequences of Random Partitions

December 24, 2019

90% Match
Garritt L. Page, Fernando A. Quintana, David B. Dahl
Methodology

We consider the task of modeling a dependent sequence of random partitions. It is well-known that a random measure in Bayesian nonparametrics induces a distribution over random partitions. The community has therefore assumed that the best approach to obtain a dependent sequence of random partitions is through modeling dependent random measures. We argue that this approach is problematic and show that the random partition model induced by dependent Bayesian nonparametric prior...

Bayesian Distance Clustering

October 19, 2018

90% Match
Leo L Duan, David B Dunson
Machine Learning

Model-based clustering is widely-used in a variety of application areas. However, fundamental concerns remain about robustness. In particular, results can be sensitive to the choice of kernel representing the within-cluster data density. Leveraging on properties of pairwise differences between data points, we propose a class of Bayesian distance clustering methods, which rely on modeling the likelihood of the pairwise distances in place of the original data. Although some inf...

Approximate Inference via Clustering

November 28, 2021

90% Match
Qianqian Song
Machine Learning

In recent years, large-scale Bayesian learning has drawn a great deal of attention. However, in the big-data era, the amount of data we face is growing much faster than our ability to deal with it. Fortunately, it is observed that large-scale datasets usually have rich internal structure and are somewhat redundant. In this paper, we attempt to simplify the Bayesian posterior by exploiting this structure. Specifically, we restrict our interest to the so-called well-clustered datasets a...

Spying on the prior of the number of data clusters and the partition distribution in Bayesian cluster analysis

December 22, 2020

89% Match
Jan Greve, Bettina Grün, ..., Sylvia Frühwirth-Schnatter
Methodology

Cluster analysis aims at partitioning data into groups or clusters. In applications, it is common to deal with problems where the number of clusters is unknown. Bayesian mixture models employed in such applications usually specify a flexible prior that takes into account the uncertainty with respect to the number of clusters. However, a major empirical challenge involving the use of these models is in the characterisation of the induced prior on the partitions. This work intr...

Bayesian Cluster Enumeration Criterion for Unsupervised Learning

October 22, 2017

89% Match
Freweyni K. Teklehaymanot, Michael Muma, Abdelhak M. Zoubir
Statistics Theory
Machine Learning

We derive a new Bayesian Information Criterion (BIC) by formulating the problem of estimating the number of clusters in an observed data set as maximization of the posterior probability of the candidate models. Given that some mild assumptions are satisfied, we provide a general BIC expression for a broad class of data distributions. This serves as a starting point when deriving the BIC for specific distributions. Along this line, we provide a closed-form BIC expression for m...

An Efficient Model Selection for Gaussian Mixture Model in a Bayesian Framework

July 3, 2013

89% Match
Ji Won Yoon
Machine Learning

In order to cluster or partition data, we often use Expectation-and-Maximization (EM) or Variational approximation with a Gaussian Mixture Model (GMM), which is a parametric probability density function represented as a weighted sum of $\hat{K}$ Gaussian component densities. However, model selection to find underlying $\hat{K}$ is one of the key concerns in GMM clustering, since we can obtain the desired clusters only when $\hat{K}$ is known. In this paper, we propose a new m...

Bayesian Finite Mixture Models

July 7, 2024

89% Match
Bettina Grün, Gertraud Malsiner-Walli
Methodology

Finite mixture models are a useful statistical model class for clustering and density approximation. In the Bayesian framework finite mixture models require the specification of suitable priors in addition to the data model. These priors allow to avoid spurious results and provide a principled way to define cluster shapes and a preference for specific cluster solutions. A generic model estimation scheme for finite mixtures with a fixed number of components is available using ...

Mean-field theory of Bayesian clustering

September 6, 2017

89% Match
Alexander Mozeika, Anthony CC Coolen
Disordered Systems and Neura...
Data Analysis, Statistics an...

We show that model-based Bayesian clustering, the probabilistically most systematic approach to the partitioning of data, can be mapped into a statistical physics problem for a gas of particles, and as a result becomes amenable to a detailed quantitative analysis. A central role in the resulting statistical physics framework is played by an entropy function. We demonstrate that there is a relevant parameter regime where mean-field analysis of this function is exact, and that,...

Robust Bayesian Model Selection for Variable Clustering with the Gaussian Graphical Model

June 15, 2018

89% Match
Daniel Andrade, Akiko Takeda, Kenji Fukumizu
Applications
Computation
Machine Learning

Variable clustering is important for explanatory analysis. However, only a few dedicated methods for variable clustering with the Gaussian graphical model have been proposed. Even worse, small insignificant partial correlations due to noise can dramatically change the clustering result when evaluating, for example, with the Bayesian Information Criterion (BIC). In this work, we try to address this issue by proposing a Bayesian model that accounts for negligibly small, but no...

A Tutorial on Bayesian Nonparametric Models

June 14, 2011

89% Match
Samuel J. Gershman, David M. Blei
Machine Learning
Methodology

A key problem in statistical modeling is model selection, how to choose a model at an appropriate level of complexity. This problem appears in many settings, most prominently in choosing the number of clusters in mixture models or the number of factors in factor analysis. In this tutorial we describe Bayesian nonparametric methods, a class of methods that side-steps this issue by allowing the data to determine the complexity of the model. This tutorial is a high-level introduc...