Similar papers
December 24, 2019
We consider the task of modeling a dependent sequence of random partitions. It is well known that a random measure in Bayesian nonparametrics induces a distribution over random partitions. The community has therefore assumed that the best approach to obtaining a dependent sequence of random partitions is through modeling dependent random measures. We argue that this approach is problematic and show that the random partition model induced by dependent Bayesian nonparametric prior...
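As a minimal sketch of the fact that a Dirichlet process random measure induces a distribution over partitions, the snippet below samples a partition from the Chinese restaurant process; the concentration value and item count are illustrative assumptions, and this is not the dependent construction the paper develops.

```python
import numpy as np

def sample_crp_partition(n, alpha, seed=None):
    """Sample a random partition of n items from the Chinese restaurant
    process with concentration alpha, the partition law induced by a
    Dirichlet process random measure."""
    rng = np.random.default_rng(seed)
    assignments = [0]                      # first item opens the first cluster
    counts = [1]                           # current cluster sizes
    for i in range(1, n):
        # join cluster k with prob counts[k]/(i+alpha), open a new one with alpha/(i+alpha)
        probs = np.array(counts + [alpha], dtype=float)
        probs /= probs.sum()
        k = rng.choice(len(probs), p=probs)
        if k == len(counts):
            counts.append(1)
        else:
            counts[k] += 1
        assignments.append(k)
    return assignments

print(sample_crp_partition(20, alpha=1.0, seed=0))
```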
October 19, 2018
Model-based clustering is widely used in a variety of application areas. However, fundamental concerns remain about robustness. In particular, results can be sensitive to the choice of kernel representing the within-cluster data density. Leveraging properties of pairwise differences between data points, we propose a class of Bayesian distance clustering methods, which rely on modeling the likelihood of the pairwise distances in place of the original data. Although some inf...
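As a rough illustration of swapping the data likelihood for a distance likelihood, the toy below scores a candidate partition by modeling within- and between-cluster pairwise distances with exponential densities; the exponential choice and the rate parameters are illustrative assumptions, not the paper's model.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def distance_log_likelihood(X, labels, rate_within=2.0, rate_between=0.5):
    """Toy 'distance likelihood' of a partition: within-cluster distances scored
    under Exponential(rate_within), between-cluster ones under Exponential(rate_between)."""
    D = squareform(pdist(X))                 # pairwise Euclidean distances
    n = len(labels)
    loglik = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            rate = rate_within if labels[i] == labels[j] else rate_between
            loglik += np.log(rate) - rate * D[i, j]   # exponential log-density
    return loglik

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (10, 2)), rng.normal(3, 0.3, (10, 2))])
good = [0] * 10 + [1] * 10
bad = [0, 1] * 10
print(distance_log_likelihood(X, good), distance_log_likelihood(X, bad))
```

On this toy data the correct two-group partition receives a higher score than the interleaved one, which is the kind of comparison a distance-based likelihood enables without touching the original coordinates.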
November 28, 2021
In recent years, large-scale Bayesian learning has drawn a great deal of attention. However, in the big-data era, the amount of data we face is growing much faster than our ability to deal with it. Fortunately, it has been observed that large-scale datasets usually possess rich internal structure and are somewhat redundant. In this paper, we attempt to simplify the Bayesian posterior by exploiting this structure. Specifically, we restrict our interest to so-called well-clustered datasets a...
December 22, 2020
Cluster analysis aims at partitioning data into groups or clusters. In applications, it is common to deal with problems where the number of clusters is unknown. Bayesian mixture models employed in such applications usually specify a flexible prior that takes into account the uncertainty with respect to the number of clusters. However, a major empirical challenge involving the use of these models is in the characterisation of the induced prior on the partitions. This work intr...
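One simple way to characterise the induced prior on partitions is by Monte Carlo: draw mixture weights from the prior, assign observations, and record how many components end up occupied. The symmetric Dirichlet prior and the specific values of n, K, and the concentration below are illustrative assumptions, not the prior studied in the paper.

```python
import numpy as np
from collections import Counter

def induced_prior_on_clusters(n, K, gamma, n_draws=5000, seed=0):
    """Monte Carlo approximation of the prior on the number of *occupied*
    components when weights ~ Dirichlet(gamma, ..., gamma) over K components
    and n observations are assigned i.i.d. given the weights."""
    rng = np.random.default_rng(seed)
    counts = Counter()
    for _ in range(n_draws):
        weights = rng.dirichlet([gamma] * K)
        z = rng.choice(K, size=n, p=weights)
        counts[len(np.unique(z))] += 1
    return {k: c / n_draws for k, c in sorted(counts.items())}

print(induced_prior_on_clusters(n=100, K=10, gamma=0.1))
```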
October 22, 2017
We derive a new Bayesian Information Criterion (BIC) by formulating the problem of estimating the number of clusters in an observed data set as maximization of the posterior probability of the candidate models. Given that some mild assumptions are satisfied, we provide a general BIC expression for a broad class of data distributions. This serves as a starting point when deriving the BIC for specific distributions. Along this line, we provide a closed-form BIC expression for m...
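A minimal sketch of BIC-based selection of the number of clusters, here using scikit-learn's generic Gaussian-mixture BIC rather than the specific criterion derived in the paper; the toy data and candidate range are assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# toy data: three well-separated Gaussian clusters in 2D
X = np.vstack([rng.normal(m, 0.5, (100, 2)) for m in (0.0, 4.0, 8.0)])

# fit candidate models and pick the number of components minimizing BIC
bics = {k: GaussianMixture(n_components=k, random_state=0).fit(X).bic(X)
        for k in range(1, 7)}
best_k = min(bics, key=bics.get)
print(bics, "selected:", best_k)
```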
July 3, 2013
In order to cluster or partition data, we often use Expectation-Maximization (EM) or variational approximation with a Gaussian Mixture Model (GMM), which is a parametric probability density function represented as a weighted sum of $\hat{K}$ Gaussian component densities. However, model selection to find the underlying $\hat{K}$ is one of the key concerns in GMM clustering, since we can obtain the desired clusters only when $\hat{K}$ is known. In this paper, we propose a new m...
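A compact EM loop for a GMM with $\hat{K}$ held fixed, illustrating why the result hinges on the chosen $\hat{K}$; the spherical covariances, initialization, and fixed iteration count are simplifying assumptions, not the method proposed in the paper.

```python
import numpy as np

def em_gmm(X, K, n_iter=100, seed=0):
    """Basic EM for a spherical-covariance Gaussian mixture with K components."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    means = X[rng.choice(n, K, replace=False)]      # initialize at random data points
    variances = np.full(K, X.var())
    weights = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E-step: responsibilities r[i, k] proportional to weight_k * N(x_i | mu_k, var_k I)
        sq = ((X[:, None, :] - means[None, :, :]) ** 2).sum(-1)          # (n, K)
        log_r = np.log(weights) - 0.5 * d * np.log(2 * np.pi * variances) - sq / (2 * variances)
        log_r -= log_r.max(axis=1, keepdims=True)
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: update weights, means, and variances from the responsibilities
        Nk = r.sum(axis=0)
        weights = Nk / n
        means = (r.T @ X) / Nk[:, None]
        sq = ((X[:, None, :] - means[None, :, :]) ** 2).sum(-1)
        variances = (r * sq).sum(axis=0) / (d * Nk)
    return weights, means, variances

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (200, 2)), rng.normal(6, 1, (200, 2))])
print(em_gmm(X, K=2)[1])   # estimated component means
```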
July 7, 2024
Finite mixture models are a useful statistical model class for clustering and density approximation. In the Bayesian framework, finite mixture models require the specification of suitable priors in addition to the data model. These priors allow one to avoid spurious results and provide a principled way to define cluster shapes and a preference for specific cluster solutions. A generic model estimation scheme for finite mixtures with a fixed number of components is available using ...
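To illustrate how prior choices steer the cluster solution for a finite mixture with a fixed number of components, the sketch below uses scikit-learn's variational BayesianGaussianMixture with a symmetric Dirichlet prior on the weights; the concentration values and data are illustrative, and this is not the estimation scheme described in the paper.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (150, 2)), rng.normal(5, 0.5, (150, 2))])

# same data, same fixed number of components, different Dirichlet concentrations:
# a small concentration prefers emptying superfluous components.
for conc in (0.01, 10.0):
    bgm = BayesianGaussianMixture(
        n_components=5,
        weight_concentration_prior_type="dirichlet_distribution",  # finite-mixture prior
        weight_concentration_prior=conc,
        random_state=0,
    ).fit(X)
    print(conc, np.round(bgm.weights_, 2))
```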
September 6, 2017
We show that model-based Bayesian clustering, the probabilistically most systematic approach to the partitioning of data, can be mapped into a statistical physics problem for a gas of particles, and as a result becomes amenable to a detailed quantitative analysis. A central role in the resulting statistical physics framework is played by an entropy function. We demonstrate that there is a relevant parameter regime where mean-field analysis of this function is exact, and that,...
June 15, 2018
Variable clustering is important for exploratory analysis. However, only a few dedicated methods for variable clustering with the Gaussian graphical model have been proposed. Even more severely, small, insignificant partial correlations due to noise can dramatically change the clustering result when evaluating, for example, with the Bayesian Information Criterion (BIC). In this work, we try to address this issue by proposing a Bayesian model that accounts for negligibly small, but no...
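As a sketch of why tiny partial correlations matter here, the example below estimates a sparse precision matrix with scikit-learn's GraphicalLassoCV and converts it to partial correlations; it is on these (possibly noisy, near-zero) entries that a clustering criterion such as BIC can flip. The simulated variables and block structure are illustrative assumptions.

```python
import numpy as np
from sklearn.covariance import GraphicalLassoCV

rng = np.random.default_rng(0)
# two blocks of correlated variables plus two independent noise variables
latent = rng.normal(size=(500, 2))
X = np.hstack([latent[:, [0]] + 0.3 * rng.normal(size=(500, 3)),
               latent[:, [1]] + 0.3 * rng.normal(size=(500, 3)),
               rng.normal(size=(500, 2))])

model = GraphicalLassoCV().fit(X)
P = model.precision_
# partial correlation between variables i and j: -P_ij / sqrt(P_ii * P_jj)
d = np.sqrt(np.diag(P))
partial_corr = -P / np.outer(d, d)
np.fill_diagonal(partial_corr, 1.0)
print(np.round(partial_corr, 2))
```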
June 14, 2011
A key problem in statistical modeling is model selection: how to choose a model at an appropriate level of complexity. This problem appears in many settings, most prominently in choosing the number of clusters in mixture models or the number of factors in factor analysis. In this tutorial we describe Bayesian nonparametric methods, a class of methods that side-steps this issue by allowing the data to determine the complexity of the model. This tutorial is a high-level introduc...
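A minimal illustration of letting the data determine complexity: a truncated Dirichlet process mixture fitted variationally with scikit-learn leaves superfluous components with negligible weight, so the number of effective clusters need not be fixed in advance. The truncation level, weight threshold, and data below are illustrative assumptions.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)
# three clusters, but the model is given a generous truncation of 10 components
X = np.vstack([rng.normal(m, 0.4, (120, 2)) for m in (0.0, 3.0, 6.0)])

dpgmm = BayesianGaussianMixture(
    n_components=10,
    weight_concentration_prior_type="dirichlet_process",
    random_state=0,
).fit(X)

effective = (dpgmm.weights_ > 0.01).sum()   # components with non-negligible weight
print(np.round(dpgmm.weights_, 3), "effective clusters:", effective)
```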