April 24, 2012
Similar papers 4
June 22, 2016
In the framework of Bayesian model-based clustering based on a finite mixture of Gaussian distributions, we present a joint approach to estimate the number of mixture components and identify cluster-relevant variables simultaneously as well as to obtain an identified model. Our approach consists in specifying sparse hierarchical priors on the mixture weights and component means. In a deliberately overfitting mixture model the sparse prior on the weights empties superfluous co...
December 9, 2022
A model-based approach is developed for clustering categorical data with no natural ordering. The proposed method exploits the Hamming distance to define a family of probability mass functions to model the data. The elements of this family are then considered as kernels of a finite mixture model with unknown number of components. Conjugate Bayesian inference has been derived for the parameters of the Hamming distribution model. The mixture is framed in a Bayesian nonparametri...
May 21, 2018
Finite mixtures are a flexible modeling tool for irregularly shaped densities and samples from heterogeneous populations. When modeling with mixtures using an exchangeable prior on the component features, the component labels are arbitrary and are indistinguishable in posterior analysis. This makes it impossible to attribute any meaningful interpretation to the marginal posterior distributions of the component features. We propose a model in which a small number of observatio...
February 18, 2015
This paper proposes solutions to three issues pertaining to the estimation of finite mixture models with an unknown number of components: the non-identifiability induced by overfitting the number of components, the mixing limitations of standard Markov Chain Monte Carlo (MCMC) sampling techniques, and the related label switching problem. An overfitting approach is used to estimate the number of components in a finite mixture model via a Zmix algorithm. Zmix provides a bridge ...
July 9, 2020
We consider the problem of clustering with $K$-means and Gaussian mixture models with a constraint on the separation between the centers in the context of real-valued data. We first propose a dynamic programming approach to solving the $K$-means problem with a separation constraint on the centers, building on (Wang and Song, 2011). In the context of fitting a Gaussian mixture model, we then propose an EM algorithm that incorporates such a constraint. A separation constraint c...
July 20, 2018
We propose a prior distribution for the number of components of a finite mixture model. The novelty is that the prior distribution is obtained by considering the loss one would incur if the true value representing the number of components were not considered. The prior has an elegant and easy to implement structure, which allows to naturally include any prior information one may have as well as to opt for a default solution in cases where this information is not available. Th...
June 22, 2017
In model-based-clustering mixture models are used to group data points into clusters. A useful concept introduced for Gaussian mixtures by Malsiner Walli et al (2016) are sparse finite mixtures, where the prior distribution on the weight distribution of a mixture with $K$ components is chosen in such a way that a priori the number of clusters in the data is random and is allowed to be smaller than $K$ with high probability. The number of cluster is then inferred a posteriori ...
January 14, 2015
The parsimonious Gaussian mixture models, which exploit an eigenvalue decomposition of the group covariance matrices of the Gaussian mixture, have shown their success in particular in cluster analysis. Their estimation is in general performed by maximum likelihood estimation and has also been considered from a parametric Bayesian prospective. We propose new Dirichlet Process Parsimonious mixtures (DPPM) which represent a Bayesian nonparametric formulation of these parsimoniou...
November 29, 2013
Discrete mixture models are one of the most successful approaches for density estimation. Under a Bayesian nonparametric framework, Dirichlet process location-scale mixture of Gaussian kernels is the golden standard, both having nice theoretical properties and computational tractability. In this paper we explore the use of the skew-normal kernel, which can naturally accommodate several degrees of skewness by the use of a third parameter. The choice of this kernel function all...
August 11, 2015
This paper presents a new Markov chain Monte Carlo method to sample from the posterior distribution of conjugate mixture models. This algorithm relies on a flexible split-merge procedure built using the particle Gibbs sampler. Contrary to available split-merge procedures, the resulting so-called Particle Gibbs Split-Merge sampler does not require the computation of a complex acceptance ratio, is simple to implement using existing sequential Monte Carlo libraries and can be pa...