April 24, 2012
Similar papers 3
September 22, 2016
The BayesBinMix package offers a Bayesian framework for clustering binary data with or without missing values by fitting mixtures of multivariate Bernoulli distributions with an unknown number of components. It allows the joint estimation of the number of clusters and model parameters using Markov chain Monte Carlo sampling. Heated chains are run in parallel and accelerate the convergence to the target posterior distribution. Identifiability issues are addressed by implementi...
January 18, 2022
The Bayesian approach to inference stands out for naturally allowing borrowing information across heterogeneous populations, with different samples possibly sharing the same distribution. A popular Bayesian nonparametric model for clustering probability distributions is the nested Dirichlet process, which however has the drawback of grouping distributions in a single cluster when ties are observed across samples. With the goal of achieving a flexible and effective clustering ...
May 6, 2020
Non-Gaussian mixture models are gaining increasing attention for mixture model-based clustering particularly when dealing with data that exhibit features such as skewness and heavy tails. Here, such a mixture distribution is presented, based on the multivariate normal inverse Gaussian (MNIG) distribution. For parameter estimation of the mixture, a Bayesian approach via Gibbs sampler is used; for this, a novel approach to simulate univariate generalized inverse Gaussian random...
April 19, 2022
In the realm of unsupervised learning, Bayesian nonparametric mixture models, exemplified by the Dirichlet Process Mixture Model (DPMM), provide a principled approach for adapting the complexity of the model to the data. Such models are particularly useful in clustering tasks where the number of clusters is unknown. Despite their potential and mathematical elegance, however, DPMMs have yet to become a mainstream tool widely adopted by practitioners. This is arguably due to a ...
February 17, 2023
The study of almost surely discrete random probability measures is an active line of research in Bayesian nonparametrics. The idea of assuming interaction across the atoms of the random probability measure has recently spurred significant interest in the context of Bayesian mixture models. This allows the definition of priors that encourage well separated and interpretable clusters. In this work, we provide a unified framework for the construction and the Bayesian analysis of...
November 30, 2018
We propose a unifying view of two different Bayesian inference algorithms, Stochastic Gradient Markov Chain Monte Carlo (SG-MCMC) and Stein Variational Gradient Descent (SVGD), leading to improved and efficient novel sampling schemes. We show that SVGD combined with a noise term can be framed as a multiple chain SG-MCMC method. Instead of treating each parallel chain independently from others, our proposed algorithm implements a repulsive force between particles, avoiding col...
March 14, 2017
The goal of data clustering is to partition data points into groups to minimize a given objective function. While most existing clustering algorithms treat each data point as vector, in many applications each datum is not a vector but a point pattern or a set of points. Moreover, many existing clustering methods require the user to specify the number of clusters, which is not available in advance. This paper proposes a new class of models for data clustering that addresses se...
May 17, 2022
We describe BayesMix, a C++ library for MCMC posterior simulation for general Bayesian mixture models. The goal of BayesMix is to provide a self-contained ecosystem to perform inference for mixture models to computer scientists, statisticians and practitioners. The key idea of this library is extensibility, as we wish the users to easily adapt our software to their specific Bayesian mixture models. In addition to the several models and MCMC algorithms for posterior inference ...
January 20, 2019
This paper is a step-by-step tutorial for fitting a mixture distribution to data. It merely assumes the reader has the background of calculus and linear algebra. Other required background is briefly reviewed before explaining the main algorithm. In explaining the main algorithm, first, fitting a mixture of two distributions is detailed and examples of fitting two Gaussians and Poissons, respectively for continuous and discrete cases, are introduced. Thereafter, fitting severa...
February 22, 2015
A natural Bayesian approach for mixture models with an unknown number of components is to take the usual finite mixture model with Dirichlet weights, and put a prior on the number of components---that is, to use a mixture of finite mixtures (MFM). While inference in MFMs can be done with methods such as reversible jump Markov chain Monte Carlo, it is much more common to use Dirichlet process mixture (DPM) models because of the relative ease and generality with which DPM sampl...