June 9, 2020
Loss-based clustering methods, such as k-means and its variants, are standard tools for finding groups in data. However, the lack of quantification of uncertainty in the estimated clusters is a disadvantage. Model-based clustering based on mixture models provides an alternative, but such methods face computational problems and large sensitivity to the choice of kernel. This article proposes a generalized Bayes framework that bridges between these two paradigms through the use...
October 25, 2022
Bayesian nonparametric mixture models are common for modeling complex data. While these models are well-suited for density estimation, their application for clustering has some limitations. Miller and Harrison (2014) proved posterior inconsistency in the number of clusters when the true number of clusters is finite for Dirichlet process and Pitman--Yor process mixture models. In this work, we extend this result to additional Bayesian nonparametric priors such as Gibbs-type pr...
July 19, 2023
Bayesian nonparametric mixture models are widely used to cluster observations. However, one major drawback of the approach is that the estimated partition often presents unbalanced clusters' frequencies with only a few dominating clusters and a large number of sparsely-populated ones. This feature translates into results that are often uninterpretable unless we accept to ignore a relevant number of observations and clusters. Interpreting the posterior distribution as penalize...
June 28, 2024
In recent years, there has been a growing demand to discern clusters of subjects in datasets characterized by a large set of features. Often, these clusters may be highly variable in size and present partial hierarchical structures. In this context, model-based clustering approaches with nonparametric priors are gaining attention in the literature due to their flexibility and adaptability to new data. However, current approaches still face challenges in recognizing hierarchic...
August 1, 2023
Finite mixture models are flexible methods that are commonly used for model-based clustering. A recent focus in the model-based clustering literature is to highlight the difference between the number of components in a mixture model and the number of clusters. The number of clusters is more relevant from a practical stand point, but to date, the focus of prior distribution formulation has been on the number of components. In light of this, we develop a finite mixture methodol...
March 31, 2020
In many modern applications, there is interest in analyzing enormous data sets that cannot be easily moved across computers or loaded into memory on a single computer. In such settings, it is very common to be interested in clustering. Existing distributed clustering algorithms are mostly distance or density based without a likelihood specification, precluding the possibility of formal statistical inference. Model-based clustering allows statistical inference, yet research on...
July 16, 2015
This paper adopts a Bayesian nonparametric mixture model where the mixing distribution belongs to the wide class of normalized homogeneous completely random measures. We propose a truncation method for the mixing distribution by discarding the weights of the unnormalized measure smaller than a threshold. We prove convergence in law of our approximation, provide some theoretical properties and characterize its posterior distribution so that a blocked Gibbs sampler is devised. ...
December 18, 2018
This chapter surveys the most standard Monte Carlo methods available for simulating from a posterior distribution associated with a mixture and conducts some experiments about the robustness of the Gibbs sampler in high dimensional Gaussian settings. This is a chapter prepared for the forthcoming 'Handbook of Mixture Analysis'.
August 23, 2017
Analysis of a Bayesian mixture model for the Matrix Langevin distribution on the Stiefel manifold is presented. The model exploits a particular parametrization of the Matrix Langevin distribution, various aspects of which are elaborated on. A general, and novel, family of conjugate priors, and an efficient Markov chain Monte Carlo (MCMC) sampling scheme for the corresponding posteriors is then developed for the mixture model. Theoretical properties of the prior and posterior ...
April 1, 2016
Choosing the number of mixture components remains an elusive challenge. Model selection criteria can be either overly liberal or conservative and return poorly-separated components of limited practical use. We formalize non-local priors (NLPs) for mixtures and show how they lead to well-separated components with non-negligible weight, interpretable as distinct subpopulations. We also propose an estimator for posterior model probabilities under local and non-local priors, show...