September 20, 2018
Bayesian model-based clustering is a widely applied procedure for discovering groups of related observations in a dataset. These approaches use Bayesian mixture models, estimated with MCMC, which provide posterior samples of the model parameters and clustering partition. While inference on model parameters is well established, inference on the clustering partition is less developed. A new method is developed for estimating the optimal partition from the pairwise posterior similarity matrix generated by a Bayesian cluster model. This approach uses non-negative matrix factorization (NMF) to provide a low-rank approximation to the similarity matrix. The factorization permits hard or soft partitions and is shown to perform better than several popular alternatives under a variety of penalty functions.
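A minimal sketch of the idea, not the estimator developed in the paper: given a posterior similarity matrix S whose (i, j) entry is the posterior probability that observations i and j co-cluster, an off-the-shelf NMF of S yields a nonnegative factor whose rows can be read as soft memberships, with a row-wise argmax giving a hard partition. The toy matrix, the chosen rank, and the use of scikit-learn's NMF are illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import NMF

def partition_from_similarity(S, n_clusters, seed=0):
    """Factorize the n x n posterior similarity matrix S ~= W H; the rows of
    W give soft cluster memberships and their argmax a hard partition."""
    model = NMF(n_components=n_clusters, init="nndsvda",
                max_iter=500, random_state=seed)
    W = model.fit_transform(S)                              # n x k nonnegative factor
    soft = W / np.maximum(W.sum(axis=1, keepdims=True), 1e-12)
    hard = soft.argmax(axis=1)
    return soft, hard

# Toy similarity matrix with two obvious blocks of high co-clustering probability.
S = np.block([[np.full((5, 5), 0.9), np.full((5, 5), 0.1)],
              [np.full((5, 5), 0.1), np.full((5, 5), 0.9)]])
soft, hard = partition_from_similarity(S, n_clusters=2)
```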
Similar papers
July 8, 2016
In cluster analysis, interest lies in probabilistically capturing partitions of individuals, items or observations into groups, such that those belonging to the same group share similar attributes or relational profiles. Bayesian posterior samples for the latent allocation variables can be effectively obtained in a wide range of clustering models, including finite mixtures, infinite mixtures, hidden Markov models and block models for networks. However, due to the categorical n...
March 16, 2018
Bayesian Non-negative Matrix Factorization (NMF) is a promising approach for understanding uncertainty and structure in matrix data. However, a large volume of applied work optimizes traditional non-Bayesian NMF objectives that fail to provide a principled understanding of the non-identifiability inherent in NMF, an issue ideally addressed by a Bayesian approach. Despite their suitability, current Bayesian NMF approaches have failed to gain popularity in an applied setting; ...
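As a standard illustration of the non-identifiability referred to above (general NMF background, not a result of this paper): rescaling or permuting the factors leaves the fit unchanged, so distinct factor pairs explain the data equally well.

```latex
% For any invertible matrix $D$ with $WD \ge 0$ and $D^{-1}H \ge 0$
% (for example a positive diagonal scaling composed with a permutation),
V \approx W H = (W D)\,(D^{-1} H),
% so $W$ and $H$ are at best identified up to such transformations.
```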
September 14, 2010
Identifying overlapping communities in networks is a challenging task. In this work we present a novel approach to community detection that utilises the Bayesian non-negative matrix factorisation (NMF) model to produce a probabilistic output for node memberships. The scheme has the advantage of computational efficiency, soft community membership and an intuitive foundation. We present the performance of the method against a variety of benchmark problems and compare and contra...
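A minimal non-Bayesian sketch of the underlying idea (the paper itself uses a Bayesian NMF model with priors on the factors): factorizing an adjacency matrix A ~= WH and normalizing the rows of W gives soft, possibly overlapping, node memberships. The small graph, the chosen rank, and the 0.3 overlap threshold are illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import NMF

# Adjacency matrix of a small toy graph with two loosely connected groups.
A = np.array([[0, 1, 1, 0, 0, 0],
              [1, 0, 1, 0, 0, 0],
              [1, 1, 0, 1, 0, 0],
              [0, 0, 1, 0, 1, 1],
              [0, 0, 0, 1, 0, 1],
              [0, 0, 0, 1, 1, 0]], dtype=float)

W = NMF(n_components=2, init="nndsvda", max_iter=500,
        random_state=0).fit_transform(A)
membership = W / np.maximum(W.sum(axis=1, keepdims=True), 1e-12)
overlapping = membership > 0.3   # a node above the threshold in both columns overlaps
```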
March 30, 2023
Clustering is an important task in many areas of knowledge: medicine and epidemiology, genomics, environmental science, economics, visual sciences, among others. Methodologies to perform inference on the number of clusters have often been shown to be inconsistent, and introducing a dependence structure among the clusters adds further difficulties to the estimation process. In a Bayesian setting, clustering is performed by considering the unknown partition as a random o...
July 12, 2015
Nonnegative Matrix Factorization (NMF) was first introduced as a low-rank matrix approximation technique and has since enjoyed a wide range of applications. Although NMF does not at first seem related to the clustering problem, the two are in fact closely linked. In this report, we provide a gentle introduction to clustering and NMF before reviewing the theoretical relationship between them. We then explore several NMF variants, namely Sparse NMF, Projective NMF, Nonnegative...
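One commonly cited form of the NMF/clustering link (often attributed to Ding, He and Simon, stated here as background rather than as this report's result): when one factor is constrained to be orthogonal as well as nonnegative, it behaves like a cluster-indicator matrix and the objective matches a relaxed k-means criterion.

```latex
% X is the p x n data matrix, W >= 0 collects k basis vectors, and the
% k x n factor H is nonnegative with orthonormal rows, so each column of H
% acts (approximately) as an indicator of one cluster.
\min_{W \ge 0,\; H \ge 0,\; H H^{\top} = I} \; \lVert X - W H \rVert_F^{2}
```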
July 19, 2023
Bayesian nonparametric mixture models are widely used to cluster observations. However, one major drawback of the approach is that the estimated partition often exhibits unbalanced cluster frequencies, with only a few dominating clusters and a large number of sparsely populated ones. This feature translates into results that are often uninterpretable unless we are willing to ignore a substantial number of observations and clusters. Interpreting the posterior distribution as penalize...
November 2, 2011
Bayesian models offer great flexibility for clustering applications: Bayesian nonparametrics can be used for modeling infinite mixtures, and hierarchical Bayesian models can be utilized for sharing clusters across multiple data sets. For the most part, such flexibility is lacking in classical clustering methods such as k-means. In this paper, we revisit the k-means clustering algorithm from a Bayesian nonparametric viewpoint. Inspired by the asymptotic connection between k-m...
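A minimal sketch of the kind of hard-assignment algorithm that the small-variance asymptotics in this line of work lead to (often called DP-means); the penalty `lam` for opening a new cluster and the simple sequential pass are illustrative choices, not the paper's exact algorithm.

```python
import numpy as np

def dp_means(X, lam, n_iter=25):
    """Hard clustering where a point farther than `lam` (squared Euclidean
    distance) from every current center opens a new cluster."""
    centers = [X.mean(axis=0)]                # start from one global cluster
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        for i, x in enumerate(X):             # sequential pass over the points
            d2 = np.array([np.sum((x - c) ** 2) for c in centers])
            if d2.min() > lam:
                centers.append(x.copy())      # open a new cluster at x
                labels[i] = len(centers) - 1
            else:
                labels[i] = int(d2.argmin())
        # recompute centers of non-empty clusters, then relabel contiguously
        centers = [X[labels == k].mean(axis=0)
                   for k in range(len(centers)) if np.any(labels == k)]
        _, labels = np.unique(labels, return_inverse=True)
    return np.array(centers), labels

# Two well-separated blobs; lam controls how readily new clusters are opened.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (20, 2)), rng.normal(6.0, 1.0, (20, 2))])
centers, labels = dp_means(X, lam=9.0)
```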
January 14, 2015
The parsimonious Gaussian mixture models, which exploit an eigenvalue decomposition of the group covariance matrices of the Gaussian mixture, have proven particularly successful in cluster analysis. Their estimation is in general performed by maximum likelihood and has also been considered from a parametric Bayesian perspective. We propose new Dirichlet Process Parsimonious mixtures (DPPM) which represent a Bayesian nonparametric formulation of these parsimoniou...
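For reference, the eigenvalue decomposition these parsimonious models build on (standard background from Banfield and Raftery and from Celeux and Govaert): each group covariance matrix is split into volume, orientation, and shape terms, and constraining each term to be common or group-specific generates the model family.

```latex
\Sigma_k = \lambda_k \, D_k A_k D_k^{\top},
% \lambda_k > 0 : cluster volume,
% D_k           : orthogonal matrix of eigenvectors (orientation),
% A_k           : diagonal matrix with unit determinant (shape).
```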
July 7, 2024
Finite mixture models are a useful statistical model class for clustering and density approximation. In the Bayesian framework, finite mixture models require the specification of suitable priors in addition to the data model. These priors help avoid spurious results and provide a principled way to define cluster shapes and a preference for specific cluster solutions. A generic model estimation scheme for finite mixtures with a fixed number of components is available using ...
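For concreteness, the standard Bayesian finite mixture setup being referred to (generic background rather than this paper's specific prior choices): the number of components K is fixed, the weights receive a symmetric Dirichlet prior, and each component parameter receives its own prior.

```latex
p(y_i \mid \boldsymbol{\eta}, \boldsymbol{\theta})
  = \sum_{k=1}^{K} \eta_k \, f(y_i \mid \theta_k),
\qquad
\boldsymbol{\eta} \sim \operatorname{Dirichlet}(e_0, \dots, e_0),
\qquad
\theta_k \stackrel{\text{iid}}{\sim} p(\theta_k).
```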
January 17, 2017
Recent advances in overfitting Bayesian mixture models provide a solid and straightforward approach for inferring the underlying number of clusters and model parameters in heterogeneous datasets. The applicability of such a framework to clustering correlated high-dimensional data is demonstrated. For this purpose an overfitting mixture of factor analyzers is introduced, assuming that the number of factors is fixed. A Markov chain Monte Carlo (MCMC) sampler combined with a pri...
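As background on the model class (the standard mixture-of-factor-analyzers covariance structure, with the number of factors q held fixed as in the abstract): each component's covariance is a low-rank loading term plus a diagonal noise term, which is what keeps the parameterization manageable in high dimensions.

```latex
\Sigma_k = \Lambda_k \Lambda_k^{\top} + \Psi_k,
% \Lambda_k : p x q matrix of factor loadings (q fixed),
% \Psi_k    : diagonal matrix of idiosyncratic variances.
```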