May 13, 2015
Similar papers 3
May 10, 2021
We propose a randomized greedy search algorithm to find a point estimate for a random partition based on a loss function and posterior Monte Carlo samples. Given the large size and awkward discrete nature of the search space, the minimization of the posterior expected loss is challenging. Our approach is a stochastic search based on a series of greedy optimizations performed in a random order and is embarrassingly parallel. We consider several loss functions, including Binder...
January 14, 2015
The parsimonious Gaussian mixture models, which exploit an eigenvalue decomposition of the group covariance matrices of the Gaussian mixture, have shown their success in particular in cluster analysis. Their estimation is in general performed by maximum likelihood estimation and has also been considered from a parametric Bayesian prospective. We propose new Dirichlet Process Parsimonious mixtures (DPPM) which represent a Bayesian nonparametric formulation of these parsimoniou...
February 24, 2023
Clustering is a well-known and studied problem, one of its variants, called contiguity-constrained clustering, accepts as a second input a graph used to encode prior information about cluster structure by means of contiguity constraints i.e. clusters must form connected subgraphs of this graph. This paper discusses the interest of such a setting and proposes a new way to formalise it in a Bayesian setting, using results on spanning trees to compute exactly a posteriori probab...
December 22, 2020
Cluster analysis aims at partitioning data into groups or clusters. In applications, it is common to deal with problems where the number of clusters is unknown. Bayesian mixture models employed in such applications usually specify a flexible prior that takes into account the uncertainty with respect to the number of clusters. However, a major empirical challenge involving the use of these models is in the characterisation of the induced prior on the partitions. This work intr...
November 30, 2011
Landscape classification of the well-known biodiversity hotspot, Western Ghats (mountains), on the west coast of India, is an important part of a world-wide program of monitoring biodiversity. To this end, a massive vegetation data set, consisting of 51,834 4-variate observations has been clustered into different landscapes by Nagendra and Gadgil [Current Sci. 75 (1998) 264--271]. But a study of such importance may be affected by nonuniqueness of cluster analysis and the lack...
February 10, 2016
Clustering is a powerful tool in data analysis, but it is often difficult to find a grouping that aligns with a user's needs. To address this, several methods incorporate constraints obtained from users into clustering algorithms, but unfortunately do not apply to hierarchical clustering. We design an interactive Bayesian algorithm that incorporates user interaction into hierarchical clustering while still utilizing the geometry of the data by sampling a constrained posterior...
March 13, 2013
The analysis of decision making under uncertainty is closely related to the analysis of probabilistic inference. Indeed, much of the research into efficient methods for probabilistic inference in expert systems has been motivated by the fundamental normative arguments of decision theory. In this paper we show how the developments underlying those efficient methods can be applied immediately to decision problems. In addition to general approaches which need know nothing about ...
August 10, 2020
Maximum likelihood estimates (MLEs) are asymptotically normally distributed, and this property is used in meta-analyses to test the heterogeneity of estimates, either for a single cluster or for several sub-groups. More recently, MLEs for associations between risk factors and diseases have been hierarchically clustered to search for diseases with shared underlying causes, but an objective statistical criterion is needed to determine the number and composition of clusters. To ...
September 9, 2022
Various methods have been developed to combine inference across multiple sets of results for unsupervised clustering, within the ensemble clustering literature. The approach of reporting results from one `best' model out of several candidate clustering models generally ignores the uncertainty that arises from model selection, and results in inferences that are sensitive to the particular model and parameters chosen. Bayesian model averaging (BMA) is a popular approach for com...
March 31, 2020
In many modern applications, there is interest in analyzing enormous data sets that cannot be easily moved across computers or loaded into memory on a single computer. In such settings, it is very common to be interested in clustering. Existing distributed clustering algorithms are mostly distance or density based without a likelihood specification, precluding the possibility of formal statistical inference. Model-based clustering allows statistical inference, yet research on...