Bayesian cluster analysis: Point estimation and credible balls

May 13, 2015

An Efficient Model Selection for Gaussian Mixture Model in a Bayesian Framework

July 3, 2013


Ji Won Yoon

Machine Learning

To cluster or partition data, we often use Expectation-Maximization (EM) or variational approximation with a Gaussian Mixture Model (GMM), which is a parametric probability density function represented as a weighted sum of $\hat{K}$ Gaussian component densities. However, model selection to find the underlying $\hat{K}$ is one of the key concerns in GMM clustering, since we can obtain the desired clusters only when $\hat{K}$ is known. In this paper, we propose a new m...


Clustering with a Reject Option: Interactive Clustering as Bayesian Prior Elicitation

February 22, 2016


Akash Srivastava, James Zou, Charles Sutton

Machine Learning

A good clustering can help a data analyst to explore and understand a data set, but what constitutes a good clustering may depend on domain-specific and application-specific criteria. These criteria can be difficult to formalize, even when it is easy for an analyst to know a good clustering when she sees one. We present a new approach to interactive clustering for data exploration, called \ciif, based on a particularly simple feedback mechanism, in which an analyst can choose...


A Quasi-Bayesian Perspective to Online Clustering

February 1, 2016


Le Li, Benjamin Guedj, Sébastien Loustau

Machine Learning

Statistics Theory

When faced with high frequency streams of data, clustering raises theoretical and algorithmic pitfalls. We introduce a new and adaptive online clustering algorithm relying on a quasi-Bayesian approach, with a dynamic (i.e., time-dependent) estimation of the (unknown and changing) number of clusters. We prove that our approach is supported by minimax regret bounds. We also provide an RJMCMC-flavored implementation (called PACBO, see https://cran.r-project.org/web/packages/PACB...


Robust and Scalable Bayes via a Median of Subset Posterior Measures

March 11, 2014


Stanislav Minsker, Sanvesh Srivastava, ..., David B. Dunson

Statistics Theory

Distributed, Parallel, and C...

Machine Learning


We propose a novel approach to Bayesian analysis that is provably robust to outliers in the data and often has computational advantages over standard methods. Our technique is based on splitting the data into non-overlapping subgroups, evaluating the posterior distribution given each independent subgroup, and then combining the resulting measures. The main novelty of our approach is the proposed aggregation step, which is based on the evaluation of a median in the space of pr...


Distribution free optimality intervals for clustering

July 30, 2021


Marina Meilă, Hanyu Zhang

Machine Learning

We address the problem of validating the output of clustering algorithms. Given data $\mathcal{D}$ and a partition $\mathcal{C}$ of these data into $K$ clusters, when can we say that the clusters obtained are correct or meaningful for the data? This paper introduces a paradigm in which a clustering $\mathcal{C}$ is considered meaningful if it is good with respect to a loss function such as the K-means distortion, and stable, i.e. the only good clustering up to small perturbati...


Interpretable Clustering with the Distinguishability Criterion

April 24, 2024


Ali Turfah, Xiaoquan Wen

Machine Learning

Methodology

Cluster analysis is a popular unsupervised learning tool used in many disciplines to identify heterogeneous sub-populations within a sample. However, validating cluster analysis results and determining the number of clusters in a data set remains an outstanding problem. In this work, we present a global criterion called the Distinguishability criterion to quantify the separability of identified clusters and validate inferred cluster configurations. Our computational implement...


A multiscale Bayesian nonparametric framework for partial hierarchical clustering

June 28, 2024


Lorenzo Schiavon, Mattia Stival

Methodology

In recent years, there has been a growing demand to discern clusters of subjects in datasets characterized by a large set of features. Often, these clusters may be highly variable in size and present partial hierarchical structures. In this context, model-based clustering approaches with nonparametric priors are gaining attention in the literature due to their flexibility and adaptability to new data. However, current approaches still face challenges in recognizing hierarchic...


Robust Bayesian Cluster Enumeration Based on the $t$ Distribution

November 29, 2018


Freweyni K. Teklehaymanot, Michael Muma, Abdelhak M. Zoubir

Machine Learning

A major challenge in cluster analysis is that the number of data clusters is mostly unknown and must be estimated prior to clustering the observed data. In real-world applications, the observed data is often subject to heavy-tailed noise and outliers, which obscure the true underlying structure of the data. Consequently, estimating the number of clusters becomes challenging. To this end, we derive a robust cluster enumeration criterion by formulating the problem of estimati...


Bayesian Consensus Clustering

February 28, 2013


Eric F. Lock, David B. Dunson

Machine Learning

The task of clustering a set of objects based on multiple sources of data arises in several modern applications. We propose an integrative statistical model that permits a separate clustering of the objects for each data source. These separate clusterings adhere loosely to an overall consensus clustering, and hence they are not independent. We describe a computationally scalable Bayesian framework for simultaneous estimation of both the consensus clustering and the source-spe...


Spectral Clustering, Bayesian Spanning Forest, and Forest Process

February 1, 2022


Leo L. Duan, Arkaprava Roy

Methodology

Spectral clustering views the similarity matrix as a weighted graph, and partitions the data by minimizing a graph-cut loss. Since it minimizes the across-cluster similarity, there is no need to model the distribution within each cluster. As a result, one reduces the chance of model misspecification, which is often a risk in mixture model-based clustering. Nevertheless, compared to the latter, spectral clustering has no direct way of quantifying the clustering uncertainty (s...
