ID: 2107.05414

Cohesion and Repulsion in Bayesian Distance Clustering

July 12, 2021

Similar papers

Interpretable Clustering with the Distinguishability Criterion

April 24, 2024

86% Match
Ali Turfah, Xiaoquan Wen
Machine Learning
Methodology

Cluster analysis is a popular unsupervised learning tool used in many disciplines to identify heterogeneous sub-populations within a sample. However, validating cluster analysis results and determining the number of clusters in a data set remains an outstanding problem. In this work, we present a global criterion called the Distinguishability criterion to quantify the separability of identified clusters and validate inferred cluster configurations. Our computational implement...

Bayesian Hierarchical Clustering with Exponential Family: Small-Variance Asymptotics and Reducibility

January 29, 2015

85% Match
Juho Lee, Seungjin Choi
Machine Learning

Bayesian hierarchical clustering (BHC) is an agglomerative clustering method, where a probabilistic model is defined and its marginal likelihoods are evaluated to decide which clusters to merge. While BHC provides a few advantages over traditional distance-based agglomerative clustering algorithms, successive evaluation of marginal likelihoods and careful hyperparameter tuning are cumbersome and limit the scalability. In this paper we relax BHC into a non-probabilistic formul...

Learning Generative Models of Similarity Matrices

October 19, 2012

85% Match
Romer Rosales, Brendan J. Frey
Machine Learning

We describe a probabilistic (generative) view of affinity matrices along with inference algorithms for a subclass of problems associated with data clustering. This probabilistic view is helpful in understanding different models and algorithms that are based on affinity functions of the data. In particular, we show how (greedy) inference for a specific probabilistic model is equivalent to the spectral clustering algorithm. It also provides a framework for developing new algorith...
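For readers unfamiliar with the spectral clustering side of that equivalence, the sketch below shows a generic two-way normalized spectral partition of an affinity matrix. It is not the authors' probabilistic model or inference procedure. The Gaussian affinity, the bandwidth `sigma`, power iteration as the eigensolver, and the helper names `gaussian_affinity` and `spectral_bipartition` are all illustrative assumptions.

```python
import math
import random


def gaussian_affinity(points, sigma=1.0):
    """Affinity W[i][j] = exp(-||x_i - x_j||^2 / (2 sigma^2)), zero on the diagonal."""
    n = len(points)
    return [
        [0.0 if i == j else math.exp(-math.dist(points[i], points[j]) ** 2 / (2 * sigma**2))
         for j in range(n)]
        for i in range(n)
    ]


def spectral_bipartition(points, sigma=1.0, iters=500, seed=0):
    """Two-way cut from the second eigenvector of M = D^{-1/2} W D^{-1/2}.

    Assumes every point has a nonzero degree (at least one neighbour with nonzero affinity).
    """
    w = gaussian_affinity(points, sigma)
    n = len(w)
    deg = [sum(row) for row in w]
    inv_sqrt = [1.0 / math.sqrt(d) for d in deg]
    m = [[inv_sqrt[i] * w[i][j] * inv_sqrt[j] for j in range(n)] for i in range(n)]

    # The top eigenvector of M (eigenvalue 1) is known exactly: sqrt(deg), normalized.
    top = [math.sqrt(d) for d in deg]
    top_norm = math.sqrt(sum(t * t for t in top))
    top = [t / top_norm for t in top]

    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        # Power iteration on M + I, whose eigenvalues lie in [0, 2] ...
        y = [x[i] + sum(m[i][j] * x[j] for j in range(n)) for i in range(n)]
        # ... restricted to the orthogonal complement of the top eigenvector.
        dot = sum(a * b for a, b in zip(y, top))
        y = [a - dot * b for a, b in zip(y, top)]
        norm = math.sqrt(sum(a * a for a in y))
        x = [a / norm for a in y]

    # D^{-1/2} x has the same sign pattern as x, so the signs define the cut.
    return [0 if v >= 0 else 1 for v in x]


pts = [(0, 0), (0.3, 0.1), (0.1, 0.4), (-0.2, 0.2), (0.2, -0.3),
       (5, 5), (5.3, 5.1), (5.1, 5.4), (4.8, 5.2), (5.2, 4.7)]
print(spectral_bipartition(pts))
```

With two well-separated groups, the between-group affinities are nearly zero, so the second eigenvector takes one sign on each group and the sign split recovers them.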

Mathematical Foundations of Data Cohesion

August 1, 2023

85% Match
Katherine E. Moore
Social and Information Netwo...
Discrete Mathematics

Data cohesion, a recently introduced measure inspired by social interactions, uses distance comparisons to assess relative proximity. In this work, we provide a collection of results which can guide the development of cohesion-based methods in exploratory data analysis and human-aided computation. Here, we observe the important role of highly clustered "point-like" sets and the ways in which cohesion allows such sets to take on qualities of a single weighted point. In doing s...

Dimensionality's Blessing: Clustering Images by Underlying Distribution

April 8, 2018

85% Match
Wen-Yan Lin, Siying Liu, ... , Matsushita Yasuyuki
Computer Vision and Pattern ...

Many high dimensional vector distances tend to a constant. This is typically considered a negative "contrast-loss" phenomenon that hinders clustering and other machine learning techniques. We reinterpret "contrast-loss" as a blessing. Re-deriving "contrast-loss" using the law of large numbers, we show it results in a distribution's instances concentrating on a thin "hyper-shell". The hollow center means apparently chaotically overlapping distributions are actually intrinsical...
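The "contrast-loss" effect the abstract starts from can be checked numerically. The sketch below assumes i.i.d. standard Gaussian data, which is an illustrative choice and not the paper's image data or its clustering method. It reports two quantities as the dimension grows: the relative contrast (max − min)/min of distances from a query point, and the coefficient of variation of the norm ||x||. The second quantity shrinking is the "thin hyper-shell" concentration. The function names are hypothetical.

```python
import math
import random
import statistics


def relative_contrast(dim, n_points=200, seed=0):
    """(max - min) / min of distances from one random query to n_points Gaussian points."""
    rng = random.Random(seed)
    query = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        p = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        dists.append(math.dist(query, p))
    return (max(dists) - min(dists)) / min(dists)


def norm_spread(dim, n_points=200, seed=0):
    """Coefficient of variation of ||x|| for x ~ N(0, I_dim); roughly 1/sqrt(2*dim) for large dim."""
    rng = random.Random(seed)
    norms = [math.sqrt(sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(dim)))
             for _ in range(n_points)]
    return statistics.stdev(norms) / statistics.mean(norms)


for d in (2, 10, 100, 1000):
    print(d, round(relative_contrast(d), 3), round(norm_spread(d), 3))
```

Both columns shrink as `d` grows. Distances to all points become nearly equal, and the samples sit near a shell of radius about sqrt(d).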

Repulsive Mixtures

April 24, 2012

85% Match
Francesca Petralia, Vinayak Rao, David B. Dunson
Methodology

Discrete mixture models are routinely used for density estimation and clustering. While conducting inferences on the cluster-specific parameters, current frequentist and Bayesian methods often encounter problems when clusters are placed too close together to be scientifically meaningful. Current Bayesian practice generates component-specific parameters independently from a common prior, which tends to favor similar components and often leads to substantial probability assigne...
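To make the contrast between independent and repulsive priors concrete, here is a minimal sketch of an unnormalized log prior on component locations. It multiplies independent Gaussian priors by a pairwise repulsion factor. The repulsion function g(d) = 1 − exp(−d²/τ) and the parameters `prior_sd` and `tau` are hypothetical choices made only for illustration. The paper's own repulsion function may differ.

```python
import math


def log_repulsive_prior(locations, prior_sd=10.0, tau=1.0):
    """Unnormalized log prior over component locations (illustrative only).

    Independent N(0, prior_sd^2) on every coordinate, times
    prod_{i<j} g(||mu_i - mu_j||) with the hypothetical g(d) = 1 - exp(-d^2 / tau).
    """
    logp = 0.0
    # Independent part: what a standard (non-repulsive) prior would contribute.
    for mu in locations:
        logp += sum(-0.5 * (x / prior_sd) ** 2 for x in mu)
    # Repulsive part: penalizes pairs of components that sit close together.
    for i in range(len(locations)):
        for j in range(i + 1, len(locations)):
            d2 = sum((a - b) ** 2 for a, b in zip(locations[i], locations[j]))
            if d2 == 0.0:
                return -math.inf  # coincident components get zero prior density
            logp += math.log(-math.expm1(-d2 / tau))  # log(1 - exp(-d2 / tau))
    return logp


print(log_repulsive_prior([(0.0,), (0.1,)]))  # nearly overlapping components
print(log_repulsive_prior([(0.0,), (3.0,)]))  # well-separated components
```

Under the independent part alone, the two configurations score almost the same. The repulsion factor is what pushes the close pair far down in log density.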

Bayesian Level-Set Clustering

March 7, 2024

85% Match
David Buch, Miheer Dewaskar, David B. Dunson
Methodology

Broadly, the goal when clustering data is to separate observations into meaningful subgroups. The rich variety of methods for clustering reflects the fact that the relevant notion of meaningful clusters varies across applications. The classical Bayesian approach clusters observations by their association with components of a mixture model; the choice in class of components allows flexibility to capture a range of meaningful cluster notions. However, in practice the range is s...

A Probabilistic $\ell_1$ Method for Clustering High Dimensional Data

April 6, 2015

85% Match
Tsvetan Asamov, Adi Ben-Israel
Statistics Theory
Machine Learning
Optimization and Control

In general, the clustering problem is NP-hard, and global optimality cannot be established for non-trivial instances. For high-dimensional data, distance-based methods for clustering or classification face an additional difficulty, the unreliability of distances in very high-dimensional spaces. We propose a distance-based iterative method for clustering data in very high-dimensional space, using the $\ell_1$-metric that is less sensitive to high dimensionality than the Euclid...
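As a point of reference for distance-based iteration under the ℓ1 metric, the sketch below is plain k-medians: assign each point to its nearest center in ℓ1 distance, then move each center to the coordinate-wise median of its members, which minimizes the summed ℓ1 distance. This is a swapped-in standard technique, not the authors' probabilistic ℓ1 method. The names `l1` and `k_medians` are hypothetical.

```python
import statistics


def l1(a, b):
    """Manhattan (l1) distance between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def k_medians(points, centers, iters=20):
    """Lloyd-style iteration with l1 assignment and coordinate-wise median updates."""
    centers = [list(c) for c in centers]
    labels = []
    for _ in range(iters):
        # Assignment step: nearest center under the l1 metric.
        labels = [min(range(len(centers)), key=lambda k: l1(p, centers[k])) for p in points]
        new_centers = []
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if not members:
                new_centers.append(centers[k])  # leave an empty cluster's center in place
                continue
            # Update step: the coordinate-wise median minimizes the sum of l1 distances.
            new_centers.append([statistics.median(col) for col in zip(*members)])
        if new_centers == centers:
            break  # converged: centers no longer move
        centers = new_centers
    return labels, centers


pts = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
print(k_medians(pts, [(0, 0), (10, 10)]))
```

The median update is what ties the center step to the ℓ1 objective. Using the mean here would optimize squared Euclidean distance instead.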

Bayesian Repulsive Gaussian Mixture Model

March 27, 2017

85% Match
Fangzheng Xie, Yanxun Xu
Methodology

We develop a general class of Bayesian repulsive Gaussian mixture models that encourage well-separated clusters, aiming at reducing potentially redundant components produced by independent priors for locations (such as the Dirichlet process). The asymptotic results for the posterior distribution of the proposed models are derived, including posterior consistency and posterior contraction rate in the context of nonparametric density estimation. More importantly, we show that c...

Dirichlet Process Parsimonious Mixtures for clustering

January 14, 2015

85% Match
Faicel Chamroukhi, Marius Bartcus, Hervé Glotin
Machine Learning
Methodology

The parsimonious Gaussian mixture models, which exploit an eigenvalue decomposition of the group covariance matrices of the Gaussian mixture, have shown their success in particular in cluster analysis. Their estimation is in general performed by maximum likelihood estimation and has also been considered from a parametric Bayesian perspective. We propose new Dirichlet Process Parsimonious mixtures (DPPM) which represent a Bayesian nonparametric formulation of these parsimoniou...
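The eigenvalue decomposition referred to above is usually written in the standard parsimonious-mixture form below. This notation is assumed from the general literature on these models and is not taken from this paper's text. For component k in d dimensions:

```latex
\Sigma_k = \lambda_k \, D_k \, A_k \, D_k^{\top},
\qquad \lambda_k = |\Sigma_k|^{1/d},
\qquad A_k = \operatorname{diag}(a_{k1}, \dots, a_{kd}), \quad |A_k| = 1,
```

Here the scalar λ_k controls the volume, the orthogonal matrix D_k of eigenvectors controls the orientation, and the diagonal matrix A_k of normalized eigenvalues controls the shape. Forcing each factor to be shared across components or to vary between them yields the family of parsimonious covariance models.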