Cohesion and Repulsion in Bayesian Distance Clustering

July 12, 2021

Bayesian Hierarchical Clustering with Exponential Family: Small-Variance Asymptotics and Reducibility

January 29, 2015

Juho Lee, Seungjin Choi

Machine Learning

Bayesian hierarchical clustering (BHC) is an agglomerative clustering method, where a probabilistic model is defined and its marginal likelihoods are evaluated to decide which clusters to merge. While BHC provides a few advantages over traditional distance-based agglomerative clustering algorithms, successive evaluation of marginal likelihoods and careful hyperparameter tuning are cumbersome and limit its scalability. In this paper we relax BHC into a non-probabilistic formul...

Learning Generative Models of Similarity Matrices

October 19, 2012

Romer Rosales, Brendan J. Frey

Machine Learning

We describe a probabilistic (generative) view of affinity matrices along with inference algorithms for a subclass of problems associated with data clustering. This probabilistic view is helpful in understanding different models and algorithms that are based on affinity functions of the data. In particular, we show how (greedy) inference for a specific probabilistic model is equivalent to the spectral clustering algorithm. It also provides a framework for developing new algorith...

Mathematical Foundations of Data Cohesion

August 1, 2023

Katherine E. Moore

Social and Information Netwo...

Discrete Mathematics

Data cohesion, a recently introduced measure inspired by social interactions, uses distance comparisons to assess relative proximity. In this work, we provide a collection of results which can guide the development of cohesion-based methods in exploratory data analysis and human-aided computation. Here, we observe the important role of highly clustered "point-like" sets and the ways in which cohesion allows such sets to take on qualities of a single weighted point. In doing s...

Dimensionality's Blessing: Clustering Images by Underlying Distribution

April 8, 2018

Wen-Yan Lin, Siying Liu, ... , Matsushita Yasuyuki

Computer Vision and Pattern ...

Many high-dimensional vector distances tend to a constant. This is typically considered a negative "contrast-loss" phenomenon that hinders clustering and other machine learning techniques. We reinterpret "contrast-loss" as a blessing. Re-deriving "contrast-loss" using the law of large numbers, we show it results in a distribution's instances concentrating on a thin "hyper-shell". The hollow center means apparently chaotically overlapping distributions are actually intrinsical...

Repulsive Mixtures

April 24, 2012

Francesca Petralia, Vinayak Rao, David B. Dunson

Methodology

Discrete mixture models are routinely used for density estimation and clustering. While conducting inferences on the cluster-specific parameters, current frequentist and Bayesian methods often encounter problems when clusters are placed too close together to be scientifically meaningful. Current Bayesian practice generates component-specific parameters independently from a common prior, which tends to favor similar components and often leads to substantial probability assigne...

Bayesian Level-Set Clustering

March 7, 2024

David Buch, Miheer Dewaskar, David B. Dunson

Methodology

Broadly, the goal when clustering data is to separate observations into meaningful subgroups. The rich variety of methods for clustering reflects the fact that the relevant notion of meaningful clusters varies across applications. The classical Bayesian approach clusters observations by their association with components of a mixture model; the choice in class of components allows flexibility to capture a range of meaningful cluster notions. However, in practice the range is s...

A Probabilistic $\ell_1$ Method for Clustering High Dimensional Data

April 6, 2015

Tsvetan Asamov, Adi Ben-Israel

Statistics Theory

Machine Learning

Optimization and Control

In general, the clustering problem is NP-hard, and global optimality cannot be established for non-trivial instances. For high-dimensional data, distance-based methods for clustering or classification face an additional difficulty, the unreliability of distances in very high-dimensional spaces. We propose a distance-based iterative method for clustering data in very high-dimensional space, using the $\ell_1$-metric that is less sensitive to high dimensionality than the Euclid...

Bayesian Repulsive Gaussian Mixture Model

March 27, 2017

Fangzheng Xie, Yanxun Xu

Methodology

We develop a general class of Bayesian repulsive Gaussian mixture models that encourage well-separated clusters, aiming at reducing potentially redundant components produced by independent priors for locations (such as the Dirichlet process). The asymptotic results for the posterior distribution of the proposed models are derived, including posterior consistency and posterior contraction rate in the context of nonparametric density estimation. More importantly, we show that c...

Approximate Inference via Clustering

November 28, 2021

Qianqian Song

Machine Learning

In recent years, large-scale Bayesian learning has drawn a great deal of attention. However, in the big-data era, the amount of data we face is growing much faster than our ability to deal with it. Fortunately, it has been observed that large-scale datasets usually have rich internal structure and are somewhat redundant. In this paper, we attempt to simplify the Bayesian posterior by exploiting this structure. Specifically, we restrict our interest to the so-called well-clustered datasets a...

Hierarchical clustering with discrete latent variable models and the integrated classification likelihood

February 26, 2020

Etienne Côme, Nicolas Jouvin, ... , Bouveyron Charles

Computation

Finding a set of nested partitions of a dataset is useful for uncovering relevant structure at different scales and is often handled with a data-dependent methodology. In this paper, we introduce a general two-step methodology for model-based hierarchical clustering. Considering the integrated classification likelihood criterion as an objective function, this work applies to every discrete latent variable model (DLVM) for which this quantity is tractable. The first step of the metho...