Bayesian cluster analysis: Point estimation and credible balls

May 13, 2015

Mean-field theory of Bayesian clustering

September 6, 2017

88% Match

Alexander Mozeika, Anthony CC Coolen

Disordered Systems and Neura...

Data Analysis, Statistics an...

We show that model-based Bayesian clustering, the probabilistically most systematic approach to the partitioning of data, can be mapped into a statistical physics problem for a gas of particles, and as a result becomes amenable to a detailed quantitative analysis. A central role in the resulting statistical physics framework is played by an entropy function. We demonstrate that there is a relevant parameter regime where mean-field analysis of this function is exact, and that,...

Bayesian Level-Set Clustering

March 7, 2024

88% Match

David Buch, Miheer Dewaskar, David B. Dunson

Methodology

Broadly, the goal when clustering data is to separate observations into meaningful subgroups. The rich variety of methods for clustering reflects the fact that the relevant notion of meaningful clusters varies across applications. The classical Bayesian approach clusters observations by their association with components of a mixture model; the choice in class of components allows flexibility to capture a range of meaningful cluster notions. However, in practice the range is s...
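The mixture-model view of clustering described above can be illustrated with a minimal sketch. It assumes a one-dimensional Gaussian mixture whose weights, means, and standard deviations are already known. These values, the data, and the helper names `responsibilities` and `hard_assign` are hypothetical. Each observation's "association with a component" is its posterior membership probability from Bayes' rule, and a hard clustering assigns each observation to its most probable component. This is a plug-in illustration, not a full Bayesian treatment with priors on the mixture parameters.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of a univariate normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def responsibilities(x, weights, means, sds):
    """Posterior probability that x came from each mixture component (Bayes' rule)."""
    joint = [w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
    total = sum(joint)
    return [j / total for j in joint]

def hard_assign(data, weights, means, sds):
    """Assign each observation to its most probable component."""
    labels = []
    for x in data:
        r = responsibilities(x, weights, means, sds)
        labels.append(max(range(len(r)), key=r.__getitem__))
    return labels

# Hypothetical two-component mixture with equal weights and variances.
weights, means, sds = [0.5, 0.5], [0.0, 5.0], [1.0, 1.0]
data = [-0.3, 0.8, 4.6, 5.9, 2.4]
print(hard_assign(data, weights, means, sds))  # [0, 0, 1, 1, 0]
```

With equal weights and variances, the decision boundary sits at the midpoint 2.5, so 2.4 falls in component 0.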

Clustering - What Both Theoreticians and Practitioners are Doing Wrong

May 22, 2018

88% Match

Shai Ben-David

Machine Learning

Unsupervised learning is widely recognized as one of the most important challenges facing machine learning nowadays. However, in spite of hundreds of papers on the topic being published every year, current theoretical understanding and practical implementations of such tasks, in particular of clustering, are very rudimentary. This note focuses on clustering. I claim that the most significant challenge for clustering is model selection. In contrast with other common computa...

A Tutorial on Bayesian Nonparametric Models

June 14, 2011

87% Match

Samuel J. Gershman, David M. Blei

Machine Learning

Methodology

A key problem in statistical modeling is model selection, how to choose a model at an appropriate level of complexity. This problem appears in many settings, most prominently in choosing the number of clusters in mixture models or the number of factors in factor analysis. In this tutorial we describe Bayesian nonparametric methods, a class of methods that side-steps this issue by allowing the data to determine the complexity of the model. This tutorial is a high-level introduc...
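A standard example of letting the data determine the number of clusters is the Chinese restaurant process (CRP), the partition distribution induced by a Dirichlet process. The minimal sketch below draws partitions from the CRP prior only. It involves no likelihood and no posterior inference. The concentration parameter `alpha`, the seed, and the function name `sample_crp` are illustrative assumptions. Item i joins an existing cluster in proportion to that cluster's size, or opens a new cluster in proportion to `alpha`.

```python
import random

def sample_crp(n, alpha, rng):
    """Draw a partition of n items from a Chinese restaurant process prior.

    Item i (0-based) joins existing cluster k with probability
    size_k / (i + alpha) and opens a new cluster with probability
    alpha / (i + alpha).
    """
    labels = []
    sizes = []
    for i in range(n):
        weights = sizes + [alpha]  # last entry = "open a new cluster"
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(sizes):
            sizes.append(1)
        else:
            sizes[k] += 1
        labels.append(k)
    return labels, sizes

rng = random.Random(0)
for n in (10, 100, 1000):
    _, sizes = sample_crp(n, alpha=1.0, rng=rng)
    print(n, len(sizes))  # number of occupied clusters grows with n
```

Under this prior the expected number of clusters is the sum of alpha / (alpha + i) for i from 0 to n - 1, which grows roughly like alpha times log n. The model therefore does not fix the number of clusters in advance.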

Probabilistic community detection with unknown number of communities

February 25, 2016

87% Match

Junxian Geng, Anirban Bhattacharya, Debdeep Pati

Methodology

Statistics Theory

A fundamental problem in network analysis is clustering the nodes into groups which share a similar connectivity pattern. Existing algorithms for community detection assume the knowledge of the number of clusters or estimate it a priori using various selection criteria and subsequently estimate the community structure. Ignoring the uncertainty in the first stage may lead to erroneous clustering, particularly when the community structure is vague. We instead propose a coherent...

Robust Bayesian Model Selection for Variable Clustering with the Gaussian Graphical Model

June 15, 2018

87% Match

Daniel Andrade, Akiko Takeda, Kenji Fukumizu

Applications

Computation

Machine Learning

Variable clustering is important for explanatory analysis. However, only a few dedicated methods for variable clustering with the Gaussian graphical model have been proposed. Even more severe, small insignificant partial correlations due to noise can dramatically change the clustering result when evaluating, for example, with the Bayesian Information Criterion (BIC). In this work, we try to address this issue by proposing a Bayesian model that accounts for negligibly small, but no...
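The BIC-based evaluation mentioned above can be sketched for variable clustering. The sketch models variables in the same cluster as jointly Gaussian and variables in different clusters as independent, which gives a block-diagonal covariance. It then scores a candidate partition with BIC = k ln n - 2 ln L, where lower is better. This is the plain BIC baseline, not the authors' robust Bayesian model. The function name `block_gaussian_bic` and the simulated data are hypothetical.

```python
import numpy as np

def block_gaussian_bic(X, clusters):
    """BIC of a Gaussian model whose covariance is block-diagonal over `clusters`.

    X: (n, p) data matrix; clusters: list of lists of column indices that
    partition range(p). Variables in different clusters are modelled as
    independent. Lower BIC is preferred.
    """
    n = X.shape[0]
    loglik = 0.0
    n_params = 0
    for idx in clusters:
        block = X[:, idx]
        p_b = block.shape[1]
        # Maximum-likelihood covariance (divides by n, not n - 1).
        S = np.cov(block, rowvar=False, bias=True).reshape(p_b, p_b)
        _, logdet = np.linalg.slogdet(S)
        # Maximised Gaussian log-likelihood of this block.
        loglik += -0.5 * n * (p_b * np.log(2 * np.pi) + logdet + p_b)
        n_params += p_b + p_b * (p_b + 1) // 2  # means + covariance entries
    return n_params * np.log(n) - 2.0 * loglik

# Hypothetical data: variables 0 and 1 share a latent factor, variable 2 is independent.
rng = np.random.default_rng(0)
n = 500
z = rng.normal(size=n)
X = np.column_stack([z + 0.3 * rng.normal(size=n),
                     z + 0.3 * rng.normal(size=n),
                     rng.normal(size=n)])
for clusters in ([[0, 1], [2]], [[0], [1], [2]], [[0, 1, 2]]):
    print(clusters, round(block_gaussian_bic(X, clusters), 1))
```

The log n penalty per parameter is what makes the comparison sensitive to small partial correlations. A weak spurious correlation can tip the score toward merging or splitting clusters, which is the issue the abstract raises.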

Nonparametric Bayesian Aggregation for Massive Data

August 18, 2015

87% Match

Zuofeng Shang, Botao Hao, Guang Cheng

Statistics Theory

We develop a set of scalable Bayesian inference procedures for a general class of nonparametric regression models. Specifically, nonparametric Bayesian inferences are separately performed on each subset randomly split from a massive dataset, and then the obtained local results are aggregated into global counterparts. This aggregation step is explicit without involving any additional computation cost. By a careful partition, we show that our aggregated inference results obtain...
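The general split-and-aggregate idea can be illustrated with a much simpler parametric case than the nonparametric regression setting of this paper. This is a swapped-in toy example, not the authors' procedure. Take a normal mean with known standard deviation `sigma` and a flat prior. Each subset posterior is then Gaussian, and multiplying the subset posterior densities gives the full-data posterior exactly. The combined precision is the sum of the subset precisions, and the combined mean is the precision-weighted average of the subset means. The data, subset count, and helper names are hypothetical.

```python
import random
import statistics

def subset_posterior(xs, sigma):
    """Posterior N(mean, var) for a normal mean with known sigma and a flat prior."""
    return statistics.fmean(xs), sigma ** 2 / len(xs)

def aggregate(posteriors):
    """Combine Gaussian subset posteriors by multiplying their densities.

    The product of normal densities is normal with precision equal to the
    sum of precisions and mean equal to the precision-weighted average.
    """
    precisions = [1.0 / v for _, v in posteriors]
    total = sum(precisions)
    mean = sum(m * w for (m, _), w in zip(posteriors, precisions)) / total
    return mean, 1.0 / total

rng = random.Random(42)
sigma = 2.0
data = [rng.gauss(3.0, sigma) for _ in range(10_000)]
subsets = [data[j::10] for j in range(10)]  # 10 disjoint subsets of 1000 points each

agg_mean, agg_var = aggregate([subset_posterior(s, sigma) for s in subsets])
full_mean, full_var = subset_posterior(data, sigma)
print(agg_mean, full_mean)  # equal up to floating-point rounding
print(agg_var, full_var)
```

The aggregation step is a closed-form weighted average, so it adds essentially no computation on top of the subset fits. With a proper prior, each subset would instead use the prior raised to the power 1/m (m subsets) for the product to recover the full posterior.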

How many clusters? An information theoretic perspective

March 4, 2003

87% Match

Susanne Still, William Bialek

Data Analysis, Statistics an...

General Physics

Clustering provides a common means of identifying structure in complex data, and there is renewed interest in clustering as a tool for the analysis of large data sets in many fields. A natural question is how many clusters are appropriate for the description of a given system. Traditional approaches to this problem are based either on a framework in which clusters of a particular shape are assumed as a model of the system or on a two-step procedure in which a clustering crite...

Approximate Inference via Clustering

November 28, 2021

87% Match

Qianqian Song

Machine Learning

In recent years, large-scale Bayesian learning has drawn a great deal of attention. However, in the big-data era, the amount of data we face is growing much faster than our ability to deal with it. Fortunately, it is observed that large-scale datasets usually possess rich internal structure and are somewhat redundant. In this paper, we attempt to simplify the Bayesian posterior via exploiting this structure. Specifically, we restrict our interest to the so-called well-clustered datasets a...

Clustering with Statistical Error Control

February 8, 2017

87% Match

Michael Vogt, Matthias Schmid

Statistics Theory

This paper presents a clustering approach that allows for rigorous statistical error control similar to a statistical test. We develop estimators for both the unknown number of clusters and the clusters themselves. The estimators depend on a tuning parameter alpha which is similar to the significance level of a statistical hypothesis test. By choosing alpha, one can control the probability of overestimating the true number of clusters, while the probability of underestimation...