ID: 1809.07850

Optimal Bayesian clustering using non-negative matrix factorization

September 20, 2018


Similar papers 4

Comparative Study of Inference Methods for Bayesian Nonnegative Matrix Factorisation

July 13, 2017

87% Match
Thomas Brouwer, Jes Frellsen, Pietro Lió
Machine Learning

In this paper, we study the trade-offs of different inference approaches for Bayesian matrix factorisation methods, which are commonly used for predicting missing values, and for finding patterns in the data. In particular, we consider Bayesian nonnegative variants of matrix factorisation and tri-factorisation, and compare non-probabilistic inference, Gibbs sampling, variational Bayesian inference, and a maximum-a-posteriori approach. The variational approach is new for the B...


Bayesian Distance Clustering

October 19, 2018

87% Match
Leo L Duan, David B Dunson
Machine Learning

Model-based clustering is widely used in a variety of application areas. However, fundamental concerns remain about robustness. In particular, results can be sensitive to the choice of kernel representing the within-cluster data density. Leveraging properties of pairwise differences between data points, we propose a class of Bayesian distance clustering methods, which rely on modeling the likelihood of the pairwise distances in place of the original data. Although some inf...


Approximate Inference via Clustering

November 28, 2021

87% Match
Qianqian Song
Machine Learning

In recent years, large-scale Bayesian learning has drawn a great deal of attention. However, in the big-data era, the amount of data we face is growing much faster than our ability to deal with it. Fortunately, it is observed that large-scale datasets usually have rich internal structure and are somewhat redundant. In this paper, we attempt to simplify the Bayesian posterior by exploiting this structure. Specifically, we restrict our interest to the so-called well-clustered datasets a...


Model-based Clustering

July 5, 2018

87% Match
Bettina Grün
Methodology

Mixture models extend the toolbox of clustering methods available to the data analyst. They allow for an explicit definition of the cluster shapes and structure within a probabilistic framework and exploit estimation and inference techniques available for statistical models in general. In this chapter, an introduction to cluster analysis is provided, model-based clustering is related to standard heuristic clustering methods, and an overview of different ways to specify the clus...


A generalized Bayes framework for probabilistic clustering

June 9, 2020

87% Match
Tommaso Rigon, Amy H. Herring, David B. Dunson
Methodology
Machine Learning

Loss-based clustering methods, such as k-means and its variants, are standard tools for finding groups in data. However, the lack of quantification of uncertainty in the estimated clusters is a disadvantage. Model-based clustering based on mixture models provides an alternative, but such methods face computational problems and large sensitivity to the choice of kernel. This article proposes a generalized Bayes framework that bridges between these two paradigms through the use...


Statistically Optimal K-means Clustering via Nonnegative Low-rank Semidefinite Programming

May 29, 2023

87% Match
Yubo Zhuang, Xiaohui Chen, ..., Richard Y. Zhang
Machine Learning
Optimization and Control

$K$-means clustering is a widely used machine learning method for identifying patterns in large datasets. Semidefinite programming (SDP) relaxations have recently been proposed for solving the $K$-means optimization problem that enjoy strong statistical optimality guarantees, but the prohibitive cost of implementing an SDP solver renders these guarantees inaccessible to practical datasets. By contrast, nonnegative matrix factorization (NMF) is a simple clustering algorithm th...


An Efficient Model Selection for Gaussian Mixture Model in a Bayesian Framework

July 3, 2013

87% Match
Ji Won Yoon
Machine Learning

In order to cluster or partition data, we often use Expectation-Maximization (EM) or variational approximation with a Gaussian Mixture Model (GMM), which is a parametric probability density function represented as a weighted sum of $\hat{K}$ Gaussian component densities. However, model selection to find the underlying $\hat{K}$ is one of the key concerns in GMM clustering, since we can obtain the desired clusters only when $\hat{K}$ is known. In this paper, we propose a new m...


Rethinking Symmetric Matrix Factorization: A More General and Better Clustering Perspective

September 6, 2022

87% Match
Mengyuan Zhang, Kai Liu
Machine Learning
Artificial Intelligence

Nonnegative matrix factorization (NMF) is widely used for clustering with strong interpretability. Among general NMF problems, symmetric NMF is a special one that plays an important role in graph clustering where each element measures the similarity between data points. Most existing symmetric NMF algorithms require factor matrices to be nonnegative, and only focus on minimizing the gap between similarity matrix and its approximation for clustering, without giving a considera...


Sparse Bayesian Unsupervised Learning

January 30, 2014

87% Match
Stephane Gaiffas, Bertrand Michel
Machine Learning

This paper is about variable selection, clustering and estimation in an unsupervised high-dimensional setting. Our approach is based on fitting constrained Gaussian mixture models, where we learn the number of clusters $K$ and the set of relevant variables $S$ using a generalized Bayesian posterior with a sparsity-inducing prior. We prove a sparsity oracle inequality which shows that this procedure selects the optimal parameters $K$ and $S$. This procedure is implemented usin...


Escaping the curse of dimensionality in Bayesian model based clustering

June 4, 2020

87% Match
Noirrit Kiran Chandra, Antonio Canale, David B. Dunson
Methodology
Computation

Bayesian mixture models are widely used for clustering of high-dimensional data with appropriate uncertainty quantification. However, as the dimension of the observations increases, posterior inference often tends to favor too many or too few clusters. This article explains this behavior by studying the random partition posterior in a non-standard setting with a fixed sample size and increasing data dimensionality. We provide conditions under which the finite sample posterior...
