September 20, 2018
Similar papers
May 29, 2010
This paper provides a theoretical explanation of the clustering aspect of nonnegative matrix factorization (NMF). We prove that even without imposing orthogonality or sparsity constraints on the basis and/or coefficient matrix, NMF can still give clustering results, thus providing theoretical support for many works, e.g., Xu et al. [1] and Kim et al. [2], that show the superiority of the standard NMF as a clustering method.
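As a minimal sketch of this clustering reading of plain, unconstrained NMF (not the paper's own code; scikit-learn's NMF and the argmax assignment rule are illustrative assumptions), one can factor the data and assign each sample to its dominant latent component:

# A minimal sketch (not the paper's code) of reading a plain NMF fit as a
# clustering: factor X ~ W @ H with scikit-learn and assign each sample to
# the latent component with the largest coefficient.
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
X = np.abs(rng.normal(size=(100, 20)))      # nonnegative data, samples in rows

model = NMF(n_components=3, init="nndsvda", max_iter=500, random_state=0)
W = model.fit_transform(X)                  # (100, 3) coefficients per sample
H = model.components_                       # (3, 20) nonnegative basis vectors

labels = W.argmax(axis=1)                   # cluster = dominant component
print(labels[:10])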
June 28, 2024
In recent years, there has been growing demand for methods that discern clusters of subjects in datasets characterized by a large set of features. Often, these clusters are highly variable in size and exhibit partial hierarchical structure. In this context, model-based clustering approaches with nonparametric priors are gaining attention in the literature due to their flexibility and adaptability to new data. However, current approaches still face challenges in recognizing hierarchic...
March 31, 2020
In many modern applications, there is interest in analyzing enormous data sets that cannot be easily moved across computers or loaded into memory on a single computer. In such settings, it is very common to be interested in clustering. Existing distributed clustering algorithms are mostly distance or density based without a likelihood specification, precluding the possibility of formal statistical inference. Model-based clustering allows statistical inference, yet research on...
February 23, 2015
The use of a finite mixture of normal distributions in model-based clustering makes it possible to capture non-Gaussian data clusters. However, identifying the clusters from the normal components is challenging and is in general achieved either by imposing constraints on the model or by using post-processing procedures. Within the Bayesian framework, we propose a different approach based on sparse finite mixtures to achieve identifiability. We specify a hierarchical prior where the hyperpar...
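A rough illustration of the sparse-finite-mixture idea (not the paper's hierarchical prior or MCMC sampler; this swaps in scikit-learn's variational BayesianGaussianMixture with a fixed, small Dirichlet concentration) is to overfit the number of components and let the prior empty the superfluous ones:

# Sparse finite mixture, illustrative version: deliberately overfit the number
# of components and use a small Dirichlet concentration so unneeded components
# receive (near-)zero weight. Variational fit, not the paper's Gibbs sampler.
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-3.0, 1.0, size=(150, 2)),
               rng.normal(+3.0, 1.0, size=(150, 2))])   # two true clusters

bgm = BayesianGaussianMixture(
    n_components=10,                                     # overfitted on purpose
    weight_concentration_prior_type="dirichlet_distribution",
    weight_concentration_prior=1e-3,                     # sparsity-inducing prior
    max_iter=500, random_state=0).fit(X)

print(np.round(bgm.weights_, 3))   # most component weights collapse toward zero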
December 29, 2013
We consider the problem of clustering a sequence of multinomial observations by way of a model selection criterion. We propose a form of penalty term for the model selection procedure. Our approach subsumes the conventional AIC and BIC criteria but also extends them so that they are applicable to a sequence of sparse multinomial observations, where, even within the same cluster, the number of multinomial trials may be different for different...
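The paper's specific penalty is not reproduced in this snippet; as a hedged reminder of the generic penalized form that AIC and BIC instantiate (with \hat{\ell}_K the maximized log-likelihood of the K-cluster multinomial model, d_K its number of free parameters, and n the number of observations):

\mathrm{crit}(K) \;=\; -2\,\hat{\ell}_K \;+\; \lambda_n\, d_K,
\qquad \lambda_n = 2 \;\Rightarrow\; \mathrm{AIC},
\qquad \lambda_n = \log n \;\Rightarrow\; \mathrm{BIC},

and the proposed criterion can be read as replacing this penalty with one suited to sparse multinomials with varying numbers of trials.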
June 20, 2016
In this work, we empirically explore the question: how can we assess the quality of samples from some target distribution? We assume that the samples are provided by some valid Monte Carlo procedure, so we are guaranteed that the collection of samples will asymptotically approximate the true distribution. Most current evaluation approaches focus on two questions: (1) Has the chain mixed, that is, is it sampling from the distribution? and (2) How independent are the samples (a...
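The two questions listed are usually checked with generic chain diagnostics; the sketch below (not the evaluation proposed in the paper, and with an arbitrary truncation threshold of 0.05) estimates the effective sample size of a single chain from its autocorrelations:

# Generic diagnostic for mixing/independence, not the paper's method: estimate
# the integrated autocorrelation time and effective sample size of one chain.
import numpy as np

def effective_sample_size(chain, max_lag=200):
    x = np.asarray(chain, dtype=float)
    x = x - x.mean()
    n = len(x)
    acf = [1.0]
    for lag in range(1, max_lag):
        c = np.dot(x[:-lag], x[lag:]) / np.dot(x, x)
        if c < 0.05:                      # truncate once autocorrelation is negligible
            break
        acf.append(c)
    tau = 1.0 + 2.0 * sum(acf[1:])        # integrated autocorrelation time
    return n / tau

rng = np.random.default_rng(2)
chain = np.zeros(5000)                    # toy AR(1) "chain": strongly autocorrelated
for t in range(1, len(chain)):
    chain[t] = 0.9 * chain[t - 1] + rng.normal()
print(round(effective_sample_size(chain)))  # ESS is far below the raw sample count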
June 14, 2011
A key problem in statistical modeling is model selection: how to choose a model at an appropriate level of complexity. This problem appears in many settings, most prominently in choosing the number of clusters in mixture models or the number of factors in factor analysis. In this tutorial we describe Bayesian nonparametric methods, a class of methods that side-steps this issue by allowing the data to determine the complexity of the model. This tutorial is a high-level introduc...
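A minimal sketch of the "let the data determine complexity" idea (not taken from the tutorial; the crp function and concentration value are illustrative) is a draw from the Chinese restaurant process, where the number of occupied clusters is not fixed in advance:

# Chinese restaurant process: each observation joins an existing cluster with
# probability proportional to its size, or opens a new cluster with
# probability proportional to the concentration parameter alpha.
import numpy as np

def crp(n, alpha, rng):
    counts = []                                # current cluster sizes
    labels = []
    for i in range(n):
        probs = np.array(counts + [alpha], dtype=float)
        probs /= probs.sum()
        k = rng.choice(len(probs), p=probs)
        if k == len(counts):
            counts.append(1)                   # open a new cluster
        else:
            counts[k] += 1
        labels.append(k)
    return labels, counts

rng = np.random.default_rng(3)
_, counts = crp(n=500, alpha=2.0, rng=rng)
print(len(counts), counts)                     # number and sizes of clusters in this draw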
September 22, 2022
The model described in this paper belongs to the family of non-negative matrix factorization methods designed for data representation and dimension reduction. In addition to preserving the positivity of the data, it also aims to preserve the structure of the data during matrix factorization. The idea is to add to the NMF cost function a penalty term that imposes a scale relationship between the pairwise similarity matrices of the original and transformed data points. The solutio...
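The abstract describes the cost function only qualitatively; one plausible form of such a similarity-preserving penalty (the weights \lambda and \gamma, the similarity operator S(\cdot), and the use of H's columns as the transformed points are assumptions, and the paper's exact formulation may differ) is

\min_{W \ge 0,\; H \ge 0}\; \|X - WH\|_F^2 \;+\; \lambda\,\bigl\| S(X) - \gamma\, S(H) \bigr\|_F^2,

where S(\cdot) denotes the pairwise similarity matrix of the data points and \gamma encodes the scale relationship between the two similarity matrices.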
June 2, 2019
Recent work on overfitting Bayesian mixtures of distributions offers a powerful framework for clustering multivariate data using a latent Gaussian model that resembles the factor analysis model. The flexibility provided by overfitting mixture models yields a simple and efficient way to estimate the unknown number of clusters and model parameters by Markov chain Monte Carlo (MCMC) sampling. The present study extends this approach by considering a set of eight paramet...
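For reference, the standard factor-analytic mixture that such latent Gaussian models build on can be written (the eight parameterizations mentioned in the abstract are not reproduced here, and presumably constrain how the loadings and noise terms are shared across components) as

x_i \mid z_i = k \;\sim\; \mathcal{N}\!\bigl(\mu_k,\; \Lambda_k \Lambda_k^{\top} + \Psi_k\bigr),
\qquad \Pr(z_i = k) = \pi_k,\quad k = 1,\dots,K,

where \Lambda_k is a p \times q matrix of factor loadings with q < p and \Psi_k is a diagonal noise covariance.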
May 23, 2022
In this paper, we introduce a probabilistic model for learning nonnegative matrix factorization (NMF), a technique commonly used for predicting missing values and finding hidden patterns in data. The matrix factors are treated as latent variables associated with each data dimension. The nonnegativity constraint on the latent factors is handled by choosing priors with support on the nonnegative subspace. A Bayesian inference procedure based on Gibbs sampling is employed. We evalua...
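One common formulation of Bayesian NMF with nonnegative-support priors (the exponential priors and Gaussian likelihood below are assumptions for illustration; the paper's exact choices may differ) is

W_{ik} \sim \mathrm{Exponential}(\lambda_W),
\qquad H_{kj} \sim \mathrm{Exponential}(\lambda_H),
\qquad X_{ij} \mid W, H \;\sim\; \mathcal{N}\!\Bigl(\textstyle\sum_{k} W_{ik} H_{kj},\; \sigma^2\Bigr),

with Gibbs sampling drawing each factor element from its full conditional, which under these choices is a truncated normal on the nonnegative half-line.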