ID: 2207.13984

Model based clustering of multinomial count data

July 28, 2022

Similar papers 2

Choosing the number of clusters in a finite mixture model using an exact Integrated Completed Likelihood criterion

November 16, 2014

89% Match
Marco Bertoletti, Nial Friel, Riccardo Rastelli
Computation
Methodology

The integrated completed likelihood (ICL) criterion has proven to be a very popular approach in model-based clustering for automatically choosing the number of clusters in a mixture model. This approach effectively maximises the complete data likelihood, thereby including the allocation of observations to clusters in the model selection criterion. However, for practical implementation one needs to introduce an approximation in order to estimate the ICL. Our contribution he...
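The idea behind ICL can be illustrated with the common approximation ICL ≈ BIC plus an entropy penalty on the soft cluster assignments, which rewards well-separated clusterings. A minimal sketch using scikit-learn's `GaussianMixture` (an illustrative stand-in; the paper above concerns an exact ICL, not this approximation):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Two well-separated 1-D clusters
X = np.concatenate([rng.normal(-5, 1, 200), rng.normal(5, 1, 200)]).reshape(-1, 1)

def icl(model, X):
    """ICL ~= BIC + 2 * entropy of the soft assignments (lower is better)."""
    tau = model.predict_proba(X)
    entropy = -np.sum(tau * np.log(np.clip(tau, 1e-12, None)))
    return model.bic(X) + 2.0 * entropy

scores = {}
for k in (1, 2, 3):
    gm = GaussianMixture(n_components=k, random_state=0).fit(X)
    scores[k] = icl(gm, X)

best_k = min(scores, key=scores.get)  # smallest ICL wins
```

With well-separated data the entropy term is near zero at the true K and penalises both underfitting (poor likelihood) and overfitting (ambiguous assignments).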

Overfitting Bayesian Mixtures of Factor Analyzers with an Unknown Number of Components

January 17, 2017

88% Match
Panagiotis Papastamoulis
Methodology

Recent advances on overfitting Bayesian mixture models provide a solid and straightforward approach for inferring the underlying number of clusters and model parameters in heterogeneous datasets. The applicability of such a framework in clustering correlated high dimensional data is demonstrated. For this purpose an overfitting mixture of factor analyzers is introduced, assuming that the number of factors is fixed. A Markov chain Monte Carlo (MCMC) sampler combined with a pri...
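The overfitting strategy, fitting with a deliberately large upper bound on the number of components and letting a sparse prior on the weights empty the extras, can be sketched with scikit-learn's `BayesianGaussianMixture` (a variational stand-in for the paper's MCMC sampler, used here only to illustrate the mechanism):

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(1)
# Three well-separated 2-D clusters, but we deliberately overfit with K_max = 10
X = np.concatenate([rng.normal(m, 0.3, size=(150, 2)) for m in (-6.0, 0.0, 6.0)])

bgm = BayesianGaussianMixture(
    n_components=10,                  # overfitted upper bound K_max
    weight_concentration_prior=1e-3,  # sparse prior: pushes unused weights to zero
    random_state=1,
).fit(X)

# Components that actually receive appreciable posterior weight
active = int(np.sum(bgm.weights_ > 0.01))  # typically close to the true 3
```

The inferred number of clusters is then read off as the count of non-negligible components rather than chosen by a separate model-selection step.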

A parsimonious family of multivariate Poisson-lognormal distributions for clustering multivariate count data

April 15, 2020

88% Match
Sanjeena Subedi, Ryan Browne
Computation

Multivariate count data are commonly encountered through high-throughput sequencing technologies in bioinformatics, text mining, or in sports analytics. Although the Poisson distribution seems a natural fit to these count data, its multivariate extension is computationally expensive. In most cases mutual independence among the variables is assumed; however, this fails to take into account the correlation among the variables usually observed in the data. Recently, mixtures of mu...
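The Poisson-lognormal construction induces correlation among counts by mixing Poisson rates over a multivariate normal on the log scale. A minimal simulation sketch of this mechanism (illustrative only, not the authors' parsimonious family):

```python
import numpy as np

rng = np.random.default_rng(42)
n, d = 20000, 2

mu = np.array([1.0, 1.0])        # log-scale means
Sigma = np.array([[0.50, 0.45],  # strong positive log-scale covariance
                  [0.45, 0.50]])

# Latent log-rates are multivariate normal; counts are conditionally Poisson
log_lam = rng.multivariate_normal(mu, Sigma, size=n)
Y = rng.poisson(np.exp(log_lam))

# Unlike independent Poissons, the resulting counts are correlated
corr = np.corrcoef(Y, rowvar=False)[0, 1]
```

Setting `Sigma` to a diagonal matrix recovers (approximately) independent overdispersed counts, which is exactly the structure the correlated model relaxes.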

Repulsive Mixtures

April 24, 2012

88% Match
Francesca Petralia, Vinayak Rao, David B. Dunson
Methodology

Discrete mixture models are routinely used for density estimation and clustering. While conducting inferences on the cluster-specific parameters, current frequentist and Bayesian methods often encounter problems when clusters are placed too close together to be scientifically meaningful. Current Bayesian practice generates component-specific parameters independently from a common prior, which tends to favor similar components and often leads to substantial probability assigne...

Distributed Bayesian clustering using finite mixture of mixtures

March 31, 2020

88% Match
Hanyu Song, Yingjian Wang, David B. Dunson
Computation
Methodology

In many modern applications, there is interest in analyzing enormous data sets that cannot be easily moved across computers or loaded into memory on a single computer. In such settings, it is very common to be interested in clustering. Existing distributed clustering algorithms are mostly distance or density based without a likelihood specification, precluding the possibility of formal statistical inference. Model-based clustering allows statistical inference, yet research on...

Minimum Message Length Clustering Using Gibbs Sampling

January 16, 2013

88% Match
Ian Davidson
Machine Learning

The K-Means and EM algorithms are popular in clustering and mixture modeling, due to their simplicity and ease of implementation. However, they have several significant limitations. Both converge to a local optimum of their respective objective functions (ignoring the uncertainty in the model space), require the a priori specification of the number of classes/clusters, and are inconsistent. In this work we overcome these limitations by using the Minimum Message Length (MML) pri...

Inferring Hierarchical Mixture Structures: A Bayesian Nonparametric Approach

May 13, 2019

88% Match
Weipeng Huang, Nishma Laitonjam, ... , Neil Hurley
Machine Learning
Artificial Intelligence

This paper focuses on the problem of hierarchical non-overlapping clustering of a dataset. In such a clustering, each data item is associated with exactly one leaf node and each internal node is associated with all the data items stored in the sub-tree beneath it, so that each level of the hierarchy corresponds to a partition of the dataset. We develop a novel Bayesian nonparametric method combining the nested Chinese Restaurant Process (nCRP) and the Hierarchical Dirichlet P...
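The nested CRP builds on the ordinary Chinese Restaurant Process, in which item i joins an existing cluster with probability proportional to its size, or opens a new cluster with probability proportional to a concentration parameter alpha; the nCRP applies this rule recursively down a tree. A single-level sketch of the base process (illustrative, not the paper's full hierarchical model):

```python
import random

def crp(n, alpha, seed=0):
    """Draw one random partition of n items from a Chinese Restaurant Process."""
    rng = random.Random(seed)
    sizes = []        # sizes[k] = number of items currently at table k
    assignment = []   # assignment[i] = table index of item i
    for i in range(n):
        # Existing table k has weight sizes[k]; a new table has weight alpha.
        weights = sizes + [alpha]
        table = rng.choices(range(len(weights)), weights=weights)[0]
        if table == len(sizes):
            sizes.append(0)   # item opened a new table
        sizes[table] += 1
        assignment.append(table)
    return assignment, sizes

assignment, sizes = crp(100, alpha=1.0)
```

The "rich get richer" weighting means the number of occupied tables grows only logarithmically in n, which is what lets the nCRP infer tree width from the data.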

A Multivariate Poisson-Log Normal Mixture Model for Clustering Transcriptome Sequencing Data

November 30, 2017

88% Match
Anjali Silva, Steven J. Rothstein, ... , Sanjeena Subedi
Methodology
Quantitative Methods
Computation

High-dimensional data of discrete and skewed nature is commonly encountered in high-throughput sequencing studies. Analyzing the network itself or the interplay between genes in this type of data continues to present many challenges. As data visualization techniques become cumbersome for higher dimensions and unconvincing when there is no clear separation between homogeneous subgroups within the data, cluster analysis provides an intuitive alternative. The aim of applying mix...

A multiscale Bayesian nonparametric framework for partial hierarchical clustering

June 28, 2024

88% Match
Lorenzo Schiavon, Mattia Stival
Methodology

In recent years, there has been a growing demand to discern clusters of subjects in datasets characterized by a large set of features. Often, these clusters may be highly variable in size and present partial hierarchical structures. In this context, model-based clustering approaches with nonparametric priors are gaining attention in the literature due to their flexibility and adaptability to new data. However, current approaches still face challenges in recognizing hierarchic...

Greedy clustering of count data through a mixture of multinomial PCA

September 2, 2019

88% Match
Nicolas Jouvin, Pierre Latouche, Charles Bouveyron, ... , Alain Livartowski
Methodology

Count data is becoming more and more ubiquitous in a wide range of applications, with datasets growing both in size and in dimension. In this context, an increasing amount of work is dedicated to the construction of statistical models directly accounting for the discrete nature of the data. Moreover, it has been shown that integrating dimension reduction to clustering can drastically improve performance and stability. In this paper, we rely on the mixture of multinomial PCA, ...
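The multinomial mixture underlying such count-data models clusters count vectors by alternating soft assignments (E-step) with re-estimation of per-cluster category probabilities (M-step). A bare-bones EM sketch for a K-component multinomial mixture (an illustration of the base model only, not the paper's mixture of multinomial PCA):

```python
import numpy as np

def multinomial_mixture_em(Y, K, n_iter=50, seed=0):
    """EM for a mixture of multinomials over a count matrix Y of shape (n, d)."""
    rng = np.random.default_rng(seed)
    n, d = Y.shape
    pi = np.full(K, 1.0 / K)                   # mixing weights
    theta = rng.dirichlet(np.ones(d), size=K)  # category probabilities per cluster
    for _ in range(n_iter):
        # E-step: responsibilities from the multinomial log-likelihood
        # (the multinomial coefficient is constant in k and can be dropped)
        log_r = np.log(pi) + Y @ np.log(theta).T
        log_r -= log_r.max(axis=1, keepdims=True)
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: reweighted counts
        pi = r.mean(axis=0)
        theta = (r.T @ Y) + 1e-8
        theta /= theta.sum(axis=1, keepdims=True)
    return r.argmax(axis=1)

# Two groups of count vectors with very different category profiles
rng = np.random.default_rng(0)
Y = np.vstack([rng.multinomial(50, [0.7, 0.2, 0.1], size=100),
               rng.multinomial(50, [0.1, 0.2, 0.7], size=100)])
labels = multinomial_mixture_em(Y, K=2)
```

The dimension-reduction step in multinomial PCA replaces the free per-cluster `theta` with a low-dimensional parameterisation, which is where the stability gains the abstract mentions come from.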
