May 11, 2020
Mixtures of multivariate normal inverse Gaussian (MNIG) distributions can be used to cluster data that exhibit features such as skewness and heavy tails. For cluster analysis within a traditional finite mixture model framework, the number of components must either be known a priori or be estimated a posteriori using some model selection criterion after deriving results for a range of possible numbers of components. However, different model selecti...
October 1, 2013
The Dirichlet process mixture model and more general mixtures based on discrete random probability measures have been shown to be flexible and accurate models for density estimation and clustering. The goal of this paper is to illustrate the use of normalized random measures as mixing measures in nonparametric hierarchical mixture models and point out how possible computational issues can be successfully addressed. To this end, we first provide a concise and accessible introd...
July 21, 2022
We study the sparse high-dimensional Gaussian mixture model when the number of clusters is allowed to grow with the sample size. A minimax lower bound for parameter estimation is established, and we show that a constrained maximum likelihood estimator achieves the minimax lower bound. However, this optimization-based estimator is computationally intractable because the objective function is highly nonconvex and the feasible set involves discrete structures. To address the com...
May 2, 2009
In this paper, we provide an explicit probability distribution for classification purposes. It is derived from the Bayesian nonparametric mixture of Dirichlet process model, but with suitable modifications which remove unsuitable aspects of the classification based on this model. The resulting approach then more closely resembles a classical hierarchical grouping rule in that it depends on sums of squares of neighboring values. The proposed probability model for classificatio...
July 2, 2024
The intricacies inherent in contemporary real datasets demand more advanced statistical models to effectively address complex challenges. In this article we delve into problems related to identifying clusters across related groups, when additional covariate information is available. We formulate a novel Bayesian nonparametric approach based on mixture models, integrating ideas from the hierarchical Dirichlet process and "single-atoms" dependent Dirichlet process. The proposed...
June 2, 2019
Recent work on overfitting Bayesian mixtures of distributions offers a powerful framework for clustering multivariate data using a latent Gaussian model which resembles the factor analysis model. The flexibility provided by overfitting mixture models yields a simple and efficient way to estimate the unknown number of clusters and model parameters by Markov chain Monte Carlo (MCMC) sampling. The present study extends this approach by considering a set of eight paramet...
February 22, 2015
A natural Bayesian approach for mixture models with an unknown number of components is to take the usual finite mixture model with Dirichlet weights, and put a prior on the number of components; that is, to use a mixture of finite mixtures (MFM). While inference in MFMs can be done with methods such as reversible jump Markov chain Monte Carlo, it is much more common to use Dirichlet process mixture (DPM) models because of the relative ease and generality with which DPM sampl...
May 19, 2014
This paper deals with Bayesian inference of a mixture of Gaussian distributions. A novel formulation of the mixture model is introduced, which includes the prior constraint that each Gaussian component is always assigned a minimal number of data points. This enables noninformative improper priors such as the Jeffreys prior to be placed on the component parameters. We demonstrate difficulties involved in specifying a prior for the standard Gaussian mixture model, and show how ...
August 4, 2008
This paper has been withdrawn. With the advancement of statistical theory and computing power, data sets are providing a greater amount of insight into the problems of today. Statisticians have an ever-increasing number of tools to attack these problems, some of which can be implemented in the area of mixture modeling. There is a great deal of literature on mixture models and this work attempts to provide a general overview of the subject, including the discussion of relevant...
May 16, 2007
When modeling the distribution of a set of data by a mixture of Gaussians, there are two possibilities: i) the classical one is to use a set of parameters, namely the proportions, the means and the variances; ii) the second is to consider the proportions as the probabilities of a discrete-valued hidden variable. In the first case a usual prior distribution for the proportions is the Dirichlet, which accounts for the fact that they have to sum to one. In the second case, to...