April 8, 2013
The Dirichlet process (DP) is a fundamental mathematical tool for Bayesian nonparametric modeling, and is widely used in tasks such as density estimation, natural language processing, and time series modeling. Although MCMC inference methods for the DP often provide a gold standard in terms of asymptotic accuracy, they can be computationally expensive and are not obviously parallelizable. We propose a reparameterization of the Dirichlet process that induces conditional independe...
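As background for this abstract, the DP admits the well-known stick-breaking representation (Sethuraman, 1994), which is the usual starting point for truncated samplers. A minimal truncated sketch (function and variable names are illustrative, not from the paper):

```python
import numpy as np

def stick_breaking_weights(alpha, n_atoms, rng):
    """Truncated stick-breaking construction of DP mixture weights:
    v_k ~ Beta(1, alpha), w_k = v_k * prod_{j<k} (1 - v_j)."""
    v = rng.beta(1.0, alpha, size=n_atoms)
    # Length of stick remaining before each break
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))
    return v * remaining

rng = np.random.default_rng(0)
w = stick_breaking_weights(alpha=2.0, n_atoms=50, rng=rng)
# w is nonnegative and sums to slightly less than 1 (truncation error)
```

Larger concentration `alpha` spreads mass over more atoms; the truncation level `n_atoms` controls how much of the unit stick is discarded.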
May 15, 2017
We consider mixture models where location parameters are a priori encouraged to be well separated. We explore a class of determinantal point process (DPP) mixture models, which provide the desired notion of separation or repulsion. Instead of using the rather restrictive case where analytical results are available, we adopt a spectral representation from which approximations to the DPP intensity functions can be readily computed. For the sake of concreteness the presentation ...
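To make the repulsion idea concrete: for an L-ensemble DPP, the marginal kernel K = L(L + I)^{-1} is obtained from the spectral decomposition of L, and its diagonal gives point-inclusion probabilities. A small illustrative sketch (the kernel choice here is an assumption for demonstration, not the paper's specification):

```python
import numpy as np

def dpp_marginal_kernel(L):
    """Marginal kernel K = L (L + I)^{-1} of an L-ensemble DPP,
    computed via the eigendecomposition of the symmetric kernel L:
    each eigenvalue lambda maps to lambda / (1 + lambda)."""
    lam, U = np.linalg.eigh(L)
    return (U * (lam / (1.0 + lam))) @ U.T

# Gaussian similarity kernel on a few 1-d locations
x = np.array([0.0, 0.1, 1.0, 2.0])
L = np.exp(-(x[:, None] - x[None, :]) ** 2 / 0.5)
K = dpp_marginal_kernel(L)
# diag(K) holds inclusion probabilities; the two nearby points
# (0.0 and 0.1) are unlikely to be selected jointly -- repulsion
```

Joint inclusion probabilities are principal minors of K, so highly similar points (large off-diagonal entries) are penalized, which is exactly the separation property exploited for mixture locations.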
July 12, 2021
Clustering in high dimensions poses many statistical challenges. While traditional distance-based clustering methods are computationally feasible, they lack probabilistic interpretation and rely on heuristics for estimation of the number of clusters. On the other hand, probabilistic model-based clustering techniques often fail to scale, and devising algorithms that are able to effectively explore the posterior space is an open problem. Based on recent developments in Bayesian ...
August 1, 2023
Finite mixture models are flexible methods that are commonly used for model-based clustering. A recent focus in the model-based clustering literature is to highlight the difference between the number of components in a mixture model and the number of clusters. The number of clusters is more relevant from a practical standpoint, but to date, the focus of prior distribution formulation has been on the number of components. In light of this, we develop a finite mixture methodol...
January 22, 2015
This paper proposes a simple procedure to deal with label switching when exploring complex posterior distributions by MCMC algorithms. Although it cannot be generalized to any situation, it may be handy in many applications because of its simplicity and very low computational burden. A possible area where it proves to be useful is when deriving a sample from the posterior distribution arising from finite mixture models when no simple or rational ordering between the comp...
April 22, 2019
Mixture models are one of the most widely used statistical tools when dealing with data from heterogeneous populations. This paper considers the long-standing debate over finite mixtures and infinite mixtures and brings the two modelling strategies together, by showing that a finite mixture is simply a realization of a point process. Following a Bayesian nonparametric perspective, we introduce a new class of prior: the Normalized Independent Point Processes. We investigate the...
October 2, 2023
This paper proposes a new nonparametric Bayesian bootstrap for a mixture model, by developing the traditional Bayesian bootstrap. We first reinterpret the Bayesian bootstrap, which uses the P\'olya-urn scheme, as a gradient ascent algorithm with an associated one-step solver. The key then is to use the same basic mechanism as the Bayesian bootstrap with the switch from a point mass kernel to a continuous kernel. Just as the Bayesian bootstrap works solely from the empirical dis...
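For reference, the traditional Bayesian bootstrap that this abstract builds on (Rubin, 1981) places Dirichlet(1, ..., 1) weights on the observed data points, the limit of the Pólya-urn scheme. A minimal sketch for the mean functional (names are illustrative; this is the classical method, not the paper's extension):

```python
import numpy as np

def bayesian_bootstrap_means(data, n_draws, rng):
    """Rubin's (1981) Bayesian bootstrap for the mean: each draw
    places Dirichlet(1, ..., 1) weights on the observed points,
    yielding a posterior draw of the mean of the weighted
    empirical distribution."""
    n = len(data)
    w = rng.dirichlet(np.ones(n), size=n_draws)  # shape (n_draws, n)
    return w @ data

rng = np.random.default_rng(1)
data = rng.normal(size=200)
draws = bayesian_bootstrap_means(data, n_draws=1000, rng=rng)
```

Replacing the point-mass kernel at each observation with a continuous kernel, as the abstract describes, would smooth each weighted atom into a density while keeping this same weighting mechanism.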
April 23, 2024
There is increasing interest in developing Bayesian inferential algorithms for point process models with intractable likelihoods. A purpose of this paper is to illustrate the utility of using simulation-based strategies, including approximate Bayesian computation (ABC) and Markov chain Monte Carlo (MCMC) methods, for this task. Shirota and Gelfand (2017) proposed an extended version of an ABC approach for repulsive spatial point processes, including the Strauss point process and ...
October 19, 2018
Model-based clustering is widely used in a variety of application areas. However, fundamental concerns remain about robustness. In particular, results can be sensitive to the choice of kernel representing the within-cluster data density. Leveraging properties of pairwise differences between data points, we propose a class of Bayesian distance clustering methods, which rely on modeling the likelihood of the pairwise distances in place of the original data. Although some inf...
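The input such distance-based likelihoods operate on is the matrix of pairwise distances rather than the raw data. A small sketch of that preprocessing step (the Euclidean choice is an assumption for illustration; the paper's likelihood itself is not reproduced here):

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distance matrix D[i, j] = ||x_i - x_j|| for the
    rows of X, computed by broadcasting."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

rng = np.random.default_rng(2)
X = rng.normal(size=(5, 3))
D = pairwise_distances(X)
# D is symmetric with a zero diagonal; a distance-clustering
# likelihood is built on the n(n-1)/2 values above the diagonal
```

Working from D alone sidesteps specifying a full within-cluster density for the original data, which is the robustness motivation in the abstract.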
August 20, 2021
Clustering has become a core technology in machine learning, largely due to its applications in unsupervised learning, classification, and density estimation. A frequentist approach to clustering based on mixture models is the EM algorithm, in which the parameters of the mixture model are estimated within a maximum likelihood framework. The Bayesian approach for finite and infinite Gaussian mixture models generates poi...
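The EM algorithm the abstract refers to alternates between computing component responsibilities (E-step) and weighted maximum likelihood updates (M-step). A compact one-dimensional sketch (initialization by quantiles is a choice made here for stability, not prescribed by the abstract):

```python
import numpy as np

def em_gmm_1d(x, K, n_iter=100):
    """EM for a 1-d Gaussian mixture: ML estimation of weights,
    means, and variances via alternating E/M updates."""
    pi = np.full(K, 1.0 / K)
    mu = np.quantile(x, (np.arange(K) + 0.5) / K)  # spread-out init
    var = np.full(K, x.var())
    for _ in range(n_iter):
        # E-step: responsibilities r[i, k] proportional to
        # pi_k * N(x_i | mu_k, var_k), computed in log space
        log_r = (np.log(pi) - 0.5 * np.log(2 * np.pi * var)
                 - 0.5 * (x[:, None] - mu) ** 2 / var)
        r = np.exp(log_r - log_r.max(axis=1, keepdims=True))
        r /= r.sum(axis=1, keepdims=True)
        # M-step: responsibility-weighted ML updates
        nk = r.sum(axis=0)
        pi = nk / len(x)
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk
    return pi, mu, var
```

Unlike the Bayesian treatment discussed in the abstract, this returns point estimates only; it also fixes K in advance, which is precisely the limitation that finite- and infinite-mixture posteriors over partitions aim to address.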