March 5, 2012
In several application domains, high-dimensional observations are collected and then analysed in search of naturally occurring data clusters that might provide further insight into the nature of the problem. In this paper we describe a new approach for partitioning such high-dimensional data. Our assumption is that, within each cluster, the data can be well approximated by a linear subspace estimated by means of principal component analysis (PCA). The proposed algorithm...
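The general idea, fitting a low-dimensional PCA subspace inside each cluster and assigning each point to the subspace that reconstructs it best, can be sketched as follows. This is a generic k-subspaces-style illustration, not the paper's algorithm; the function name `k_subspaces` and the NumPy implementation are assumptions.

```python
import numpy as np

def k_subspaces(X, k, dim, n_iter=20, seed=0):
    """Alternate between a PCA fit per cluster and reassignment of
    points by subspace reconstruction error. Illustrative sketch only."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, k, size=len(X))
    for _ in range(n_iter):
        means, bases = [], []
        for j in range(k):
            Xj = X[labels == j]
            if len(Xj) <= dim:   # reseed an empty/degenerate cluster
                Xj = X[rng.choice(len(X), dim + 1, replace=False)]
            mu = Xj.mean(axis=0)
            # Top-`dim` principal directions via SVD of the centred block.
            _, _, Vt = np.linalg.svd(Xj - mu, full_matrices=False)
            means.append(mu)
            bases.append(Vt[:dim])
        # Reassign each point to the subspace that reconstructs it best.
        errs = np.stack([
            np.linalg.norm((X - mu) - (X - mu) @ V.T @ V, axis=1)
            for mu, V in zip(means, bases)
        ])
        labels = errs.argmin(axis=0)
    return labels
```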
May 4, 2021
Gaussian mixture models are a popular tool for model-based clustering, and mixtures of factor analyzers are Gaussian mixture models whose components have a parsimonious factor covariance structure. There are several recent extensions of mixtures of factor analyzers to deep mixtures, where the Gaussian model for the latent factors is replaced by a mixture of factor analyzers. This construction can be iterated to obtain a model with many layers. These deep models are chall...
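One way to see why the factor structure is parsimonious: a factor analyzer models a component covariance as $\Sigma = \Lambda\Lambda^T + \Psi$ with a $p \times q$ loading matrix $\Lambda$ and diagonal $\Psi$, so the parameter count grows linearly rather than quadratically in $p$. The numbers below ($p = 100$, $q = 5$) are purely illustrative.

```python
# Free covariance parameters for one mixture component in p dimensions:
# a full covariance needs p*(p+1)/2 values, while a factor-analytic
# covariance Sigma = Lambda @ Lambda.T + Psi (q factors, diagonal Psi)
# needs only p*q + p. Illustrative values, assuming p=100, q=5.
p, q = 100, 5
full_cov = p * (p + 1) // 2        # 5050
factor_cov = p * q + p             # 600
print(full_cov, factor_cov)
```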
December 19, 2023
High-dimensional datasets often contain multiple meaningful clusterings in different subspaces. For example, objects can be clustered either by color, weight, or size, revealing different interpretations of the given dataset. A variety of approaches are able to identify such non-redundant clusterings. However, most of these methods require the user to specify the expected number of subspaces and clusters for each subspace. Stating these values is a non-trivial problem and usu...
December 11, 2018
Recently, deep clustering, which uses deep neural networks to learn features that favor clustering, has achieved remarkable performance in image clustering applications. However, existing deep clustering algorithms generally require the number of clusters in advance, which is usually unknown in real-world tasks. In addition, the initial cluster centers in the learned feature space are generated by $k$-means. This only works well on spherical clusters ...
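The baseline pipeline this abstract criticises, learning an embedding with a deep network and then initialising cluster centers by $k$-means in the learned space, looks roughly as below. This is a minimal PyTorch/scikit-learn sketch of that generic pipeline, not this paper's method; the architecture and hyperparameters are arbitrary assumptions.

```python
import torch
import torch.nn as nn
from sklearn.cluster import KMeans

class AE(nn.Module):
    """Small autoencoder; the encoder output is the clustering space."""
    def __init__(self, d_in, d_lat=10):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(d_in, 128), nn.ReLU(),
                                 nn.Linear(128, d_lat))
        self.dec = nn.Sequential(nn.Linear(d_lat, 128), nn.ReLU(),
                                 nn.Linear(128, d_in))
    def forward(self, x):
        return self.dec(self.enc(x))

def embed_and_cluster(X, k, epochs=50):
    # Train by reconstruction, then run k-means on the latent codes.
    # Note: k must be supplied in advance, the limitation the paper targets.
    model = AE(X.shape[1])
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    x = torch.as_tensor(X, dtype=torch.float32)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(x), x)
        loss.backward()
        opt.step()
    z = model.enc(x).detach().numpy()
    return KMeans(n_clusters=k, n_init=10).fit_predict(z)
```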
December 29, 2016
Popular clustering algorithms based on usual distance functions (e.g., Euclidean distance) often suffer in high dimension, low sample size (HDLSS) situations, where concentration of pairwise distances has adverse effects on their performance. In this article, we use a dissimilarity measure based on the data cloud, called MADD, which takes care of this problem. MADD uses the distance concentration phenomenon to its advantage, and as a result, clustering algorithms based on MAD...
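In the HDLSS literature, MADD is commonly defined as the mean absolute difference of two points' distances to the rest of the data cloud, $\rho(x_i, x_j) = \frac{1}{n-2}\sum_{k \neq i,j} \big| d(x_i, x_k) - d(x_j, x_k) \big|$. A small NumPy/SciPy sketch under that assumed definition:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def madd(X):
    """Pairwise MADD dissimilarity: for each pair (i, j), average the
    absolute difference of their distances to every other point.
    One common form from the HDLSS literature, assumed here."""
    D = squareform(pdist(X))   # Euclidean distance matrix, n x n
    n = len(X)
    M = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            mask = np.ones(n, dtype=bool)
            mask[[i, j]] = False
            M[i, j] = M[j, i] = np.abs(D[i, mask] - D[j, mask]).mean()
    return M
```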
February 22, 2019
Mixture models are a standard approach to dealing with heterogeneous data with non-i.i.d. structure. However, when the dimension $p$ is large relative to the sample size $n$, and when the means, the covariances/graphical models, or both may differ between the latent groups, mixture models face statistical and computational difficulties, and currently available methods cannot realistically go beyond $p \! \sim \! 10^4$ or so. We propose an approach called Model-based Clustering...
May 9, 2017
A mixture of joint generalized hyperbolic distributions (MJGHD) is introduced for asymmetric clustering of high-dimensional data. The MJGHD approach takes into account the cluster-specific subspace, thereby limiting the number of parameters to estimate while also facilitating visualization of results. Identifiability is discussed, and a multi-cycle ECM algorithm is outlined for parameter estimation. The MJGHD approach is illustrated on two real data sets, where the Bayesian ...
June 14, 2014
Clustering methods with dimension reduction have recently received considerable interest in statistics, and many methods that simultaneously perform clustering and dimension reduction have been proposed. This work presents a novel procedure for simultaneously determining the optimal cluster structure for multivariate binary data and the subspace in which to represent that cluster structure. The method is based on a finite mixture model of multivariate Bernoulli distributions...
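The base model is a standard mixture of multivariate Bernoulli distributions fitted by EM. A minimal NumPy sketch of that base model follows; the simultaneous dimension-reduction step, which is the paper's contribution, is deliberately omitted.

```python
import numpy as np

def bernoulli_mixture_em(X, k, n_iter=100, seed=0):
    """EM for a mixture of multivariate Bernoulli distributions.
    X is an n x p binary (0/1) matrix; returns hard labels, mixture
    weights, and per-cluster success probabilities."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    pi = np.full(k, 1.0 / k)
    theta = rng.uniform(0.25, 0.75, size=(k, p))   # success probabilities
    for _ in range(n_iter):
        # E-step: responsibilities from component log-likelihoods.
        log_lik = (X @ np.log(theta).T + (1 - X) @ np.log(1 - theta).T
                   + np.log(pi))
        log_lik -= log_lik.max(axis=1, keepdims=True)
        r = np.exp(log_lik)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: update weights and Bernoulli parameters per cluster.
        nk = r.sum(axis=0)
        pi = nk / n
        theta = np.clip((r.T @ X) / nk[:, None], 1e-6, 1 - 1e-6)
    return r.argmax(axis=1), pi, theta
```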
August 30, 2023
Distribution learning focuses on learning the probability density function from a set of data samples. In contrast, clustering aims to group similar objects together in an unsupervised manner. Usually, these two tasks are considered unrelated. However, the two may be indirectly related, with Gaussian Mixture Models (GMM) acting as a bridge. In this paper, we focus on exploring the correlation between distribution learning and clustering, with the m...
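The bridge is easy to see in code: a fitted GMM is simultaneously a density estimate and, via its posterior component probabilities, a clustering. A scikit-learn sketch on assumed toy data:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# A fitted GMM serves both tasks at once: score_samples returns the
# learned log-density, and predict returns the argmax of the posterior
# component probabilities. Toy two-blob data assumed.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (200, 2)), rng.normal(4, 1, (200, 2))])
gmm = GaussianMixture(n_components=2, covariance_type="full").fit(X)
density = np.exp(gmm.score_samples(X))   # distribution learning output
labels = gmm.predict(X)                  # clustering output
```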
November 2, 2018
We propose an algorithm for clustering high dimensional data. If $P$ features for $N$ objects are represented in an $N\times P$ matrix ${\bf X}$, where $N\ll P$, the method is based on exploiting the cluster-dependent structure of the $N\times N$ matrix ${\bf XX}^T$. Computational burden thus depends primarily on $N$, the number of objects to be clustered, rather than $P$, the number of features that are measured. This makes the method particularly useful in high dimensional ...
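A generic way to exploit the $N\times N$ matrix ${\bf XX}^T$ is to double-centre it and eigen-decompose it, which recovers the PCA scores of the data without ever forming a $P\times P$ covariance matrix, so the heavy computation scales with $N$ rather than $P$. The sketch below illustrates this general idea, not necessarily the paper's specific procedure; the function name and the final k-means step are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_via_gram(X, k, dim=5):
    """Cluster N objects with P >> N features using only the N x N
    Gram matrix. Eigenvectors of the double-centred Gram matrix give
    the classical PCA scores (Gram/covariance duality)."""
    n = len(X)
    G = X @ X.T                              # N x N, cheap when N << P
    H = np.eye(n) - np.ones((n, n)) / n      # centring matrix
    K = H @ G @ H
    vals, vecs = np.linalg.eigh(K)           # ascending eigenvalues
    top = np.argsort(vals)[::-1][:dim]
    scores = vecs[:, top] * np.sqrt(np.maximum(vals[top], 0))
    return KMeans(n_clusters=k, n_init=10).fit_predict(scores)
```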