High-Dimensional Data Clustering

Subspace Clustering with the Multivariate-t Distribution

June 27, 2017

92% Match

Angelina Pesevski, Brian C. Franczak, Paul D. McNicholas

Methodology

Clustering procedures suitable for the analysis of very high-dimensional data are needed for many modern data sets. In model-based clustering, a method called high-dimensional data clustering (HDDC) uses a family of Gaussian mixture models for clustering. HDDC is based on the idea that high-dimensional data usually exists in lower-dimensional subspaces; as such, an intrinsic dimension for each sub-population of the observed data can be estimated and cluster analysis can be pe...


Dimensionality's Blessing: Clustering Images by Underlying Distribution

April 8, 2018

91% Match

Wen-Yan Lin, Siying Liu, ..., Yasuyuki Matsushita

Computer Vision and Pattern ...

Many high dimensional vector distances tend to a constant. This is typically considered a negative "contrast-loss" phenomenon that hinders clustering and other machine learning techniques. We reinterpret "contrast-loss" as a blessing. Re-deriving "contrast-loss" using the law of large numbers, we show it results in a distribution's instances concentrating on a thin "hyper-shell". The hollow center means apparently chaotically overlapping distributions are actually intrinsical...
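The "hyper-shell" concentration described in this abstract can be illustrated with a small simulation. The sketch below is not the authors' method; it assumes standard Gaussian samples (an illustrative choice) and uses a hypothetical helper, `norm_spread`, to compare how tightly the sample norms cluster around sqrt(dim) in low and high dimension.

```python
import math
import random


def norm_spread(dim, n_samples=200, seed=0):
    """Mean and standard deviation of ||x|| / sqrt(dim) for x ~ N(0, I_dim).

    Hypothetical helper for illustration only; the Gaussian assumption
    is ours, not taken from the abstract.
    """
    rng = random.Random(seed)
    scaled = []
    for _ in range(n_samples):
        # Squared Euclidean norm of one standard Gaussian sample.
        sq = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(dim))
        scaled.append(math.sqrt(sq / dim))
    mean = sum(scaled) / n_samples
    var = sum((s - mean) ** 2 for s in scaled) / n_samples
    return mean, math.sqrt(var)


low_mean, low_std = norm_spread(2)
high_mean, high_std = norm_spread(1000)

# In high dimension the scaled norms sit close to 1 with little spread,
# i.e. the samples lie near a thin shell of radius sqrt(dim).
print(f"dim=2:    mean={low_mean:.3f}, std={low_std:.3f}")
print(f"dim=1000: mean={high_mean:.3f}, std={high_std:.3f}")
```

The spread of the scaled norms shrinks as the dimension grows, which follows from the law of large numbers applied to the average of the squared coordinates.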


Hierarchical mixtures of Gaussians for combined dimensionality reduction and clustering

June 10, 2022

90% Match

Sacha Sokoloski, Philipp Berens

Machine Learning

To avoid the curse of dimensionality, a common approach to clustering high-dimensional data is to first project the data into a space of reduced dimension, and then cluster the projected data. Although effective, this two-stage approach prevents joint optimization of the dimensionality-reduction and clustering models, and obscures how well the complete model describes the data. Here, we show how a family of such two-stage models can be combined into a single, hierarchical mod...


Clustering for high-dimension, low-sample size data using distance vectors

December 12, 2013

90% Match

Yoshikazu Terada

Machine Learning

In high-dimension, low-sample size (HDLSS) data, it is not always true that closeness of two objects reflects a hidden cluster structure. We point out the important fact that it is not the closeness, but the "values" of distance that contain information of the cluster structure in high-dimensional space. Based on this fact, we propose an efficient and simple clustering approach, called distance vector clustering, for HDLSS data. Under the assumptions given in the work of Hall...


Gaussian Mixture Models with Component Means Constrained in Pre-selected Subspaces

August 26, 2015

90% Match

Mu Qiao, Jia Li

Machine Learning

We investigate a Gaussian mixture model (GMM) with component means constrained in a pre-selected subspace. Applications to classification and clustering are explored. An EM-type estimation algorithm is derived. We prove that the subspace containing the component means of a GMM with a common covariance matrix also contains the modes of the density and the class means. This motivates us to find a subspace by applying weighted principal component analysis to the modes of a kerne...


Dimension reduction for model-based clustering

August 7, 2015

90% Match

Luca Scrucca

Methodology

Machine Learning

We introduce a dimension reduction method for visualizing the clustering structure obtained from a finite mixture of Gaussian densities. Information on the dimension reduction subspace is obtained from the variation on group means and, depending on the estimated mixture model, on the variation on group covariances. The proposed method aims at reducing the dimensionality by identifying a set of linear combinations, ordered by importance as quantified by the associated eigenval...


Nonparametric Density Estimation for High-Dimensional Data - Algorithms and Applications

March 30, 2019

90% Match

Zhipeng Wang, David W. Scott

Machine Learning

Computation

Density Estimation is one of the central areas of statistics whose purpose is to estimate the probability density function underlying the observed data. It serves as a building block for many tasks in statistical inference, visualization, and machine learning. Density Estimation is widely adopted in the domain of unsupervised learning especially for the application of clustering. As big data become pervasive in almost every area of data sciences, analyzing high-dimensional da...


KDD-SC: Subspace Clustering Extensions for Knowledge Discovery Frameworks

July 15, 2014

89% Match

Stephan Günnemann, Hardy Kremer, ..., Thomas Seidl

Databases

Analyzing high dimensional data is a challenging task. For these data it is known that traditional clustering algorithms fail to detect meaningful patterns. As a solution, subspace clustering techniques have been introduced. They analyze arbitrary subspace projections of the data to detect clustering structures. In this paper, we present our subspace clustering extension for KDD frameworks, termed KDD-SC. In contrast to existing subspace clustering toolkits, our solution ne...


Efficient Sparse Clustering of High-Dimensional Non-spherical Gaussian Mixtures

June 9, 2014

89% Match

Martin Azizyan, Aarti Singh, Larry Wasserman

Statistics Theory

Machine Learning


We consider the problem of clustering data points in high dimensions, i.e. when the number of data points may be much smaller than the number of dimensions. Specifically, we consider a Gaussian mixture model (GMM) with non-spherical Gaussian components, where the clusters are distinguished by only a few relevant dimensions. The method we propose is a combination of a recent approach for learning parameters of a Gaussian mixture model and sparse linear discriminant analysis (L...


Simultaneous model-based clustering and visualization in the Fisher discriminative subspace

January 12, 2011

89% Match

Charles Bouveyron, Camille Brunet

Methodology

Applications

Computation

Machine Learning

Clustering in high-dimensional spaces is nowadays a recurrent problem in many scientific domains but remains a difficult task from both the clustering accuracy and the result understanding points of view. This paper presents a discriminative latent mixture (DLM) model which fits the data in a latent orthonormal discriminative subspace with an intrinsic dimension lower than the dimension of the original space. By constraining model parameters within and between groups, a famil...
