High-Dimensional Data Clustering

A Semi-Definite Programming approach to low dimensional embedding for unsupervised clustering

June 29, 2016

88% Match

Stéphane Chrétien, Clément Dombry, Adrien Faivre

Machine Learning

This paper proposes a variant of the method of Guédon and Vershynin for estimating the cluster matrix in the Mixture of Gaussians framework via Semi-Definite Programming. A clustering-oriented embedding is deduced from this estimate. The procedure is suitable for very high-dimensional data because it is based on pairwise distances only. Theoretical guarantees are provided, and an eigenvalue optimisation approach is proposed for computing the embedding. The performance of the m...

High dimensionality: The latest challenge to data analysis

February 12, 2019

88% Match

A. M. Pires, J. A. Branco

Methodology

The advent of modern technology, permitting the measurement of thousands of characteristics simultaneously, has given rise to floods of data characterized by many large or even huge datasets. This new paradigm presents extraordinary challenges to data analysis and the question arises: how can conventional data analysis methods, devised for moderate or small datasets, cope with the complexities of modern data? The case of high dimensional data is particularly revealing of some...

ExClus: Explainable Clustering on Low-dimensional Data Representations

November 4, 2021

88% Match

Xander Vankwikelberge, Bo Kang, ... , Jefrey Lijffijt

Machine Learning

Dimensionality reduction and clustering techniques are frequently used to analyze complex data sets, but their results are often not easy to interpret. We consider how to support users in interpreting apparent cluster structure on scatter plots where the axes are not directly interpretable, such as when the data is projected onto a two-dimensional space using a dimensionality-reduction method. Specifically, we propose a new method to compute an interpretable clustering automa...

Subspace Determination through Local Intrinsic Dimensional Decomposition: Theory and Experimentation

July 15, 2019

88% Match

Ruben Becker, Imane Hafnaoui, Michael E. Houle, ... , Arthur Zimek

Machine Learning

Axis-aligned subspace clustering generally entails searching through enormous numbers of subspaces (feature combinations) and evaluation of cluster quality within each subspace. In this paper, we tackle the problem of identifying subsets of features with the most significant contribution to the formation of the local neighborhood surrounding a given data point. For each point, the recently-proposed Local Intrinsic Dimension (LID) model is used in identifying the axis directio...

Deep Clustering Based on a Mixture of Autoencoders

December 16, 2018

88% Match

Shlomo E. Chazan, Sharon Gannot, Jacob Goldberger

Machine Learning

Artificial Intelligence

In this paper we propose a Deep Autoencoder MIxture Clustering (DAMIC) algorithm based on a mixture of deep autoencoders where each cluster is represented by an autoencoder. A clustering network transforms the data into another space and then selects one of the clusters. Next, the autoencoder associated with this cluster is used to reconstruct the data-point. The clustering algorithm jointly learns the nonlinear data representation and the set of autoencoders. The optimal clu...

EGMM: an Evidential Version of the Gaussian Mixture Model for Clustering

October 3, 2020

88% Match

Lianmeng Jiao, Thierry Denoeux, ... , Quan Pan

Machine Learning

The Gaussian mixture model (GMM) provides a simple yet principled framework for clustering, with properties suitable for statistical inference. In this paper, we propose a new model-based clustering algorithm, called EGMM (evidential GMM), in the theoretical framework of belief functions to better characterize cluster-membership uncertainty. With a mass function representing the cluster membership of each object, the evidential Gaussian mixture distribution composed of the co...

Groupwise Constrained Reconstruction for Subspace Clustering

June 18, 2012

88% Match

Ruijiang Li (Fudan University), Bin Li (University of Technology, Sydney), Ke Zhang (Fudan University), ... , Xiangyang Xue (Fudan University)

Machine Learning

Reconstruction-based subspace clustering methods compute a self-reconstruction matrix over the samples and use it for spectral clustering to obtain the final clustering result. Their success largely relies on the assumption that the underlying subspaces are independent, which, however, does not always hold in applications with an increasing number of subspaces. In this paper, we propose a novel reconstruction-based subspace clustering model without making the subspace indepe...

Unsupervised Deep Embedding for Clustering Analysis

November 19, 2015

87% Match

Junyuan Xie, Ross Girshick, Ali Farhadi

Machine Learning

Computer Vision and Pattern ...

Clustering is central to many data-driven application domains and has been studied extensively in terms of distance functions and grouping algorithms. Relatively little work has focused on learning representations for clustering. In this paper, we propose Deep Embedded Clustering (DEC), a method that simultaneously learns feature representations and cluster assignments using deep neural networks. DEC learns a mapping from the data space to a lower-dimensional feature space in...

Probabilistic Dimensionality Reduction via Structure Learning

October 17, 2016

87% Match

Li Wang

Machine Learning

We propose a novel probabilistic dimensionality reduction framework that can naturally integrate the generative model and the locality information of data. Based on this framework, we present a new model, which is able to learn a smooth skeleton of embedding points in a low-dimensional space from high-dimensional noisy data. The formulation of the new model can be equivalently interpreted as two coupled learning problems, i.e., structure learning and the learning of projection...

Deep Embedded K-Means Clustering

September 30, 2021

87% Match

Wengang Guo, Kaiyan Lin, Wei Ye

Machine Learning

Recently, deep clustering methods have gained momentum because of the high representational power of deep neural networks (DNNs) such as autoencoders. The key idea is that representation learning and clustering can reinforce each other: good representations lead to good clustering, while good clustering provides good supervisory signals to representation learning. Critical questions include: 1) How to optimize representation learning and clustering? 2) Should the reconstruction...
