December 12, 2013
In high-dimension, low-sample-size (HDLSS) data, the closeness of two objects does not always reflect a hidden cluster structure. We point out that it is not closeness but the "values" of the distances that carry information about the cluster structure in high-dimensional space. Based on this fact, we propose an efficient and simple clustering approach, called distance vector clustering, for HDLSS data. Under the assumptions given in the work of Hall...
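A minimal sketch of the idea (not necessarily the authors' exact algorithm): in the HDLSS regime, within-cluster and between-cluster pairwise distances concentrate around different levels, so each point's vector of distances to all other points becomes an informative feature for clustering. All sizes and parameters below are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# HDLSS regime: dimension d >> sample size n, two hidden clusters.
d, n_per = 5000, 10
X = np.vstack([
    rng.normal(0.0, 1.0, size=(n_per, d)),  # cluster 1
    rng.normal(0.5, 1.0, size=(n_per, d)),  # cluster 2, small per-coordinate shift
])

# Pairwise Euclidean distance matrix; row i is point i's "distance vector".
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

# All points are roughly equidistant here (everything is "far"), yet the
# distance *values* concentrate: within-cluster pairs around one level,
# between-cluster pairs around a larger one. Clustering the rows of D
# exploits those values rather than raw closeness.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(D)
print(labels)  # typically recovers the two hidden groups
```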
July 20, 2015
Several methods have been proposed to estimate the number of clusters in a dataset; the basic idea behind all of them is to study an index that measures inter-cluster separation and intra-cluster cohesion over a range of cluster numbers and to report the number that gives an optimum value of the index. In this paper we propose a simple, parameter-free approach that mimics human cognition in forming clusters, where closely lying points are easily identified to form a cluste...
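The generic recipe described above fits in a few lines; here the silhouette index stands in for the separation/cohesion measure (the paper's own criterion may differ):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)

# Scan a range of cluster counts and keep the one optimizing the index.
scores = {}
for k in range(2, 11):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)  # cohesion vs. separation

best_k = max(scores, key=scores.get)
print(best_k)  # ideally 4 for this synthetic data
```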
July 19, 2023
Bayesian nonparametric mixture models are widely used to cluster observations. However, a major drawback of the approach is that the estimated partition often exhibits unbalanced cluster frequencies, with only a few dominating clusters and a large number of sparsely populated ones. This translates into results that are often uninterpretable unless we are willing to ignore a substantial number of observations and clusters. Interpreting the posterior distribution as penalize...
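The imbalance described here is easy to reproduce; a short sketch sampling mixture weights from a Dirichlet-process stick-breaking prior typically yields a few dominant clusters and a long tail of tiny ones (concentration parameter and sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, K, n = 1.0, 200, 1000  # DP concentration, truncation level, sample size

# Stick-breaking construction of the mixture weights.
v = rng.beta(1.0, alpha, size=K)
w = v * np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
w /= w.sum()  # renormalize the truncated weights

# Cluster assignments drawn from those weights.
counts = np.bincount(rng.choice(K, size=n, p=w), minlength=K)
print(sorted(counts[counts > 0], reverse=True))
# e.g. a couple of large clusters followed by a long tail of tiny ones
```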
December 23, 2017
Clustering partitions a dataset so that observations placed together in a group are similar to each other but different from those in other groups. Hierarchical and $K$-means clustering are two such approaches, each with different strengths and weaknesses. For instance, hierarchical clustering identifies groups in a tree-like structure but suffers from high computational cost on large datasets, while $K$-means clustering is efficient but designed to identify homogeneous, spherically shaped clust...
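A minimal side-by-side of the two approaches using standard routines (dataset and parameters are illustrative):

```python
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=500, centers=3, random_state=0)

# Hierarchical: builds a full merge tree (quadratic memory and worse time),
# but the dendrogram exposes structure at every granularity.
Z = linkage(X, method="ward")
hier_labels = fcluster(Z, t=3, criterion="maxclust")

# K-means: fast and scalable in n, but biased toward spherical,
# similarly sized clusters.
km_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
```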
February 22, 2016
A good clustering can help a data analyst to explore and understand a data set, but what constitutes a good clustering may depend on domain-specific and application-specific criteria. These criteria can be difficult to formalize, even when it is easy for an analyst to know a good clustering when she sees one. We present a new approach to interactive clustering for data exploration, called \ciif, based on a particularly simple feedback mechanism, in which an analyst can choose...
November 4, 2021
Dimensionality reduction and clustering techniques are frequently used to analyze complex data sets, but their results are often not easy to interpret. We consider how to support users in interpreting apparent cluster structure on scatter plots where the axes are not directly interpretable, such as when the data is projected onto a two-dimensional space using a dimensionality-reduction method. Specifically, we propose a new method to compute an interpretable clustering automa...
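The proposed method is truncated above; as a generic stand-in for the task, one can cluster points in a two-dimensional projection and then describe each cluster in terms of the original features, e.g. with a shallow decision tree (this is an illustration, not the paper's method):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_iris()
X, feature_names = data.data, data.feature_names

# Project to 2-D; the resulting axes are linear mixtures, hard to read.
X2 = PCA(n_components=2).fit_transform(X)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X2)

# Explain the apparent clusters in terms of the *original* features.
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, labels)
print(export_text(tree, feature_names=feature_names))
```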
February 22, 2016
Clustering is an essential data mining tool that aims to discover inherent cluster structure in data. As such, the study of clusterability, which evaluates whether data possesses such structure, is an integral part of cluster analysis. Yet, despite their central role in the theory and application of clustering, current notions of clusterability fall short in two crucial aspects that render them impractical: most are computationally infeasible, and others fail to classify the s...
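In the spirit of computationally cheap clusterability checks, one common family of heuristics asks whether the distribution of pairwise distances is multimodal; Sarle's bimodality coefficient below is one such crude proxy, not necessarily the notion this paper develops:

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import kurtosis, skew

def bimodality_coefficient(x):
    """Sarle's bimodality coefficient; values above ~5/9 hint at multimodality."""
    n = len(x)
    g1, g2 = skew(x), kurtosis(x)  # g2 is excess kurtosis
    return (g1**2 + 1) / (g2 + 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))

rng = np.random.default_rng(0)
clustered = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(8, 1, (100, 2))])
unclustered = rng.normal(0, 1, (200, 2))

# Clustered data yields bimodal pairwise distances (within vs. between pairs),
# so its coefficient should exceed the ~0.555 threshold; unclustered data's
# should not.
print(bimodality_coefficient(pdist(clustered)))
print(bimodality_coefficient(pdist(unclustered)))
```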
January 30, 2013
We examine methods for clustering in high dimensions. In the first part of the paper, we perform an experimental comparison of three batch clustering algorithms: the Expectation-Maximization (EM) algorithm, a winner-take-all version of the EM algorithm reminiscent of the K-means algorithm, and model-based hierarchical agglomerative clustering. We learn naive-Bayes models with a hidden root node, using high-dimensional discrete-variable data sets (both real and synthetic)...
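A compact sketch of the contrast between soft EM and its winner-take-all (hard-assignment) variant, here for a mixture of independent Bernoullis over binary features as a stand-in for the paper's discrete naive-Bayes models:

```python
import numpy as np

def em_bernoulli_mixture(X, K, iters=50, hard=False, seed=0):
    """EM for a mixture of independent Bernoullis; hard=True snaps
    responsibilities to one-hot (winner-take-all, K-means-like)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    theta = rng.uniform(0.25, 0.75, size=(K, d))  # per-cluster feature probs
    pi = np.full(K, 1.0 / K)
    for _ in range(iters):
        # E-step: (log-)responsibilities of each component for each point.
        log_r = np.log(pi) + X @ np.log(theta).T + (1 - X) @ np.log(1 - theta).T
        log_r -= log_r.max(axis=1, keepdims=True)
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)
        if hard:  # winner-take-all: one-hot assignment
            r = np.eye(K)[r.argmax(axis=1)]
        # M-step: update mixing weights and Bernoulli parameters.
        nk = r.sum(axis=0) + 1e-9
        pi = nk / n
        theta = np.clip((r.T @ X) / nk[:, None], 1e-6, 1 - 1e-6)
    return pi, theta, r.argmax(axis=1)

rng = np.random.default_rng(1)
# Two planted clusters of binary vectors with different feature probabilities.
X = (rng.random((200, 30)) < np.repeat([[0.8], [0.2]], 100, axis=0)).astype(float)
_, _, z_soft = em_bernoulli_mixture(X, K=2)
_, _, z_hard = em_bernoulli_mixture(X, K=2, hard=True)
print(z_soft[:10], z_hard[:10])
```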
January 13, 2022
The failure of the Euclidean norm to reliably distinguish between nearby and distant points in high-dimensional space is well known. This phenomenon of distance concentration manifests in a variety of data distributions, with i.i.d. or correlated features, including centrally distributed and clustered data. Unsupervised learning based on Euclidean nearest neighbors, and more general proximity-oriented data mining tasks like clustering, might therefore be adversely affected by dis...
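The phenomenon is easy to reproduce; this sketch tracks the relative contrast between the farthest and nearest point from a query as the dimension grows (sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
for d in (2, 10, 100, 1000, 10000):
    X = rng.random((n, d))  # i.i.d. uniform features
    q = rng.random(d)       # query point
    dist = np.linalg.norm(X - q, axis=1)
    contrast = (dist.max() - dist.min()) / dist.min()
    print(f"d={d:>5}  relative contrast={contrast:.3f}")
# The contrast shrinks toward 0: "nearest" and "farthest" become nearly
# indistinguishable, which is distance concentration.
```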
January 16, 2013
We present an approach to model-based hierarchical clustering by formulating an objective function based on a Bayesian analysis. The model organizes the data into a cluster hierarchy while specifying a complex feature-set partitioning, which is a key component of the model. Features can have either a unique distribution in every cluster or a common distribution over some (or even all) of the clusters. The cluster subsets over which these features have such a common distributio...
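A minimal sketch of the Bayesian flavor of model-based agglomerative clustering (a generic illustration, not the paper's full model with feature-set partitioning): score each cluster by its Gaussian marginal likelihood under a conjugate prior, and greedily merge the pair whose union most improves the total score, stopping when no merge helps.

```python
import numpy as np

def log_marginal(x, sigma2=1.0, tau2=10.0):
    """Log marginal likelihood of cluster x (n, d): x_ij ~ N(theta_j, sigma2),
    theta_j ~ N(0, tau2), dimensions independent; closed form via
    the Sherman-Morrison identity."""
    n = x.shape[0]
    T = x.sum(axis=0)
    quad = (x**2).sum(axis=0) / sigma2 - tau2 * T**2 / (sigma2 * (sigma2 + n * tau2))
    per_dim = -0.5 * (n * np.log(2 * np.pi * sigma2)
                      + np.log1p(n * tau2 / sigma2) + quad)
    return per_dim.sum()

def bayes_agglomerate(X):
    clusters = [[i] for i in range(len(X))]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                merged = clusters[a] + clusters[b]
                gain = (log_marginal(X[merged]) - log_marginal(X[clusters[a]])
                        - log_marginal(X[clusters[b]]))
                if best is None or gain > best[0]:
                    best = (gain, a, b)
        if best[0] < 0:  # Bayesian stopping rule: no merge improves the score
            break
        _, a, b = best
        clusters[a] += clusters.pop(b)
    return clusters

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-3, 1, (15, 2)), rng.normal(3, 1, (15, 2))])
print(bayes_agglomerate(X))  # ideally two clusters of 15 indices each
```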