ID: 1711.11071

HSC: A Novel Method for Clustering Hierarchies of Networked Data

November 29, 2017

View on ArXiv
Antonia Korba
Computer Science
Machine Learning
Databases
Information Retrieval

Hierarchical clustering is one of the most powerful solutions to the problem of clustering, on the grounds that it performs a multi scale organization of the data. In recent years, research on hierarchical clustering methods has attracted considerable interest due to the demanding modern application domains. We present a novel divisive hierarchical clustering framework called Hierarchical Stochastic Clustering (HSC), that acts in two stages. In the first stage, it finds a primary hierarchy of clustering partitions in a dataset. In the second stage, feeds a clustering algorithm with each one of the clusters of the very detailed partition, in order to settle the final result. The output is a hierarchy of clusters. Our method is based on the previous research of Meyer and Weissel Stochastic Data Clustering and the theory of Simon and Ando on Variable Aggregation. Our experiments show that our framework builds a meaningful hierarchy of clusters and benefits consistently the clustering algorithm that acts in the second stage, not only computationally but also in terms of cluster quality. This result suggest that HSC framework is ideal for obtaining hierarchical solutions of large volumes of data.

Similar papers 1

Fast and reliable inference algorithm for hierarchical stochastic block models

November 14, 2017

89% Match
Yongjin Park, Joel S. Bader
Machine Learning

Network clustering reveals the organization of a network or corresponding complex system with elements represented as vertices and interactions as edges in a (directed, weighted) graph. Although the notion of clustering can be somewhat loose, network clusters or groups are generally considered as nodes with enriched interactions and edges sharing common patterns. Statistical inference often treats groups as latent variables, with observed networks generated from latent group ...

Find SimilarView on arXiv

Hierarchical Graph Clustering using Node Pair Sampling

June 5, 2018

89% Match
Thomas Bonald, Bertrand Charpentier, ... , Hollocou Alexandre
Social and Information Netwo...
Artificial Intelligence

We present a novel hierarchical graph clustering algorithm inspired by modularity-based clustering techniques. The algorithm is agglomerative and based on a simple distance between clusters induced by the probability of sampling node pairs. We prove that this distance is reducible, which enables the use of the nearest-neighbor chain to speed up the agglomeration. The output of the algorithm is a regular dendrogram, which reveals the multi-scale structure of the graph. The res...

Find SimilarView on arXiv

When Does Bottom-up Beat Top-down in Hierarchical Community Detection?

June 1, 2023

89% Match
Maximilien Dreveton, Daichi Kuroda, ... , Thiran Patrick
cs.SI
cs.LG
math.ST
stat.ME
stat.ML
stat.TH

Hierarchical clustering of networks consists in finding a tree of communities, such that lower levels of the hierarchy reveal finer-grained community structures. There are two main classes of algorithms tackling this problem. Divisive ($\textit{top-down}$) algorithms recursively partition the nodes into two communities, until a stopping rule indicates that no further split is needed. In contrast, agglomerative ($\textit{bottom-up}$) algorithms first identify the smallest comm...

Find SimilarView on arXiv

Reductive Clustering: An Efficient Linear-time Graph-based Divisive Cluster Analysis Approach

June 21, 2018

89% Match
Ching Tarn, Yinan Zhang, Ye Feng
Artificial Intelligence
Computer Vision and Pattern ...
Databases
Information Retrieval
Machine Learning

We propose an efficient linear-time graph-based divisive cluster analysis approach called Reductive Clustering. The approach tries to reveal the hierarchical structural information through reducing the graph into a more concise one repeatedly. With the reductions, the original graph can be divided into subgraphs recursively, and a lite informative dendrogram is constructed based on the divisions. The reduction consists of three steps: selection, connection, and partition. Fir...

Find SimilarView on arXiv

Structural Inference of Hierarchies in Networks

October 9, 2006

88% Match
Aaron Clauset, Cristopher Moore, M. E. J. Newman
Physics and Society
Machine Learning
Data Analysis, Statistics an...

One property of networks that has received comparatively little attention is hierarchy, i.e., the property of having vertices that cluster together in groups, which then join to form groups of groups, and so forth, up through all levels of organization in the network. Here, we give a precise definition of hierarchical structure, give a generic model for generating arbitrary hierarchical structure in a random graph, and describe a statistically principled way to learn the set ...

Find SimilarView on arXiv

A Local Approach for Identifying Clusters in Networks

March 21, 2012

88% Match
Sumit Singh
Social and Information Netwo...
Physics and Society

Graph clustering is a fundamental problem that has been extensively studied both in theory and practice. The problem has been defined in several ways in literature and most of them have been proven to be NP-Hard. Due to their high practical relevancy, several heuristics for graph clustering have been introduced which constitute a central tool for coping with NP-completeness, and are used in applications of clustering ranging from computer vision, to data analysis, to learning...

Find SimilarView on arXiv

Convex Hierarchical Clustering for Graph-Structured Data

November 8, 2019

88% Match
Claire Donnat, Susan Holmes
Applications
Machine Learning
Machine Learning

Convex clustering is a recent stable alternative to hierarchical clustering. It formulates the recovery of progressively coalescing clusters as a regularized convex problem. While convex clustering was originally designed for handling Euclidean distances between data points, in a growing number of applications, the data is directly characterized by a similarity matrix or weighted graph. In this paper, we extend the robust hierarchical clustering approach to these broader clas...

Find SimilarView on arXiv

On Learning the Structure of Clusters in Graphs

December 29, 2022

88% Match
Peter Macgregor
Data Structures and Algorith...
Machine Learning
Social and Information Netwo...

Graph clustering is a fundamental problem in unsupervised learning, with numerous applications in computer science and in analysing real-world data. In many real-world applications, we find that the clusters have a significant high-level structure. This is often overlooked in the design and analysis of graph clustering algorithms which make strong simplifying assumptions about the structure of the graph. This thesis addresses the natural question of whether the structure of c...

Find SimilarView on arXiv

Data clustering using stochastic block models

July 24, 2017

88% Match
Nina Mrzelj, Pavlin Gregor Poličar
Social and Information Netwo...

It has been shown that community detection algorithms work better for clustering tasks than other, more popular methods, such as k-means. In fact, network analysis based methods often outperform more widely used methods and do not suffer from some of the drawbacks we notice elsewhere e.g. the number of clusters k usually has to be known in advance. However, stochastic block models which are known to perform well for community detection, have not yet been tested for this task....

Find SimilarView on arXiv

Hierarchical Clustering Supported by Reciprocal Nearest Neighbors

July 9, 2019

88% Match
Wen-Bo Xie, Yan-Li Lee, Cong Wang, ... , Zhou Tao
Information Retrieval
Social and Information Netwo...
Data Analysis, Statistics an...

Clustering is a fundamental analysis tool aiming at classifying data points into groups based on their similarity or distance. It has found successful applications in all natural and social sciences, including biology, physics, economics, chemistry, astronomy, psychology, and so on. Among numerous existent algorithms, hierarchical clustering algorithms are of a particular advantage as they can provide results under different resolutions without any predetermined number of clu...

Find SimilarView on arXiv