November 14, 2017
Network clustering reveals the organization of a network or corresponding complex system with elements represented as vertices and interactions as edges in a (directed, weighted) graph. Although the notion of clustering can be somewhat loose, network clusters or groups are generally considered as nodes with enriched interactions and edges sharing common patterns. Statistical inference often treats groups as latent variables, with observed networks generated from latent group structure, termed a stochastic block model. Regardless of the definitions, statistical inference can be either translated to modularity maximization, which is provably an NP-complete problem. Here we present scalable and reliable algorithms that recover hierarchical stochastic block models fast and accurately. Our algorithm scales almost linearly in number of edges, and inferred models were more accurate that other scalable methods.
Similar papers 1
May 29, 2017
This chapter provides a self-contained introduction to the use of Bayesian inference to extract large-scale modular structures from network data, based on the stochastic blockmodel (SBM), as well as its degree-corrected and overlapping generalizations. We focus on nonparametric formulations that allow their inference in a manner that prevents overfitting, and enables model selection. We discuss aspects of the choice of priors, in particular how to avoid underfitting via incre...
October 16, 2013
Discovering and characterizing the large-scale topological features in empirical networks are crucial steps in understanding how complex systems function. However, most existing methods used to obtain the modular structure of networks suffer from serious problems, such as being oblivious to the statistical evidence supporting the discovered patterns, which results in the inability to separate actual structure from noise. In addition to this, one also observes a resolution lim...
October 16, 2013
We present an efficient algorithm for the inference of stochastic block models in large networks. The algorithm can be used as an optimized Markov chain Monte Carlo (MCMC) method, with a fast mixing time and a much reduced susceptibility to getting trapped in metastable states, or as a greedy agglomerative heuristic, with an almost linear $O(N\ln^2N)$ complexity, where $N$ is the number of nodes in the network, independent on the number of blocks being inferred. We show that ...
September 15, 2020
Modular and hierarchical community structures are pervasive in real-world complex systems. A great deal of effort has gone into trying to detect and study these structures. Important theoretical advances in the detection of modular have included identifying fundamental limits of detectability by formally defining community structure using probabilistic generative models. Detecting hierarchical community structure introduces additional challenges alongside those inherited from...
June 1, 2023
Hierarchical clustering of networks consists in finding a tree of communities, such that lower levels of the hierarchy reveal finer-grained community structures. There are two main classes of algorithms tackling this problem. Divisive ($\textit{top-down}$) algorithms recursively partition the nodes into two communities, until a stopping rule indicates that no further split is needed. In contrast, agglomerative ($\textit{bottom-up}$) algorithms first identify the smallest comm...
March 7, 2015
We propose a robust, scalable, integrated methodology for community detection and community comparison in graphs. In our procedure, we first embed a graph into an appropriate Euclidean space to obtain a low-dimensional representation, and then cluster the vertices into communities. We next employ nonparametric graph inference techniques to identify structural similarity among these communities. These two steps are then applied recursively on the communities, allowing us to de...
May 24, 2013
We generalize the stochastic block model to the important case in which edges are annotated with weights drawn from an exponential family distribution. This generalization introduces several technical difficulties for model estimation, which we solve using a Bayesian approach. We introduce a variational algorithm that efficiently approximates the model's posterior distribution for dense graphs. In specific numerical experiments on edge-weighted networks, this weighted stochas...
October 9, 2006
One property of networks that has received comparatively little attention is hierarchy, i.e., the property of having vertices that cluster together in groups, which then join to form groups of groups, and so forth, up through all levels of organization in the network. Here, we give a precise definition of hierarchical structure, give a generic model for generating arbitrary hierarchical structure in a random graph, and describe a statistically principled way to learn the set ...
February 7, 2020
There exist various types of network block models such as the Stochastic Block Model (SBM), the Degree Corrected Block Model (DCBM), and the Popularity Adjusted Block Model (PABM). While this leads to a variety of choices, the block models do not have a nested structure. In addition, there is a substantial jump in the number of parameters from the DCBM to the PABM. The objective of this paper is formulation of a hierarchy of block model which does not rely on arbitrary identi...
March 1, 2019
There have been rapid developments in model-based clustering of graphs, also known as block modelling, over the last ten years or so. We review different approaches and extensions proposed for different aspects in this area, such as the type of the graph, the clustering approach, the inference approach, and whether the number of groups is selected or estimated. We also review models that combine block modelling with topic modelling and/or longitudinal modelling, regarding how...