May 25, 2017
A nonparametric approach to the modeling of social networks using degree-corrected stochastic blockmodels is proposed. The model for static network consists of a stochastic blockmodel using a probit regression formulation and popularity parameters are incorporated to account for degree heterogeneity. Dirichlet processes are used to detect community structure as well as induce clustering in the popularity parameters. This approach is flexible yet parsimonious as it allows the ...
March 4, 2022
The increasing prevalence of network data in a vast variety of fields and the need to extract useful information out of them have spurred fast developments in related models and algorithms. Among the various learning tasks with network data, community detection, the discovery of node clusters or "communities," has arguably received the most attention in the scientific community. In many real-world applications, the network data often come with additional information in the fo...
March 28, 2024
This paper studies the directed graph clustering problem through the lens of statistics, where we formulate clustering as estimating underlying communities in the directed stochastic block model (DSBM). We conduct the maximum likelihood estimation (MLE) on the DSBM and thereby ascertain the most probable community assignment given the observed graph structure. In addition to the statistical point of view, we further establish the equivalence between this MLE formulation and a...
April 2, 2014
Community detection is an important task in network analysis, in which we aim to learn a network partition that groups together vertices with similar community-level connectivity patterns. By finding such groups of vertices with similar structural roles, we extract a compact representation of the network's large-scale structure, which can facilitate its scientific interpretation and the prediction of unknown or future interactions. Popular approaches, including the stochastic...
September 27, 2007
We consider the problem of finding communities or modules in directed networks. The most common approach to this problem in the previous literature has been simply to ignore edge direction and apply methods developed for community discovery in undirected networks, but this approach discards potentially useful information contained in the edge directions. Here we show how the widely used benefit function known as modularity can be generalized in a principled fashion to incorpo...
July 18, 2023
We introduce the nested stochastic block model (NSBM) to cluster a collection of networks while simultaneously detecting communities within each network. NSBM has several appealing features including the ability to work on unlabeled networks with potentially different node sets, the flexibility to model heterogeneous communities, and the means to automatically select the number of classes for the networks and the number of communities within each network. This is accomplished...
August 4, 2017
One of the main computational and scientific challenges in the modern age is to extract useful information from unstructured texts. Topic models are one popular machine-learning approach which infers the latent topical structure of a collection of documents. Despite their success --- in particular of its most widely used variant called Latent Dirichlet Allocation (LDA) --- and numerous applications in sociology, history, and linguistics, topic models are known to suffer from ...
September 18, 2013
Community detection in networks has drawn much attention in diverse fields, especially social sciences. Given its significance, there has been a large body of literature with approaches from many fields. Here we present a statistical framework that is representative, extensible, and that yields an estimator with good properties. Our proposed approach considers a stochastic blockmodel based on a logistic regression formulation with node correction terms. We follow a Bayesian a...
June 25, 2020
We develop a principled methodology to infer assortative communities in networks based on a nonparametric Bayesian formulation of the planted partition model. We show that this approach succeeds in finding statistically significant assortative modules in networks, unlike alternatives such as modularity maximization, which systematically overfits both in artificial as well as in empirical examples. In addition, we show that our method is not subject to a resolution limit, and ...
April 6, 2019
Spectral embedding of adjacency or Laplacian matrices of undirected graphs is a common technique for representing a network in a lower dimensional latent space, with optimal theoretical guarantees. The embedding can be used to estimate the community structure of the network, with strong consistency results in the stochastic blockmodel framework. One of the main practical limitations of standard algorithms for community detection from spectral embeddings is that the number of ...