September 27, 2022
Graph sampling is a technique for picking a subset of vertices and/or edges from an original graph. Among the various graph sampling approaches, Traversal Based Sampling (TBS) is widely used because of its low cost and feasibility in many cases, and Simple Random Walk (SRW) and its variants account for a large proportion of TBS methods. We illustrate the foundation of SRW and present its problems. Based on the problems, we provide a taxonomy of different Random Walk (RW) based graph sampling met...
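As a rough illustration of the Simple Random Walk idea mentioned in this abstract, the following minimal sketch walks an undirected graph by repeatedly moving to a uniformly chosen neighbor. It is not taken from the paper: the function name `simple_random_walk`, the adjacency-dict representation, and the toy graph are all assumptions made for this example.

```python
import random


def simple_random_walk(adj, start, steps, seed=None):
    """Return the list of vertices visited by a simple random walk.

    adj   : dict mapping each vertex to a list of its neighbors (hypothetical format)
    start : vertex where the walk begins
    steps : maximum number of moves to make
    seed  : optional seed so the walk is reproducible
    """
    rng = random.Random(seed)
    current = start
    visited = [current]
    for _ in range(steps):
        neighbors = adj[current]
        if not neighbors:
            # Isolated vertex: the walk cannot continue.
            break
        # Move to a neighbor chosen uniformly at random.
        current = rng.choice(neighbors)
        visited.append(current)
    return visited


# Toy undirected graph (illustrative data only): each edge is listed in both directions.
toy_graph = {
    "a": ["b", "c"],
    "b": ["a", "c", "d"],
    "c": ["a", "b"],
    "d": ["b"],
}

walk = simple_random_walk(toy_graph, "a", 10, seed=0)
print(walk)
```

On a connected, non-bipartite undirected graph, a long walk of this kind visits each vertex with probability proportional to its degree, so estimates computed from the visited vertices generally need reweighting to correct for that bias.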
May 22, 2014
Sampling methods have received much attention in the field of complex networks in general and statistical physics in particular. This paper presents two new sampling methods based on the perspective that a small fraction of vertices with high degree can carry most of the structural information of a network. The two proposed sampling methods are efficient at sampling high-degree nodes. The first new sampling method is improved on the basis of the stratified random sa...
August 30, 2017
Random walk-based sampling methods are gaining popularity and importance in characterizing large networks. While powerful, they suffer from the slow mixing problem when the graph is loosely connected, which results in poor estimation accuracy. Random walk with jumps (RWwJ) can address the slow mixing problem but it is inapplicable if the graph does not support uniform vertex sampling (UNI). In this work, we develop methods that can efficiently sample a graph without the neces...
July 14, 2005
We discuss two sampling schemes for selecting random subnets from a network, random sampling and connectivity-dependent sampling, and investigate how the degree distribution of a node in the network is affected by the two types of sampling. Here we derive a necessary and sufficient condition that guarantees that the degree distribution of the subnet and the true network belong to the same family of probability distributions. For completely random sampling of nodes we find tha...
September 18, 2011
From social networks to P2P systems, network sampling arises in many settings. We present a detailed study on the nature of biases in network sampling strategies to shed light on how best to sample from networks. We investigate connections between specific biases and various measures of structural representativeness. We show that certain biases are, in fact, beneficial for many applications, as they "push" the sampling process towards inclusion of desired properties. Finally,...
July 7, 2017
Temporal networks have been increasingly used to model a diversity of systems that evolve in time; for example human contact structures over which dynamic processes such as epidemics take place. A fundamental aspect of real-life networks is that they are sampled within temporal and spatial frames. Furthermore, one might wish to subsample networks to reduce their size for better visualization or to perform computationally intensive simulations. The sampling method may affect t...
June 22, 2013
Exploring statistics of locally connected subgraph patterns (also known as network motifs) has helped researchers better understand the structure and function of biological and online social networks (OSNs). Nowadays the massive size of some critical networks -- often stored in already overloaded relational databases -- effectively limits the rate at which nodes and edges can be explored, making it a challenge to accurately discover subgraph statistics. In this work, we propo...
October 2, 2018
This work focuses on estimating the in-degree distribution in directed networks from sampled network nodes or edges. A number of sampling schemes are considered, including random sampling with and without replacement, and several approaches based on random walks with possible jumps. When sampling nodes, it is assumed that only the out-edges of a sampled node are visible, that is, the in-degree of that node is not observed. The suggested estimation of the in-degree dist...
January 6, 2012
For many real-world networks only a small "sampled" version of the original network may be investigated; those results are then used to draw conclusions about the actual system. Variants of breadth-first search (BFS) sampling, which are based on epidemic processes, are widely used. Although it is well established that BFS sampling fails, in most cases, to capture the IN-component(s) of directed networks, a description of the effects of BFS sampling on other topological proper...
April 29, 2016
Exploring small connected and induced subgraph patterns (CIS patterns, or graphlets) has recently attracted considerable attention. Despite recent efforts on computing the number of instances a specific graphlet appears in a large graph (i.e., the total number of CISes isomorphic to the graphlet), little attention has been paid to characterizing a node's graphlet degree, i.e., the number of CISes isomorphic to the graphlet that include the node, which is an important metric f...