July 19, 2012
Many real-world networks are prohibitively large for data retrieval, storage and analysis of all of its nodes and links. Understanding the structure and dynamics of these networks entails creating a smaller representative sample of the full graph while preserving its relevant topological properties. In this report, we show that graph sampling algorithms currently proposed in the literature are not able to preserve network properties even with sample sizes containing as many as 20% of the nodes from the original graph. We present a new sampling algorithm, called Tiny Sample Extractor, with a new goal of a sample size smaller than 5% of the original graph while preserving two key properties of a network, the degree distribution and its clustering co-efficient. Our approach is based on a new empirical method of estimating measurement biases in crawling algorithms and compensating for them accordingly. We present a detailed comparison of best known graph sampling algorithms, focusing in particular on how the properties of the sample subgraphs converge to those of the original graph as they grow. These results show that our sampling algorithm extracts a smaller subgraph than other algorithms while also achieving a closer convergence to the degree distribution, measured by the degree exponent, of the original graph. The subgraph generated by the Tiny Sample Extractor, however, is not necessarily representative of the full graph with regard to other properties such as assortativity. This indicates that the problem of extracting a truly representative small subgraph from a large graph remains unsolved.
Similar papers 1
February 16, 2021
Graph sampling allows mining a small representative subgraph from a big graph. Sampling algorithms deploy different strategies to replicate the properties of a given graph in the sampled graph. In this study, we provide a comprehensive empirical characterization of five graph sampling algorithms on six properties of a graph including degree, clustering coefficient, path length, global clustering coefficient, assortativity, and modularity. We extract samples from fifteen graph...
February 7, 2014
The amount of large-scale real data around us increase in size very quickly and so does the necessity to reduce its size by obtaining a representative sample. Such sample allows us to use a great variety of analytical methods, whose direct application on original data would be infeasible. There are many methods used for different purposes and with different results. In this paper we outline a simple and straightforward approach based on analyzing the nearest neighbors (NN) th...
November 23, 2021
Analyzing social graphs with limited data access is challenging for third-party researchers. To address this challenge, a number of algorithms that estimate structural properties via a random walk have been developed. However, most existing algorithms are limited to the estimation of local structural properties. Here we propose a method for restoring the original social graph from the small sample obtained by a random walk. The proposed method generates a graph that preserves...
June 8, 2015
In the past few years, the storage and analysis of large-scale and fast evolving networks present a great challenge. Therefore, a number of different techniques have been proposed for sampling large networks. In general, network exploration techniques approximate the original networks more accurately than random node and link selection. Yet, link selection with additional subgraph induction step outperforms most other techniques. In this paper, we apply subgraph induction als...
September 18, 2011
From social networks to P2P systems, network sampling arises in many settings. We present a detailed study on the nature of biases in network sampling strategies to shed light on how best to sample from networks. We investigate connections between specific biases and various measures of structural representativeness. We show that certain biases are, in fact, beneficial for many applications, as they "push" the sampling process towards inclusion of desired properties. Finally,...
October 26, 2021
Sampling random nodes is a fundamental algorithmic primitive in the analysis of massive networks, with many modern graph mining algorithms critically relying on it. We consider the task of generating a large collection of random nodes in the network assuming limited query access (where querying a node reveals its set of neighbors). In current approaches, based on long random walks, the number of queries per sample scales linearly with the mixing time of the network, which can...
August 23, 2013
Graph sampling is a technique to pick a subset of vertices and/ or edges from original graph. It has a wide spectrum of applications, e.g. survey hidden population in sociology [54], visualize social graph [29], scale down Internet AS graph [27], graph sparsification [8], etc. In some scenarios, the whole graph is known and the purpose of sampling is to obtain a smaller graph. In other scenarios, the graph is unknown and sampling is regarded as a way to explore the graph. Com...
February 17, 2021
Online social network services provide a platform for human social interactions. Nowadays, many kinds of online interactions generate large-scale social network data. Network analysis helps to mine knowledge and pattern from the relationship between actors inside the network. This approach is important to support predictions and the decision-making process in many real-world applications. The social network analysis methodology, which borrows approaches from graph theory prov...
October 18, 2019
Graph Sampling provides an efficient yet inexpensive solution for analyzing large graphs. While extracting small representative subgraphs from large graphs, the challenge is to capture the properties of the original graph. Several sampling algorithms have been proposed in previous studies, but they lack in extracting good samples. In this paper, we propose a new sampling method called Weighted Edge Sampling. In this method, we give equal weight to all the edges in the beginni...
November 30, 2012
Social graphs can be easily extracted from Online Social Networks. However these networks are getting larger from day to day. Sampling methods used to evaluate graph information cannot accurately extract graph properties. Furthermore Social Networks are limiting the access to their data, making the crawling process even harder. A novel approach on Random Sampling is proposed, considering both limitation and resources. We evaluate this proposal with 4 different settings on 5 d...