ID: 1207.4825

A new algorithm for extracting a small representative subgraph from a very large graph

July 19, 2012

View on ArXiv
Harish Sethu, Xiaoyu Chu
Computer Science
Physics
Data Structures and Algorith...
Social and Information Netwo...
Physics and Society

Many real-world networks are prohibitively large for data retrieval, storage and analysis of all of its nodes and links. Understanding the structure and dynamics of these networks entails creating a smaller representative sample of the full graph while preserving its relevant topological properties. In this report, we show that graph sampling algorithms currently proposed in the literature are not able to preserve network properties even with sample sizes containing as many as 20% of the nodes from the original graph. We present a new sampling algorithm, called Tiny Sample Extractor, with a new goal of a sample size smaller than 5% of the original graph while preserving two key properties of a network, the degree distribution and its clustering co-efficient. Our approach is based on a new empirical method of estimating measurement biases in crawling algorithms and compensating for them accordingly. We present a detailed comparison of best known graph sampling algorithms, focusing in particular on how the properties of the sample subgraphs converge to those of the original graph as they grow. These results show that our sampling algorithm extracts a smaller subgraph than other algorithms while also achieving a closer convergence to the degree distribution, measured by the degree exponent, of the original graph. The subgraph generated by the Tiny Sample Extractor, however, is not necessarily representative of the full graph with regard to other properties such as assortativity. This indicates that the problem of extracting a truly representative small subgraph from a large graph remains unsolved.

Similar papers 1

Empirical Characterization of Graph Sampling Algorithms

February 16, 2021

92% Match
Muhammad Irfan Yousuf, Izza Anwer, Raheel Anwar
Social and Information Netwo...
Data Structures and Algorith...

Graph sampling allows mining a small representative subgraph from a big graph. Sampling algorithms deploy different strategies to replicate the properties of a given graph in the sampled graph. In this study, we provide a comprehensive empirical characterization of five graph sampling algorithms on six properties of a graph including degree, clustering coefficient, path length, global clustering coefficient, assortativity, and modularity. We extract samples from fifteen graph...

Find SimilarView on arXiv

Network Sampling Based on NN Representatives

February 7, 2014

92% Match
Milos Kudelka, Sarka Zehnalova, Jan Platos
Social and Information Netwo...
Physics and Society

The amount of large-scale real data around us increase in size very quickly and so does the necessity to reduce its size by obtaining a representative sample. Such sample allows us to use a great variety of analytical methods, whose direct application on original data would be infeasible. There are many methods used for different purposes and with different results. In this paper we outline a simple and straightforward approach based on analyzing the nearest neighbors (NN) th...

Find SimilarView on arXiv

Social Graph Restoration via Random Walk Sampling

November 23, 2021

91% Match
Kazuki Nakajima, Kazuyuki Shudo
Social and Information Netwo...
Physics and Society

Analyzing social graphs with limited data access is challenging for third-party researchers. To address this challenge, a number of algorithms that estimate structural properties via a random walk have been developed. However, most existing algorithms are limited to the estimation of local structural properties. Here we propose a method for restoring the original social graph from the small sample obtained by a random walk. The proposed method generates a graph that preserves...

Find SimilarView on arXiv

Empirical comparison of network sampling techniques

June 8, 2015

91% Match
Neli Blagus, Lovro Šubelj, Marko Bajec
Social and Information Netwo...
Physics and Society

In the past few years, the storage and analysis of large-scale and fast evolving networks present a great challenge. Therefore, a number of different techniques have been proposed for sampling large networks. In general, network exploration techniques approximate the original networks more accurately than random node and link selection. Yet, link selection with additional subgraph induction step outperforms most other techniques. In this paper, we apply subgraph induction als...

Find SimilarView on arXiv

Benefits of Bias: Towards Better Characterization of Network Sampling

September 18, 2011

91% Match
Arun S. Maiya, Tanya Y. Berger-Wolf
Social and Information Netwo...
Physics and Society

From social networks to P2P systems, network sampling arises in many settings. We present a detailed study on the nature of biases in network sampling strategies to shed light on how best to sample from networks. We investigate connections between specific biases and various measures of structural representativeness. We show that certain biases are, in fact, beneficial for many applications, as they "push" the sampling process towards inclusion of desired properties. Finally,...

Find SimilarView on arXiv

Sampling Multiple Nodes in Large Networks: Beyond Random Walks

October 26, 2021

90% Match
Omri Ben-Eliezer, Talya Eden, ... , Fotakis Dimitris
Social and Information Netwo...
Data Structures and Algorith...

Sampling random nodes is a fundamental algorithmic primitive in the analysis of massive networks, with many modern graph mining algorithms critically relying on it. We consider the task of generating a large collection of random nodes in the network assuming limited query access (where querying a node reveals its set of neighbors). In current approaches, based on long random walks, the number of queries per sample scales linearly with the mixing time of the network, which can...

Find SimilarView on arXiv

A Survey and Taxonomy of Graph Sampling

August 23, 2013

90% Match
Pili Hu, Wing Cheong Lau
Social and Information Netwo...
Probability
Methodology

Graph sampling is a technique to pick a subset of vertices and/ or edges from original graph. It has a wide spectrum of applications, e.g. survey hidden population in sociology [54], visualize social graph [29], scale down Internet AS graph [27], graph sparsification [8], etc. In some scenarios, the whole graph is known and the purpose of sampling is to obtain a smaller graph. In other scenarios, the graph is unknown and sampling is regarded as a way to explore the graph. Com...

Find SimilarView on arXiv

Graph Sampling Approach for Reducing Computational Complexity of Large-Scale Social Network

February 17, 2021

90% Match
Andry Alamsyah, Yahya Peranginangin, Intan Muchtadi-Alamsyah, ... , Kuspriyanto
Social and Information Netwo...

Online social network services provide a platform for human social interactions. Nowadays, many kinds of online interactions generate large-scale social network data. Network analysis helps to mine knowledge and pattern from the relationship between actors inside the network. This approach is important to support predictions and the decision-making process in many real-world applications. The social network analysis methodology, which borrows approaches from graph theory prov...

Find SimilarView on arXiv

Weighted Edge Sampling for Static Graphs

October 18, 2019

90% Match
Muhammad Irfan Yousuf, Raheel Anwar
Data Structures and Algorith...
Social and Information Netwo...

Graph Sampling provides an efficient yet inexpensive solution for analyzing large graphs. While extracting small representative subgraphs from large graphs, the challenge is to capture the properties of the original graph. Several sampling algorithms have been proposed in previous studies, but they lack in extracting good samples. In this paper, we propose a new sampling method called Weighted Edge Sampling. In this method, we give equal weight to all the edges in the beginni...

Find SimilarView on arXiv

Real Time Enhanced Random Sampling of Online Social Networks

November 30, 2012

90% Match
Giannis Haralabopoulos, Ioannis Anagnostopoulos
Social and Information Netwo...
Information Retrieval
Physics and Society

Social graphs can be easily extracted from Online Social Networks. However these networks are getting larger from day to day. Sampling methods used to evaluate graph information cannot accurately extract graph properties. Furthermore Social Networks are limiting the access to their data, making the crawling process even harder. A novel approach on Random Sampling is proposed, considering both limitation and resources. We evaluate this proposal with 4 different settings on 5 d...

Find SimilarView on arXiv