ID: 1109.3911

Benefits of Bias: Towards Better Characterization of Network Sampling

September 18, 2011

View on ArXiv
Arun S. Maiya, Tanya Y. Berger-Wolf
Computer Science
Physics
Social and Information Netwo...
Physics and Society

From social networks to P2P systems, network sampling arises in many settings. We present a detailed study on the nature of biases in network sampling strategies to shed light on how best to sample from networks. We investigate connections between specific biases and various measures of structural representativeness. We show that certain biases are, in fact, beneficial for many applications, as they "push" the sampling process towards inclusion of desired properties. Finally, we describe how these sampling biases can be exploited in several, real-world applications including disease outbreak detection and market research.

Similar papers 1

Sampling of Temporal Networks: Methods and Biases

July 7, 2017

91% Match
Luis E C Rocha, Naoki Masuda, Petter Holme
Physics and Society
Social and Information Netwo...
Data Analysis, Statistics an...

Temporal networks have been increasingly used to model a diversity of systems that evolve in time; for example human contact structures over which dynamic processes such as epidemics take place. A fundamental aspect of real-life networks is that they are sampled within temporal and spatial frames. Furthermore, one might wish to subsample networks to reduce their size for better visualization or to perform computationally intensive simulations. The sampling method may affect t...

Find SimilarView on arXiv

A new algorithm for extracting a small representative subgraph from a very large graph

July 19, 2012

91% Match
Harish Sethu, Xiaoyu Chu
Data Structures and Algorith...
Social and Information Netwo...
Physics and Society

Many real-world networks are prohibitively large for data retrieval, storage and analysis of all of its nodes and links. Understanding the structure and dynamics of these networks entails creating a smaller representative sample of the full graph while preserving its relevant topological properties. In this report, we show that graph sampling algorithms currently proposed in the literature are not able to preserve network properties even with sample sizes containing as many a...

Find SimilarView on arXiv

Network Sampling Based on NN Representatives

February 7, 2014

90% Match
Milos Kudelka, Sarka Zehnalova, Jan Platos
Social and Information Netwo...
Physics and Society

The amount of large-scale real data around us increase in size very quickly and so does the necessity to reduce its size by obtaining a representative sample. Such sample allows us to use a great variety of analytical methods, whose direct application on original data would be infeasible. There are many methods used for different purposes and with different results. In this paper we outline a simple and straightforward approach based on analyzing the nearest neighbors (NN) th...

Find SimilarView on arXiv

Sampling properties of directed networks

January 6, 2012

90% Match
Seung-Woo Son, Claire Christensen, Golnoosh Bizhani, David V. Foster, ... , Paczuski Maya
Physics and Society
Social and Information Netwo...
Data Analysis, Statistics an...

For many real-world networks only a small "sampled" version of the original network may be investigated; those results are then used to draw conclusions about the actual system. Variants of breadth-first search (BFS) sampling, which are based on epidemic processes, are widely used. Although it is well established that BFS sampling fails, in most cases, to capture the IN-component(s) of directed networks, a description of the effects of BFS sampling on other topological proper...

Find SimilarView on arXiv

On the bias of BFS

April 10, 2010

90% Match
Maciej Kurant, Athina Markopoulou, Patrick Thiran
Discrete Mathematics
Data Structures and Algorith...
Networking and Internet Arch...
Social and Information Netwo...
Methodology

Breadth First Search (BFS) and other graph traversal techniques are widely used for measuring large unknown graphs, such as online social networks. It has been empirically observed that an incomplete BFS is biased toward high degree nodes. In contrast to more studied sampling techniques, such as random walks, the precise bias of BFS has not been characterized to date. In this paper, we quantify the degree bias of BFS sampling. In particular, we calculate the node degree distr...

Find SimilarView on arXiv

Empirical Characterization of Graph Sampling Algorithms

February 16, 2021

90% Match
Muhammad Irfan Yousuf, Izza Anwer, Raheel Anwar
Social and Information Netwo...
Data Structures and Algorith...

Graph sampling allows mining a small representative subgraph from a big graph. Sampling algorithms deploy different strategies to replicate the properties of a given graph in the sampled graph. In this study, we provide a comprehensive empirical characterization of five graph sampling algorithms on six properties of a graph including degree, clustering coefficient, path length, global clustering coefficient, assortativity, and modularity. We extract samples from fifteen graph...

Find SimilarView on arXiv

Network Sampling: From Static to Streaming Graphs

November 14, 2012

90% Match
Nesreen K. Ahmed, Jennifer Neville, Ramana Kompella
Social and Information Netwo...
Data Structures and Algorith...
Machine Learning
Physics and Society
Machine Learning

Network sampling is integral to the analysis of social, information, and biological networks. Since many real-world networks are massive in size, continuously evolving, and/or distributed in nature, the network structure is often sampled in order to facilitate study. For these reasons, a more thorough and complete understanding of network sampling is critical to support the field of network science. In this paper, we outline a framework for the general problem of network samp...

Find SimilarView on arXiv

A Survey and Taxonomy of Graph Sampling

August 23, 2013

90% Match
Pili Hu, Wing Cheong Lau
Social and Information Netwo...
Probability
Methodology

Graph sampling is a technique to pick a subset of vertices and/ or edges from original graph. It has a wide spectrum of applications, e.g. survey hidden population in sociology [54], visualize social graph [29], scale down Internet AS graph [27], graph sparsification [8], etc. In some scenarios, the whole graph is known and the purpose of sampling is to obtain a smaller graph. In other scenarios, the graph is unknown and sampling is regarded as a way to explore the graph. Com...

Find SimilarView on arXiv
A. Annibale, A. C. C. Coolen
Quantitative Methods
Disordered Systems and Neura...

We use mathematical methods from the theory of tailored random graphs to study systematically the effects of sampling on topological features of large biological signalling networks. Our aim in doing so is to increase our quantitative understanding of the relation between true biological networks and the imperfect and often biased samples of these networks that are reported in public data repositories and used by biomedical scientists. We derive exact explicit formulae for de...

Sampling from Social Networks with Attributes

February 17, 2017

89% Match
Claudia Wagner, Philipp Singer, Fariba Karimi, ... , Strohmaier Markus
Social and Information Netwo...
Physics and Society

Sampling from large networks represents a fundamental challenge for social network research. In this paper, we explore the sensitivity of different sampling techniques (node sampling, edge sampling, random walk sampling, and snowball sampling) on social networks with attributes. We consider the special case of networks (i) where we have one attribute with two values (e.g., male and female in the case of gender), (ii) where the size of the two groups is unequal (e.g., a male m...

Find SimilarView on arXiv