ID: 1106.0236

What you see is not what you get: how sampling affects macroscopic features of biological networks

June 1, 2011

A. Annibale, A. C. C. Coolen
Quantitative Biology
Condensed Matter
Quantitative Methods
Disordered Systems and Neura...

We use mathematical methods from the theory of tailored random graphs to study systematically the effects of sampling on topological features of large biological signalling networks. Our aim in doing so is to increase our quantitative understanding of the relation between true biological networks and the imperfect and often biased samples of these networks that are reported in public data repositories and used by biomedical scientists. We derive exact explicit formulae for degree distributions and degree correlation kernels of sampled networks, in terms of the degree distributions and degree correlation kernels of the underlying true network, for a broad family of sampling protocols that include (un-)biased node and/or link undersampling as well as (un-)biased link oversampling. Our predictions are in excellent agreement with numerical simulations.
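
The simplest protocol in this family is unbiased link undersampling, in which each link is assumed to be retained independently with probability q. The notation q is ours, not the paper's. Under this protocol, a node of true degree k' keeps a Binomial(k', q) number of its links, so the observed degree distribution is a binomial convolution of the true one. The same binomial thinning also gives the degree, within the retained subgraph, of a node that survives unbiased node undersampling with retention probability q, since each of its neighbours is then kept independently with probability q. The sketch below evaluates this textbook convolution. It is not the paper's general result, which also covers biased protocols, link oversampling and degree correlation kernels, and the function name is hypothetical.

```python
from math import comb


def thinned_degree_distribution(p, q):
    """Degree distribution after each link survives independently with probability q.

    p[k] is the probability that a node of the true network has degree k.
    Returns p_obs with p_obs[k] = sum over k' >= k of p[k'] * C(k', k) * q**k * (1 - q)**(k' - k).
    """
    p_obs = [0.0] * len(p)
    for k_true, pk in enumerate(p):
        # A node with k_true links keeps a Binomial(k_true, q) number of them.
        for k in range(k_true + 1):
            p_obs[k] += pk * comb(k_true, k) * q**k * (1 - q) ** (k_true - k)
    return p_obs


# Example: every node has degree 2 and half of the links are observed.
print(thinned_degree_distribution([0.0, 0.0, 1.0], 0.5))  # prints [0.25, 0.5, 0.25]
```

One useful consequence is that the mean degree shrinks by exactly the factor q, so even unbiased undersampling systematically underestimates connectivity.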

Similar papers

Michael P. H. Stumpf, Carsten Wiuf
Statistical Mechanics

We discuss two sampling schemes for selecting random subnets from a network: Random sampling and connectivity dependent sampling, and investigate how the degree distribution of a node in the network is affected by the two types of sampling. Here we derive a necessary and sufficient condition that guarantees that the degree distribution of the subnet and the true network belong to the same family of probability distributions. For completely random sampling of nodes we find tha...
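
A textbook example of a family that keeps its form under completely random sampling is the Poisson distribution. If each link of a node is retained independently with probability q, a Poisson(λ) degree becomes Poisson(qλ). The sketch below checks this numerically on a truncated distribution. The parameter values are arbitrary, the helper names are hypothetical, and the example illustrates one preserved family rather than the necessary and sufficient condition derived in the paper.

```python
from math import comb, exp, factorial


def poisson_pmf(lam, kmax):
    """Poisson probabilities for k = 0..kmax (truncated)."""
    return [exp(-lam) * lam**k / factorial(k) for k in range(kmax + 1)]


def thin(p, q):
    """Binomial thinning: each link of a degree-k node is kept with probability q."""
    out = [0.0] * len(p)
    for k_true, pk in enumerate(p):
        for k in range(k_true + 1):
            out[k] += pk * comb(k_true, k) * q**k * (1 - q) ** (k_true - k)
    return out


lam, q, kmax = 4.0, 0.3, 60
sampled = thin(poisson_pmf(lam, kmax), q)
expected = poisson_pmf(q * lam, kmax)
# Largest pointwise gap between the sampled distribution and Poisson(q * lam).
max_err = max(abs(a - b) for a, b in zip(sampled, expected))
print(f"max |difference| = {max_err:.2e}")
```

The gap is at the level of floating-point rounding. Repeating the check with, say, a geometric or power-law degree distribution shows that those shapes are generally not preserved exactly.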

Arun S. Maiya, Tanya Y. Berger-Wolf
Social and Information Netwo...
Physics and Society

From social networks to P2P systems, network sampling arises in many settings. We present a detailed study on the nature of biases in network sampling strategies to shed light on how best to sample from networks. We investigate connections between specific biases and various measures of structural representativeness. We show that certain biases are, in fact, beneficial for many applications, as they "push" the sampling process towards inclusion of desired properties. Finally,...
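
One well-known bias of link-following samplers, such as crawls and random walks, is that they reach a node with probability proportional to its degree. A node of degree k owns k link ends, so the node found at a uniformly random end of a uniformly random link has degree distribution k p(k) / ⟨k⟩. The following sketch computes this size-biased distribution. It is an illustration of the general phenomenon, not a method from the abstract above, and the function name is hypothetical.

```python
def endpoint_degree_distribution(p):
    """Degree distribution of the node at a uniformly random end of a uniformly random link.

    A node of degree k owns k link ends, so it is reached with weight k * p[k] / <k>.
    """
    mean_degree = sum(k * pk for k, pk in enumerate(p))
    return [k * pk / mean_degree for k, pk in enumerate(p)]


# Star graph: one hub of degree 4 and four leaves of degree 1.
p_star = [0.0, 0.8, 0.0, 0.0, 0.2]
print(endpoint_degree_distribution(p_star))  # prints [0.0, 0.5, 0.0, 0.0, 0.5]
```

In the star, the hub makes up only 20% of the nodes but accounts for half of all link ends, which is the kind of push towards high-degree nodes that can help or hurt depending on the property being estimated.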

A. Annibale, A. C. C. Coolen, L. P. Fernandes, ... , J. Kleinjung
Disordered Systems and Neura...

We study the tailoring of structured random graph ensembles to real networks, with the objective of generating precise and practical mathematical tools for quantifying and comparing network topologies macroscopically, beyond the level of degree statistics. Our family of ensembles can produce graphs with any prescribed degree distribution and any degree-degree correlation function, its control parameters can be calculated fully analytically, and as a result we can calculate (a...

E. S. Roberts, A. C. C. Coolen, T. Schlitt
Quantitative Methods
Disordered Systems and Neura...
Social and Information Netwo...
Physics and Society

We generate new mathematical tools with which to quantify the macroscopic topological structure of large directed networks. This is achieved via a statistical mechanical analysis of constrained maximum entropy ensembles of directed random graphs with prescribed joint distributions for in- and outdegrees and prescribed degree-degree correlation functions. We calculate exact and explicit formulae for the leading orders in the system size of the Shannon entropies and complexitie...

Gloria Cecchini, Bjoern Schelter
Data Analysis, Statistics an...
Probability
Applications

When a network is reconstructed, two types of errors can occur: false positive and false negative errors about the presence or absence of links. In this paper, the influence of these two errors on the vertex degree distribution is analysed analytically. Moreover, an analytic formula for the density of the biased vertex degree distribution is found. In the inverse problem, we find a reliable procedure to reconstruct analytically the density of the vertex degree distribution o...
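
A minimal discrete sketch of these two error types follows. It rests on an assumption that is not necessarily the model used in the paper: each true link is missed independently with probability fn, and each absent node pair is wrongly reported as linked with probability fp. Under that assumption, the observed degree of a node with true degree k in a network of N nodes is the sum of two independent counts, Binomial(k, 1 - fn) and Binomial(N - 1 - k, fp), so the observed distribution is a convolution. The names fn, fp, n_nodes and both helpers are hypothetical.

```python
from math import comb


def binom_pmf(n, r):
    """Binomial(n, r) probabilities for j = 0..n."""
    return [comb(n, j) * r**j * (1 - r) ** (n - j) for j in range(n + 1)]


def observed_degree_distribution(p, n_nodes, fn, fp):
    """Observed degree distribution under independent link errors (illustrative model).

    p[k]: true degree distribution, with len(p) <= n_nodes.
    fn: probability that a true link is missed (false negative).
    fp: probability that an absent link is reported (false positive).
    """
    if len(p) > n_nodes:
        raise ValueError("degrees cannot exceed n_nodes - 1")
    out = [0.0] * n_nodes  # observed degrees range over 0..n_nodes-1
    for k, pk in enumerate(p):
        if pk == 0.0:
            continue
        kept = binom_pmf(k, 1 - fn)  # true links that survive
        spurious = binom_pmf(n_nodes - 1 - k, fp)  # false links to the n_nodes-1-k non-neighbours
        # The observed degree is the sum of two independent counts, so convolve.
        for a, pa in enumerate(kept):
            for b, pb in enumerate(spurious):
                out[a + b] += pk * pa * pb
    return out


print(observed_degree_distribution([0.1, 0.3, 0.4, 0.2], n_nodes=5, fn=0.2, fp=0.05))
```

In this model the observed mean degree is (1 - fn)⟨k⟩ + fp(N - 1 - ⟨k⟩). In large sparse networks, even a small fp can therefore add many spurious links, which is one reason the inverse problem is delicate.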

Charalampos E. Tsourakakis
Data Structures and Algorith...
Distributed, Parallel, and C...
Discrete Mathematics
Social and Information Netwo...
Quantitative Methods

This dissertation contributes to mathematical and algorithmic problems that arise in the analysis of network and biological data.

Harish Sethu, Xiaoyu Chu
Data Structures and Algorith...
Social and Information Netwo...
Physics and Society

Many real-world networks are prohibitively large for the retrieval, storage and analysis of all of their nodes and links. Understanding the structure and dynamics of these networks entails creating a smaller representative sample of the full graph while preserving its relevant topological properties. In this report, we show that graph sampling algorithms currently proposed in the literature are not able to preserve network properties even with sample sizes containing as many a...

Sang Hoon Lee, Pan-Jun Kim, Hawoong Jeong
Disordered Systems and Neura...
Physics and Society
Methodology

We study the statistical properties of sampled scale-free networks, a question closely tied to the proper identification of various real-world networks. We exploit three methods of sampling and investigate topological properties such as the degree and betweenness centrality distributions, average path length, assortativity, and clustering coefficient of sampled networks compared with those of the original networks. It is found that the quantities related to those properties in sampled ...

Nelson Antunes, Shankar Bhamidi, Tianjian Guo, ... , Wang Bang
Methodology
Social and Information Netwo...
Physics and Society

The focus of this work is on estimation of the in-degree distribution in directed networks from sampling network nodes or edges. A number of sampling schemes are considered, including random sampling with and without replacement, and several approaches based on random walks with possible jumps. When sampling nodes, it is assumed that only the out-edges of that node are visible, that is, the in-degree of that node is not observed. The suggested estimation of the in-degree dist...

Neli Blagus, Lovro Šubelj, Marko Bajec
Social and Information Netwo...
Physics and Society

In the past few years, the storage and analysis of large-scale and fast-evolving networks have presented a great challenge. Therefore, a number of different techniques have been proposed for sampling large networks. In general, network exploration techniques approximate the original networks more accurately than random node and link selection. Yet, link selection with an additional subgraph induction step outperforms most other techniques. In this paper, we apply subgraph induction als...
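
The idea of link selection followed by subgraph induction, sometimes called induced edge sampling, can be sketched in two steps. First, draw links uniformly at random and keep their endpoints. Second, add back every original link whose two endpoints were both kept. This is a generic illustration, not the authors' implementation. The function name and the edge-list representation are assumptions.

```python
import random


def induced_edge_sample(edges, n_sampled, seed=None):
    """Link selection followed by subgraph induction (illustrative sketch).

    edges: list of (u, v) links of an undirected simple graph.
    n_sampled: number of links drawn uniformly at random without replacement.
    Returns the set of sampled nodes and the list of original links among them.
    """
    rng = random.Random(seed)
    chosen = rng.sample(edges, n_sampled)  # step 1: link selection
    nodes = {u for link in chosen for u in link}  # endpoints of the chosen links
    # Step 2: induction adds every original link whose endpoints were both sampled.
    induced = [(u, v) for (u, v) in edges if u in nodes and v in nodes]
    return nodes, induced


# A 5-cycle with one chord; the exact sample depends on the seed.
graph = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (0, 2)]
nodes, links = induced_edge_sample(graph, 2, seed=1)
print(sorted(nodes), links)
```

The induction step is what distinguishes this from plain link selection. Plain link selection tends to leave sampled nodes with artificially low degrees, whereas induction restores all links among the sampled nodes.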