July 14, 2005
We discuss two sampling schemes for selecting random subnets from a network: Random sampling and connectivity dependent sampling, and investigate how the degree distribution of a node in the network is affected by the two types of sampling. Here we derive a necessary and sufficient condition that guarantees that the degree distribution of the subnet and the true network belong to the same family of probability distributions. For completely random sampling of nodes we find that this condition is fulfilled by classical random graphs; for the vast majority of networks this condition will, however, not be met. We furthermore discuss the case where the probability of sampling a node depends on the degree of a node and we find that even classical random graphs are no longer closed under this sampling regime. We conclude by relating the results to real {\it E.coli} protein interaction network data.
Similar papers 1
June 1, 2011
We use mathematical methods from the theory of tailored random graphs to study systematically the effects of sampling on topological features of large biological signalling networks. Our aim in doing so is to increase our quantitative understanding of the relation between true biological networks and the imperfect and often biased samples of these networks that are reported in public data repositories and used by biomedical scientists. We derive exact explicit formulae for de...
October 13, 2017
We consider the problem of testing whether a graph's degree distribution belongs to a particular family, such as poisson or scale-free, given that we only observe a sampled subgraph. In particular, we focus on induced subgraph sampling, a sampling design which systematically distorts the degree distribution of interest. We estimate the parameter indexing the hypothesized family by generalized method of moments and utilize the Kolmogorov-Smirnov test statistic to assess goodne...
February 4, 2019
Random graphs are more and more used for modeling real world networks such as evolutionary networks of proteins. For this purpose we look at two different models and analyze how properties like connectedness and degree distributions are inherited by differently constructed subgraphs. We also give a formula for the variance of the degrees of fixed nodes in the preferential attachment model and additionally draw a connection between weighted graphs and electrical networks.
June 8, 2015
In the past few years, the storage and analysis of large-scale and fast evolving networks present a great challenge. Therefore, a number of different techniques have been proposed for sampling large networks. In general, network exploration techniques approximate the original networks more accurately than random node and link selection. Yet, link selection with additional subgraph induction step outperforms most other techniques. In this paper, we apply subgraph induction als...
September 8, 2020
We describe a new method for the random sampling of connected networks with a specified degree sequence. We consider both the case of simple graphs and that of loopless multigraphs. The constraints of fixed degrees and of connectedness are two of the most commonly needed ones when constructing null models for the practical analysis of physical or biological networks. Yet handling these constraints, let alone combining them, is non-trivial. Our method builds on a recently intr...
May 10, 2005
We study the statistical properties of the sampled scale-free networks, deeply related to the proper identification of various real-world networks. We exploit three methods of sampling and investigate the topological properties such as degree and betweenness centrality distribution, average path length, assortativity, and clustering coefficient of sampled networks compared with those of original networks. It is found that the quantities related to those properties in sampled ...
November 28, 2012
We derive the sampling properties of random networks based on weights whose pairwise products parameterize independent Bernoulli trials. This enables an understanding of many degree-based network models, in which the structure of realized networks is governed by properties of their degree sequences. We provide exact results and large-sample approximations for power-law networks and other more general forms. This enables us to quantify sampling variability both within and acro...
January 13, 2022
Graphs are used in many disciplines to model the relationships that exist between objects in a complex discrete system. Researchers may wish to compare a network of interest to a "typical" graph from a family (or ensemble) of graphs which are similar in some way. One way to do this is to take a sample of several random graphs from the family, to gather information about what is "typical". Hence there is a need for algorithms which can generate graphs uniformly (or approximate...
November 25, 2015
Designing algorithms that generate networks with a given degree sequence while varying both subgraph composition and distribution of subgraphs around nodes is an important but challenging research problem. Current algorithms lack control of key network parameters, the ability to specify to what subgraphs a node belongs to, come at a considerable complexity cost or, critically, sample from a limited ensemble of networks. To enable controlled investigations of the impact and ro...
October 13, 2017
Edge sampling is an important topic in network analysis. It provides a natural way to reduce network size while retaining desired features of the original network. Sampling methods that only use local information are common in practice as they do not require access to the entire network and can be parallelized easily. Despite promising empirical performances, most of these methods are derived from heuristic considerations and therefore still lack theoretical justification. To...