July 8, 2016
Statistical ensembles of networks, i.e., probability spaces of all networks that are consistent with given aggregate statistics, have become instrumental in the analysis of complex networks. Their numerical and analytical study provides the foundation for the inference of topological patterns, the definition of network-analytic measures, as well as for model selection and statistical hypothesis testing. Contributing to the foundation of these data analysis techniques, in this Letter we introduce generalized hypergeometric ensembles, a broad class of analytically tractable statistical ensembles of finite, directed and weighted networks. This framework can be interpreted as a generalization of the classical configuration model, which is commonly used to randomly generate networks with a given degree sequence or distribution. Our generalization rests on the introduction of dyadic link propensities, which capture the degree-corrected tendencies of pairs of nodes to form edges between each other. Studying empirical and synthetic data, we show that our approach provides broad perspectives for model selection and statistical hypothesis testing in data on complex networks.
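As a rough illustration of the idea, the special case with uniform link propensities can be sketched as an urn problem. The sketch rests on assumptions that the abstract does not state: the urn holds `out_degree[i] * in_degree[j]` balls for each ordered node pair, one ball is drawn without replacement per observed edge, and self-loops are allowed. The function name `sample_urn_network` and the toy matrix `observed` are hypothetical. Non-uniform dyadic propensities would need a biased (non-central) urn scheme, which is not shown here.

```python
import numpy as np


def sample_urn_network(adjacency, rng):
    """Draw one multi-edge network from an urn with uniform propensities.

    Assumption (not taken from the abstract): the urn contains
    out_degree[i] * in_degree[j] balls for every ordered pair (i, j).
    """
    adjacency = np.asarray(adjacency, dtype=np.int64)
    n_edges = int(adjacency.sum())
    out_degree = adjacency.sum(axis=1)
    in_degree = adjacency.sum(axis=0)

    # Number of balls per ordered node pair (possible stub combinations).
    balls = np.outer(out_degree, in_degree)

    # Drawing n_edges balls without replacement gives a multivariate
    # hypergeometric distribution over the edge counts of all pairs.
    draw = rng.multivariate_hypergeometric(balls.ravel(), n_edges)
    return draw.reshape(adjacency.shape)


# Toy weighted, directed network: entry (i, j) counts edges from i to j.
observed = np.array([[0, 2, 1],
                     [1, 0, 0],
                     [0, 3, 0]])

rng = np.random.default_rng(0)
sample = sample_urn_network(observed, rng)
print(sample)
```

Under these assumptions the total number of edges is fixed in every draw, while node degrees match the observed ones only in expectation: the expected count for a pair is `out_degree[i] * in_degree[j] / n_edges`.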
Similar papers 1
June 14, 2017
The inference of network topologies from relational data is an important problem in data analysis. Exemplary applications include the reconstruction of social ties from data on human interactions, the inference of gene co-expression networks from DNA microarray data, or the learning of semantic relationships based on co-occurrences of words in documents. Solving these problems requires techniques to infer significant links in noisy relational data. In this short paper, we pro...
August 12, 2009
We study the tailoring of structured random graph ensembles to real networks, with the objective of generating precise and practical mathematical tools for quantifying and comparing network topologies macroscopically, beyond the level of degree statistics. Our family of ensembles can produce graphs with any prescribed degree distribution and any degree-degree correlation function, its control parameters can be calculated fully analytically, and as a result we can calculate (a...
October 15, 2018
We introduce a broad class of random graph models: the generalised hypergeometric ensemble (GHypEG). This class makes it possible to solve some long-standing problems in random graph theory. First, GHypEG provides an elegant and compact formulation of the well-known configuration model in terms of an urn problem. Second, GHypEG allows one to incorporate arbitrary tendencies to connect different vertex pairs. Third, we present the closed-form expressions of the associated probability distri...
January 31, 2011
We generate new mathematical tools with which to quantify the macroscopic topological structure of large directed networks. This is achieved via a statistical mechanical analysis of constrained maximum entropy ensembles of directed random graphs with prescribed joint distributions for in- and outdegrees and prescribed degree-degree correlation functions. We calculate exact and explicit formulae for the leading orders in the system size of the Shannon entropies and complexitie...
July 24, 2007
We introduce and study a class of exchangeable random graph ensembles. They can be used as statistical null models for empirical networks, and as a tool for theoretical investigations. We provide general theorems that characterize the degree distribution of the ensemble graphs, together with some features that are important for applications, such as subgraph distributions and kernel of the adjacency matrix. These results are used to compare to other models of simple and compl...
February 20, 2008
In this paper we generalize the concept of random networks to describe networks with nontrivial features by a statistical mechanics approach. This framework is able to describe ensembles of undirected, directed as well as weighted networks. These networks might have nontrivial community structure or, in the case of networks embedded in a given space, nontrivial distance dependence of the link probability. These ensembles are characterized by their entropy which evaluates th...
February 7, 2017
We introduce a statistical regression model to investigate the impact of dyadic relations on complex networks generated from observed repeated interactions. It is based on generalised hypergeometric ensembles (gHypEG), a class of statistical network ensembles developed recently to deal with multi-edge graph and count data. We represent different types of known relations between system elements by weighted graphs, separated in the different layers of a multiplex network. With ...
August 1, 2007
Randomized network ensembles are the null models of real networks and are extensively used to compare a real system to a null hypothesis. In this paper we study network ensembles with the same degree distribution, the same degree-correlations or the same community structure of any given real network. We characterize these randomized network ensembles by their entropy, i.e. the normalized logarithm of the total number of networks which are part of these ensembles. We estima...
June 26, 2009
Graphs and networks provide a canonical representation of relational data, with massive network data sets becoming increasingly prevalent across a variety of scientific fields. Although tools from mathematics and computer science have been eagerly adopted by practitioners in the service of network inference, they do not yet comprise a unified and coherent framework for the statistical analysis of large-scale network data. This paper serves as both an introduction to the topic...
April 14, 2014
Complex networks grow subject to structural constraints which affect their measurable properties. Assessing the effect that such constraints impose on their observables is thus a crucial aspect to be taken into account in their analysis. To this end, we examine the effect of fixing the strength sequence in multi-edge networks on several network observables such as degrees, disparity, average neighbor properties and weight distribution using an ensemble approach. We provide a g...