Similar papers
July 22, 2020
The expressive power of artificial neural networks crucially depends on the nonlinearity of their activation functions. Though a wide variety of nonlinear activation functions have been proposed for use in artificial neural networks, a detailed understanding of their role in determining the expressive power of a network has not emerged. Here, we study how activation functions affect the storage capacity of treelike two-layer networks. We relate the boundedness or divergence o...
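For orientation, a treelike two-layer network of the sort studied here splits an $N$-dimensional input into $K$ disjoint branches $\mathbf{x}_k$, each feeding one hidden unit (this parameterization is the standard committee-machine convention, given here for illustration rather than as the authors' exact notation):

$$ \hat{y}(\mathbf{x}) \;=\; \mathrm{sign}\!\left( \sum_{k=1}^{K} g\!\left( \mathbf{w}_k \cdot \mathbf{x}_k \right) \right), $$

where $g$ is the activation function whose boundedness or divergence is at issue, and the storage capacity is the largest number of random input-label pairs per weight that such a network can typically fit.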
July 17, 2019
Rectified Linear Units (ReLU) have become the main model for the neural units in current deep learning systems. This choice was originally suggested as a way to compensate for the so-called vanishing gradient problem, which can undercut stochastic gradient descent (SGD) learning in networks composed of multiple layers. Here we provide analytical results on the effects of ReLUs on the capacity and on the geometrical landscape of the solution space in two-layer neural netwo...
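As a concrete reminder of the setup, here is a minimal NumPy sketch of a two-layer network with ReLU hidden units and a binary sign output; the variable names and the fixed, non-negative output weights are illustrative assumptions, not the paper's definitions.

    import numpy as np

    def relu(z):
        # Rectified Linear Unit: elementwise max(0, z)
        return np.maximum(0.0, z)

    def two_layer_sign_output(x, W1, w2):
        # x: input vector of length N
        # W1: hidden-layer weights, shape (K, N); w2: output weights, shape (K,)
        # Binary output: sign of a weighted sum of ReLU hidden activations.
        return np.sign(w2 @ relu(W1 @ x))

    rng = np.random.default_rng(0)
    N, K = 100, 5
    x = rng.standard_normal(N)
    W1 = rng.standard_normal((K, N)) / np.sqrt(N)
    w2 = np.ones(K)                      # fixed non-negative output weights
    print(two_layer_sign_output(x, W1, w2))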
August 16, 2020
Symmetric functions, which take as input an unordered, fixed-size set, are known to be universally representable by neural networks that enforce permutation invariance. These architectures only give guarantees for fixed input sizes, yet in many practical applications, including point clouds and particle physics, a relevant notion of generalization should include varying the input size. In this work we treat symmetric functions (of any size) as functions over probability measu...
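The permutation-invariant architectures in question are typically of the sum-decomposition form $f(X) = \rho\big(\sum_i \phi(x_i)\big)$. The toy sketch below (my own illustration, with arbitrary random feature maps) shows why such an output cannot depend on the ordering of the set:

    import numpy as np

    rng = np.random.default_rng(1)
    PHI_W = rng.standard_normal((3, 8))   # per-element map: R^3 -> R^8
    RHO_W = rng.standard_normal(8)        # readout: R^8 -> R

    def symmetric_f(X):
        # X: array of shape (n, 3), one row per set element.
        # Summing the embeddings over rows makes the result invariant
        # to any permutation of the elements.
        pooled = np.tanh(X @ PHI_W).sum(axis=0)
        return pooled @ RHO_W

    X = rng.standard_normal((5, 3))
    assert np.isclose(symmetric_f(X), symmetric_f(X[::-1]))   # order does not matter

Summing (rather than, say, averaging) is one choice of pooling; how such choices behave as the set size $n$ varies is exactly the kind of question the abstract raises.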
February 11, 2022
Recent progress in Machine Learning has opened the door to actual applications of learning algorithms, but also to new research directions, both within Machine Learning itself and at its interfaces with other disciplines. The case that interests us is the interface with physics, and more specifically Statistical Physics. In this short lecture, I first present a brief introduction to Machine Learning from the angle of neural networks. After explaining quickl...
January 31, 2002
The capacity with which a system of independent neuron-like units represents a given set of stimuli is studied by calculating the mutual information between the stimuli and the neural responses. Both discrete noiseless and continuous noisy neurons are analyzed. In both cases, the information grows monotonically with the number of neurons considered. Under the assumption that neurons are independent, the mutual information rises linearly from zero, and approaches exponentially...
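The quantity being computed is the standard mutual information between stimuli $S$ and responses $R$; for discrete variables,

$$ I(S;R) \;=\; \sum_{s,r} p(s,r)\,\log_2 \frac{p(s,r)}{p(s)\,p(r)}, $$

which is bounded above by the stimulus entropy $H(S)$; presumably this ceiling is what the truncated sentence describes the information approaching exponentially.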
April 21, 2020
An autoencoder is a layered neural network whose structure can be viewed as consisting of an encoder, which compresses an input vector of dimension $D$ to a vector of low dimension $d$, and a decoder which transforms the low-dimensional vector back to the original input vector (or one that is very similar). In this paper we explore the compressive power of autoencoders that are Boolean threshold networks by studying the numbers of nodes and layers that are required to ensure ...
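For concreteness, a Boolean threshold unit fires when its weighted input sum reaches its threshold. A toy encoder built from such units, with hand-picked weights purely for illustration and $D=4$, $d=2$, looks like this:

    import numpy as np

    def threshold_layer(x, W, theta):
        # Boolean threshold units: output bit j fires iff (W @ x)[j] >= theta[j].
        return (W @ x >= theta).astype(int)

    # Encoder compressing D = 4 input bits to d = 2 code bits: each code bit
    # reports whether both bits in its half of the input are on.
    W_enc = np.array([[1, 1, 0, 0],
                      [0, 0, 1, 1]])
    theta_enc = np.array([2, 2])

    x = np.array([1, 1, 0, 0])
    print(threshold_layer(x, W_enc, theta_enc))   # -> [1 0]

How many such nodes and layers are needed before a decoder can recover the input (or a close approximation of it) is the question the paper addresses.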
March 8, 1999
The subject of study is a neural network with binary neurons, randomly diluted synapses and variable pattern activity. We look at the system with parallel updating, using a probabilistic approach to solve the one-step dynamics with one condensed pattern. We derive restrictions on the storage capacity and the mutual information content occurring during the retrieval process. Special focus is on the constraints on the threshold for optimal performance. We also look at the effect
March 9, 2015
Deep Neural Networks (DNNs) are analyzed via the theoretical framework of the information bottleneck (IB) principle. We first show that any DNN can be quantified by the mutual information between the layers and the input and output variables. Using this representation we can calculate the optimal information theoretic limits of the DNN and obtain finite sample generalization bounds. The advantage of getting closer to the theoretical limit is quantifiable both by the generaliz...
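In its usual Lagrangian form, the information bottleneck trades compression of the input $X$ against preservation of information about the output variable $Y$ in a representation $T$ (here, a layer):

$$ \min_{p(t\mid x)} \; I(X;T) \;-\; \beta\, I(T;Y), $$

where the minimization runs over the stochastic encoder $p(t\mid x)$ and $\beta$ sets the trade-off between compressing the input and predicting the output.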
September 3, 1999
During the last few years, an area of active research in the field of complex systems has been their information storing and processing abilities. Common opinion has it that the most interesting behaviour of these systems is found "at the edge of chaos", which would seem to suggest that complex systems may have inherently non-trivial information processing abilities in the vicinity of sharp phase transitions. A comprehensive, quantitative understanding of why this is the c...
November 28, 2019
We consider a three-layer Sejnowski machine and show that features learnt via contrastive divergence have a dual representation as patterns in a dense associative memory of order $P=4$. The latter is known to be able to Hebbian-store a number of patterns scaling as $N^{P-1}$, where $N$ denotes the number of constituting binary neurons interacting $P$-wise. We also prove that, by keeping the dense associative network far from the saturation regime (namely, allowing for a number of ...
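For reference, a dense associative memory of order $P$ is conventionally defined (following Krotov and Hopfield; the normalization here is illustrative) by an energy with $P$-wise interactions between the $N$ binary neurons $\sigma_i$ and the $M$ stored patterns $\xi^{\mu}$:

$$ E(\boldsymbol{\sigma}) \;=\; -\sum_{\mu=1}^{M} \left( \sum_{i=1}^{N} \xi_i^{\mu} \sigma_i \right)^{P}, $$

and with Hebbian storage the number of retrievable patterns $M$ scales as $N^{P-1}$, which is the scaling quoted above.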