April 1, 2001
Similar papers 3
July 30, 2013
This paper examines the memory capacity of generalized neural networks. Hopfield networks trained with a variety of learning techniques are investigated for their capacity for both binary and non-binary alphabets. It is shown that the capacity can be increased substantially when multilevel inputs are used. New learning strategies are proposed to increase Hopfield network capacity, and the scalability of these methods is also examined with respect to the size of the network. The ability to r...
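As a point of reference for the binary case discussed in this abstract, a minimal Hebbian (outer-product) Hopfield network can be sketched as below. This illustrates only the classical binary setup; the paper's multilevel inputs and new learning strategies are not reproduced here, and the pattern count and network size are illustrative choices.

```python
import numpy as np

# Minimal sketch of a binary Hopfield network with Hebbian learning.
# Classical binary case only; not the paper's proposed strategies.

def train_hebbian(patterns):
    """W = (1/n) * sum_mu xi^mu (xi^mu)^T with zero self-couplings."""
    n = patterns.shape[1]
    W = patterns.T @ patterns / n
    np.fill_diagonal(W, 0.0)
    return W

def recall(W, state, max_steps=10):
    """Synchronous sign updates until a fixed point or the step limit."""
    for _ in range(max_steps):
        new = np.sign(W @ state)
        new[new == 0] = 1          # break ties toward +1
        if np.array_equal(new, state):
            break
        state = new
    return state

rng = np.random.default_rng(0)
patterns = rng.choice([-1, 1], size=(3, 64))  # 3 random patterns, n = 64 neurons
W = train_hebbian(patterns)

noisy = patterns[0].astype(float)
noisy[:5] *= -1                               # corrupt 5 of 64 bits
restored = recall(W, noisy)
overlap = float(np.mean(restored == patterns[0]))
print(overlap)                                # close to 1.0 at this low load
```

At a load of 3 patterns on 64 neurons, well below the classical Hebbian capacity of roughly 0.138 patterns per neuron, recall from a mildly corrupted cue typically recovers the stored pattern.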
July 19, 2019
There is some theoretical evidence that deep neural networks with multiple hidden layers have the potential to represent multidimensional mappings more efficiently than shallow networks with a single hidden layer. The question is whether this theoretical advantage can be exploited, i.e., whether such representations can be found with the help of numerical training methods. Tests using prototypical problems with a known mean square minimum did not confirm this hypothesis. Minima fou...
August 20, 2017
We derive two critical numbers that predict the behavior of perceptron networks. First, we derive what we call the lossless memory (LM) dimension. The LM dimension is a generalization of the Vapnik--Chervonenkis (VC) dimension that avoids structured data and therefore provides an upper bound for perfectly fitting almost any training data. Second, we derive what we call the MacKay (MK) dimension. This limit indicates a 50% chance of not bein...
March 15, 2017
Restricted Boltzmann Machines are key tools in Machine Learning and are described by the energy function of bipartite spin glasses. From a statistical-mechanics perspective, they share the same Gibbs measure as Hopfield networks for associative memory. In this equivalence, the weights of the former play the role of the patterns of the latter. As Boltzmann machines usually require real-valued weights to be trained with gradient-descent-like methods, while Hopfield networks typically store binary pat...
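The equivalence mentioned in this abstract can be sketched in one line; assuming $P$ Gaussian hidden units $z_\mu$, $N$ binary visible spins $\sigma_i$, and weights $\xi_i^\mu$ (the notation here is chosen for illustration, not taken from the paper), integrating out the hidden layer of the bipartite energy gives

$$\int \prod_{\mu=1}^{P} \frac{dz_\mu}{\sqrt{2\pi/\beta}}\; e^{-\frac{\beta}{2}\sum_{\mu} z_\mu^2 \,+\, \frac{\beta}{\sqrt{N}}\sum_{i,\mu} \xi_i^{\mu}\sigma_i z_\mu} \;=\; e^{\frac{\beta}{2N}\sum_{\mu}\left(\sum_i \xi_i^{\mu}\sigma_i\right)^2},$$

which is the Hopfield Gibbs weight with Hebbian couplings $J_{ij}=\frac{1}{N}\sum_\mu \xi_i^\mu \xi_j^\mu$, making the role swap between weights and patterns explicit.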
May 26, 2003
The time evolution of an exactly solvable layered feedforward neural network with three-state neurons that optimizes the mutual information is studied for arbitrary synaptic noise (temperature). Detailed stationary temperature-capacity and capacity-activity phase diagrams are obtained. The model exhibits pattern-retrieval, pattern-fluctuation-retrieval and spin-glass phases. It is found that performance improves, in the form of both a larger critical capacity and i...
January 29, 2002
The article is a lightly edited version of my habilitation thesis at the University of Wuerzburg. My aim is to give a self-contained, if concise, introduction to the formal methods used when off-line learning in feedforward networks is analyzed by statistical physics. Due to its origin, however, the article is not a comprehensive review of the field but is strongly skewed towards reporting my own research.
May 10, 2023
The $\textit{von Neumann Computer Architecture}$ draws a distinction between computation and memory. In contrast, the brain has an integrated architecture in which computation and memory are indistinguishable. Motivated by the architecture of the brain, we propose a model of $\textit{associative computation}$ where memory is defined by a set of vectors in $\mathbb{R}^n$ (that we call $\textit{anchors}$), computation is performed by convergence from an input vector to a nearest nei...
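The nearest-anchor idea in this abstract can be sketched in a few lines. The anchor set, the dimension $n=2$, and the Euclidean metric below are illustrative assumptions, not the paper's construction.

```python
import numpy as np

# Hedged sketch of "associative computation": memory is a set of anchor
# vectors in R^n, and computation maps an input to a nearest anchor.
# Anchors, dimension, and metric are illustrative choices.

anchors = np.array([
    [0.0, 0.0],
    [1.0, 1.0],
    [4.0, 0.0],
])

def associate(x, anchors):
    """Return the anchor nearest to x in Euclidean distance."""
    d = np.linalg.norm(anchors - x, axis=1)
    return anchors[np.argmin(d)]

out = associate(np.array([0.9, 1.2]), anchors)
print(out)  # the nearest anchor, [1. 1.]
```

Here the input $[0.9, 1.2]$ lies closest to the anchor $[1, 1]$, so computation "retrieves" that memory; the paper's model adds convergence dynamics on top of this basic retrieval step.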
March 16, 2022
Balancing model complexity against the information contained in observed data is the central challenge to learning. In order for complexity-efficient models to exist and be discoverable in high dimensions, we require a computational framework that relates a credible notion of complexity to simple parameter representations. Further, this framework must allow excess complexity to be gradually removed via gradient-based optimization. Our n-ary, or n-argument, activation function...
April 11, 1997
The cooperative behaviour of interacting neurons and synapses is studied using models and methods from statistical physics. The competition between training error and entropy may lead to discontinuous properties of the neural network. This is demonstrated for a few examples: Perceptron, associative memory, learning from examples, generalization, multilayer networks, structure recognition, Bayesian estimate, on-line training, noise estimation and time series generation.
October 2, 2021
Current deep neural networks are highly overparameterized (up to billions of connection weights) and nonlinear. Yet they can fit data almost perfectly through variants of gradient descent and achieve unexpected levels of prediction accuracy without overfitting. These are formidable results that defy the predictions of statistical learning theory and pose conceptual challenges for non-convex optimization. In this paper, we use methods from the statistical physics of disordered sys...