October 22, 1996
We consider an ensemble of $K$ single-layer perceptrons exposed to random inputs and investigate the conditions under which the couplings of these perceptrons can be chosen such that prescribed correlations between the outputs occur. A general formalism is introduced using a multi-perceptron costfunction that allows to determine the maximal number of random inputs as a function of the desired values of the correlations. Replica-symmetric results for $K=2$ and $K=3$ are compar...
June 14, 2018
We study and provide exposition to several phenomena that are related to the perceptron's compression. One theme concerns modifications of the perceptron algorithm that yield better guarantees on the margin of the hyperplane it outputs. These modifications can be useful in training neural networks as well, and we demonstrate them with some experimental data. In a second theme, we deduce conclusions from the perceptron's compression in various contexts.
April 16, 1996
We study the weight space structure of the parity machine with binary weights by deriving the distribution of volumes associated to the internal representations of the learning examples. The learning behaviour and the symmetry breaking transition are analyzed and the results are found to be in very good agreement with extended numerical simulations.
July 5, 1996
A perceptron is trained by a random bit sequence. In comparison to the corresponding classification problem, the storage capacity decreases to alpha_c=1.70\pm 0.02 due to correlations between input and output bits. The numerical results are supported by a signal to noise analysis of Hebbian weights.
January 1, 2025
Learning parity functions is a canonical problem in learning theory, which although computationally tractable, is not amenable to standard learning algorithms such as gradient-based methods. This hardness is usually explained via statistical query lower bounds [Kearns, 1998]. However, these bounds only imply that for any given algorithm, there is some worst-case parity function that will be hard to learn. Thus, they do not explain why fixed parities - say, the full parity fun...
January 7, 2019
In this work, we introduce the concept of bandlimiting into the theory of machine learning because all physical processes are bandlimited by nature, including real-world machine learning tasks. After the bandlimiting constraint is taken into account, our theoretical analysis has shown that all practical machine learning tasks are asymptotically solvable in a perfect sense. Furthermore, the key towards this solvability almost solely relies on two factors: i) a sufficiently lar...
January 2, 2019
A long standing open problem in the theory of neural networks is the development of quantitative methods to estimate and compare the capabilities of different architectures. Here we define the capacity of an architecture by the binary logarithm of the number of functions it can compute, as the synaptic weights are varied. The capacity provides an upper bound on the number of bits that can be extracted from the training data and stored in the architecture during learning. We s...
January 28, 1998
We review the application of Statistical Mechanics methods to the study of online learning of a drifting concept in the limit of large systems. The model where a feed-forward network learns from examples generated by a time dependent teacher of the same architecture is analyzed. The best possible generalization ability is determined exactly, through the use of a variational method. The constructive variational method also suggests a learning algorithm. It depends, however, on...
December 13, 2023
We consider the memorization capabilities of multilayered \emph{sign} perceptrons neural networks (SPNNs). A recent rigorous upper-bounding capacity characterization, obtained in \cite{Stojnictcmspnncaprdt23} utilizing the Random Duality Theory (RDT), demonstrated that adding neurons in a network configuration may indeed be very beneficial. Moreover, for particular \emph{treelike committee machines} (TCM) architectures with $d\leq 5$ neurons in the hidden layer, \cite{Stojnic...
March 3, 1997
We study the on-line AdaTron learning of linearly non-separable rules by a simple perceptron. Training examples are provided by a perceptron with a non-monotonic transfer function which reduces to the usual monotonic relation in a certain limit. We find that, although the on-line AdaTron learning is a powerful algorithm for the learnable rule, it does not give the best possible generalization error for unlearnable problems. Optimization of the learning rate is shown to greatl...