June 15, 1993
Gardner's analysis of the optimal storage capacity of neural networks is extended to study finite-temperature effects. The typical volume of the space of interactions is calculated for strongly diluted networks as a function of the storage ratio $\alpha$, temperature $T$, and the tolerance parameter $m$, from which the optimal storage capacity $\alpha_c$ is obtained as a function of $T$ and $m$. At zero temperature it is found that $\alpha_c = 2$ regardless of $m$, while $\alp...
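For context, Gardner's classic zero-temperature result for the spherical perceptron with stability threshold $\kappa$ (quoted here as standard background, not taken from the abstract above) reads
$$ \alpha_c(\kappa) \;=\; \left[\int_{-\kappa}^{\infty}\frac{dt}{\sqrt{2\pi}}\,e^{-t^{2}/2}\,(t+\kappa)^{2}\right]^{-1}, $$
which reduces to $\alpha_c(0) = 2$, consistent with the $T = 0$ value stated above.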
October 4, 2018
Memorization is worst-case generalization. Based on MacKay's information theoretic model of supervised machine learning, this article discusses how to practically estimate the maximum size of a neural network given a training data set. First, we present four easily applicable rules to analytically determine the capacity of neural network architectures. This allows the comparison of the efficiency of different network architectures independently of a task. Second, we introduce...
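The truncated abstract does not reproduce the four rules, but MacKay's underlying single-neuron result (roughly two bits of capacity per weight) already gives a quick back-of-the-envelope estimate. A minimal sketch, applying that per-parameter heuristic naively to a fully connected network (helper names are illustrative, not the paper's API):

```python
# Minimal sketch, not the paper's four rules: MacKay's single-neuron result
# (about two bits of capacity per weight) applied per parameter as a rough
# upper bound on what a fully connected network can memorize.

def mlp_param_count(layer_sizes):
    """Weights plus biases of a fully connected network."""
    return sum((fan_in + 1) * fan_out
               for fan_in, fan_out in zip(layer_sizes[:-1], layer_sizes[1:]))

def capacity_estimate_bits(layer_sizes):
    """Rough memorization capacity in bits, assuming ~2 bits per parameter."""
    return 2 * mlp_param_count(layer_sizes)

print(capacity_estimate_bits([784, 64, 10]))   # 2 * (785*64 + 65*10) = 101780
```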
June 5, 2003
We analyze a learning method that uses a margin $\kappa$ {\it \`a la} Gardner for simple perceptron learning. This method corresponds to perceptron learning when $\kappa=0$ and to Hebbian learning when $\kappa \to \infty$. Nevertheless, we find that the generalization ability of the method is superior to that of the perceptron and Hebbian methods at an early stage of learning. We analyze the asymptotic property of the learning curve of this method through comput...
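A minimal sketch of the margin-$\kappa$ rule described above (the normalization of the field and the function names are illustrative assumptions, not the paper's exact prescription): make a Hebbian update whenever the aligned field falls below $\kappa$. With $\kappa = 0$ this is the classical perceptron rule; as $\kappa \to \infty$ every example always triggers an update, so the weights tend to the pure Hebbian sum.

```python
import numpy as np

def margin_perceptron(X, y, kappa, epochs=10):
    """Train a perceptron with a Gardner-style margin kappa (illustrative sketch)."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    for _ in range(epochs):
        for x_i, y_i in zip(X, y):
            norm = np.linalg.norm(w)
            field = y_i * np.dot(w, x_i) / norm if norm > 0 else 0.0
            if field <= kappa:        # stability below the required margin
                w += y_i * x_i        # Hebbian update
    return w
```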
August 20, 2017
We derive two critical numbers that predict the behavior of perceptron networks. First, we derive what we call the lossless memory (LM) dimension. The LM dimension is a generalization of the Vapnik--Chervonenkis (VC) dimension that avoids structured data and therefore provides an upper bound for perfectly fitting almost any training data. Second, we derive what we call the MacKay (MK) dimension. This limit indicates a 50% chance of not bein...
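The 50% threshold that the MK dimension formalizes can be illustrated with Cover's 1965 counting function (this is standard background, not the paper's derivation): the fraction of dichotomies of $p$ points in general position in $n$ dimensions that a perceptron can realize drops to exactly one half at $p = 2n$.

```python
from math import comb

def cover_count(p, n):
    """Cover's count of linearly separable dichotomies of p points in n dimensions."""
    return 2 * sum(comb(p - 1, k) for k in range(n))

def separable_fraction(p, n):
    """Probability that a random dichotomy of p generic points is separable."""
    return cover_count(p, n) / 2**p

n = 10
print(separable_fraction(2 * n, n))   # 0.5: the 50% point at p = 2n
print(separable_fraction(n, n))       # 1.0: p <= n is always separable
```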
July 18, 2022
There is mounting evidence of emergent phenomena in the capabilities of deep learning methods as we scale up datasets, model sizes, and training times. While there are some accounts of how these resources modulate statistical capacity, far less is known about their effect on the computational problem of model training. This work conducts such an exploration through the lens of learning a $k$-sparse parity of $n$ bits, a canonical discrete search problem which is statistically...
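A minimal sketch of the task described above (names are illustrative): the label is the parity of an unknown size-$k$ subset $S$ of the $n$ input bits, so a learner must in effect search among roughly $n^k$ candidate supports.

```python
import numpy as np

def sparse_parity_data(n_samples, n, k, seed=0):
    """Generate a k-sparse parity dataset over uniform +-1 inputs."""
    rng = np.random.default_rng(seed)
    S = rng.choice(n, size=k, replace=False)       # hidden support of the parity
    X = rng.choice([-1, 1], size=(n_samples, n))   # uniform +-1 inputs
    y = np.prod(X[:, S], axis=1)                   # +-1 label: parity of the k bits
    return X, y, S

X, y, S = sparse_parity_data(n_samples=1000, n=50, k=3)
```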
December 6, 2024
Parities have become a standard benchmark for evaluating learning algorithms. Recent works show that regular neural networks trained by gradient descent can efficiently learn degree $k$ parities on uniform inputs for constant $k$, but fail to do so when $k$ and $d-k$ grow with $d$ (here $d$ is the ambient dimension). However, the case where $k=d-O_d(1)$ (almost-full parities), including the degree $d$ parity (the full parity), has remained unsettled. This paper shows that for...
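One reason the almost-full regime is natural (a standard identity over $\pm 1$ inputs, not a claim from the paper): a degree $d-r$ parity equals the full parity times the $r$-sparse parity of the complementary coordinates, since $x_i^2 = 1$. A quick numerical check:

```python
import numpy as np

# chi_S(x) = chi_full(x) * chi_complement(x) for +-1 inputs, because x_i^2 = 1.
rng = np.random.default_rng(1)
d, r = 20, 3
X = rng.choice([-1, 1], size=(1000, d))
comp = rng.choice(d, size=r, replace=False)     # complement of the support
S = np.setdiff1d(np.arange(d), comp)            # almost-full support, |S| = d - r
full = np.prod(X, axis=1)
assert np.all(np.prod(X[:, S], axis=1) == full * np.prod(X[:, comp], axis=1))
```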
May 21, 2017
A fundamental aspect of the limits on learning any computation in neural architectures is the characterization of their optimal capacities. An important, widely used neural architecture is the autoencoder, in which the network reconstructs the input at the output layer via a representation at a hidden layer. Even though the capacities of several neural architectures have been addressed using statistical physics methods, the capacity of autoencoder neural networks is not well-explor...
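To make the architecture concrete, here is a minimal single-hidden-layer autoencoder (the activation, scaling, and variable names are illustrative choices, not the paper's exact model): the input is encoded into a hidden representation and then decoded back into a reconstruction.

```python
import numpy as np

def autoencoder_reconstruct(x, W_enc, W_dec):
    """Reconstruct x via a hidden representation (illustrative sketch)."""
    h = np.tanh(W_enc @ x)      # hidden representation
    return W_dec @ h            # reconstruction at the output layer

n_visible, n_hidden = 100, 20
rng = np.random.default_rng(0)
W_enc = rng.normal(size=(n_hidden, n_visible)) / np.sqrt(n_visible)
W_dec = rng.normal(size=(n_visible, n_hidden)) / np.sqrt(n_hidden)
x = rng.choice([-1.0, 1.0], size=n_visible)
x_hat = autoencoder_reconstruct(x, W_enc, W_dec)
```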
February 16, 2016
We prove that any algorithm for learning parities requires either a memory of quadratic size or an exponential number of samples. This proves a recent conjecture of Steinhardt, Valiant, and Wager and shows that for some learning problems a large storage space is crucial. More formally, in the problem of parity learning, an unknown string $x \in \{0,1\}^n$ is chosen uniformly at random. A learner tries to learn $x$ from a stream of samples $(a_1, b_1), (a_2, b_2), \ldots$, wh...
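The quadratic-memory baseline that the lower bound is matched against can be sketched directly (this is the standard Gaussian-elimination learner, not code from the paper): store about $n$ equations $a_i \cdot x = b_i \pmod 2$, roughly $n^2$ bits, and solve them over GF(2).

```python
import numpy as np

def solve_parity(A, b):
    """Solve A x = b over GF(2) by Gaussian elimination (A assumed full column rank)."""
    A, b = A.copy() % 2, b.copy() % 2
    n, row = A.shape[1], 0
    for col in range(n):
        pivot = next((r for r in range(row, len(A)) if A[r, col]), None)
        if pivot is None:
            continue
        A[[row, pivot]], b[[row, pivot]] = A[[pivot, row]], b[[pivot, row]]
        for r in range(len(A)):
            if r != row and A[r, col]:
                A[r] ^= A[row]
                b[r] ^= b[row]
        row += 1
    return b[:n]

rng = np.random.default_rng(0)
n = 32
x = rng.integers(0, 2, size=n)
A = rng.integers(0, 2, size=(2 * n, n))   # stream of sample vectors a_i
b = A @ x % 2                             # labels b_i = <a_i, x> mod 2
# With 2n random rows, A has full column rank with overwhelming probability.
assert np.array_equal(solve_parity(A, b), x)
```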
November 18, 1996
A perceptron with $N$ random weights can store of the order of $N$ patterns when a fraction of the weights is removed without changing the strengths of the remaining ones. The critical storage capacity is calculated as a function of the concentration of the remaining bonds, both for random outputs and for outputs given by a teacher perceptron. A simple Hebb-like dilution algorithm is presented which, in the teacher case, reaches the optimal generalization ability.
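The abstract does not spell out the dilution algorithm, so the following is a hedged sketch of one plausible Hebb-like rule: form the Hebbian couplings and keep only the fraction $c$ of bonds with the largest magnitude, leaving the surviving strengths unchanged.

```python
import numpy as np

def hebb_dilute(patterns, labels, c):
    """Keep a fraction c of the Hebbian bonds, chosen by magnitude (illustrative)."""
    J = patterns.T @ labels.astype(float)      # Hebbian couplings, one per input
    keep = int(round(c * J.size))
    order = np.argsort(np.abs(J))              # smallest magnitudes first
    diluted = J.copy()
    diluted[order[: J.size - keep]] = 0.0      # remove the weakest bonds
    return diluted
```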
December 25, 2007
In this paper, we address the problem of how many randomly labeled patterns can be correctly classified by a single-layer perceptron when the patterns are correlated with each other. To solve this problem, two analytical schemes are developed, based on the replica method and the Thouless--Anderson--Palmer (TAP) approach, by utilizing an integral formula concerning random rectangular matrices. The validity and relevance of the developed methodologies are shown for one known r...
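A rough numerical companion to the question posed above (an illustration only, not the paper's replica/TAP calculation): draw patterns that share a common component, so they are correlated with each other, attach random labels, and test linear separability with an LP feasibility check.

```python
import numpy as np
from scipy.optimize import linprog

def separable(X, y):
    """Is there a w with y_mu * (w . x_mu) >= 1 for all mu?"""
    P, N = X.shape
    res = linprog(c=np.zeros(N),
                  A_ub=-(y[:, None] * X), b_ub=-np.ones(P),
                  bounds=[(None, None)] * N, method="highs")
    return res.success

def capacity_trial(alpha, N=100, rho=0.3, seed=0):
    rng = np.random.default_rng(seed)
    P = int(alpha * N)
    common = rng.normal(size=N)                              # shared component
    X = np.sqrt(1 - rho) * rng.normal(size=(P, N)) + np.sqrt(rho) * common
    y = rng.choice([-1.0, 1.0], size=P)                      # random labels
    return separable(X, y)

# For independent patterns the separable/unseparable transition sits near
# alpha = 2; correlations shift it, which the replica/TAP schemes quantify.
print(capacity_trial(alpha=1.5), capacity_trial(alpha=2.5))
```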