ID: 1708.01422

Exploring the Function Space of Deep-Learning Machines

August 4, 2017

Bo Li, David Saad
Condensed Matter
Computer Science
Disordered Systems and Neural Networks
Machine Learning

The function space of deep-learning machines is investigated by studying the growth in the entropy of functions at a given error with respect to a reference function, itself realized by a deep-learning machine. Using physics-inspired methods, we study both sparsely and densely connected architectures to discover a layer-wise convergence of candidate functions, marked by a corresponding reduction in entropy as the reference function is approached; we gain insight into the importance of having a large number of layers and observe phase transitions as the error increases.
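
A minimal numerical sketch of the quantity being studied, under assumed toy settings (sign-activation networks, Gaussian random weights, and a sampled histogram of errors as a crude entropy proxy); this is not the paper's replica-based calculation, and all sizes and names below are illustrative:

    # Sample random deep sign-activation networks and measure how often each one
    # disagrees with a fixed reference network over all +/-1 inputs (the "error"),
    # then use the log-fraction of samples per error band as a rough entropy proxy.
    from itertools import product
    import numpy as np

    rng = np.random.default_rng(0)
    n_in, n_hidden, n_layers = 10, 10, 3                    # toy architecture (assumed)
    X = np.array(list(product([-1.0, 1.0], repeat=n_in)))   # all 2^10 input patterns

    def random_weights():
        dims = [n_in] + [n_hidden] * n_layers + [1]
        return [rng.standard_normal((a, b)) / np.sqrt(a) for a, b in zip(dims[:-1], dims[1:])]

    def forward(weights, X):
        h = X
        for W in weights:
            h = np.sign(h @ W)                              # +/-1 units throughout
        return h[:, 0]

    reference = forward(random_weights(), X)                # the reference function

    errors = np.array([np.mean(forward(random_weights(), X) != reference)
                       for _ in range(5000)])

    counts, edges = np.histogram(errors, bins=20, range=(0.0, 1.0))
    for c, lo, hi in zip(counts, edges[:-1], edges[1:]):
        if c > 0:
            print(f"error in [{lo:.2f}, {hi:.2f}): log-fraction = {np.log(c / len(errors)):.2f}")
    # The log-fraction is largest near error ~ 0.5 and falls off toward small error,
    # a crude picture of the entropy reduction when approaching the reference function.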

Similar papers

The Modern Mathematics of Deep Learning

May 9, 2021

88% Match
Julius Berner, Philipp Grohs, ... , Philipp Petersen
Machine Learning
Machine Learning

We describe the new field of mathematical analysis of deep learning. This field emerged around a list of research questions that were not answered within the classical framework of learning theory. These questions concern: the outstanding generalization power of overparametrized neural networks, the role of depth in deep architectures, the apparent absence of the curse of dimensionality, the surprisingly successful optimization performance despite the non-convexity of the pro...

Space of Functions Computed by Deep-Layered Machines

Alexander Mozeika, Bo Li, David Saad
Machine Learning
Disordered Systems and Neural Networks
Machine Learning

We study the space of functions computed by random-layered machines, including deep neural networks and Boolean circuits. Investigating the distribution of Boolean functions computed on the recurrent and layer-dependent architectures, we find that it is the same in both models. Depending on the initial conditions and computing elements used, we characterize the space of functions computed at the large depth limit and show that the macroscopic entropy of Boolean functions is e...
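
As a rough illustration of the setting (not the authors' generating-functional analysis), the sketch below samples small random layered Boolean machines built from NAND gates, reads off the truth table each machine computes, and estimates the entropy of the resulting distribution over Boolean functions at several depths; the gate type, widths and sample sizes are assumptions made only for this example:

    # Sample random layered NAND machines on n Boolean inputs, record the truth
    # table each one computes, and estimate the entropy of the induced
    # distribution over Boolean functions as the depth grows.
    from collections import Counter
    from itertools import product
    import math, random

    random.seed(0)
    n = 3                                          # number of Boolean inputs (toy size)
    inputs = list(product([0, 1], repeat=n))

    def random_layered_machine(depth, width):
        """Each gate reads two random wires from the previous layer and applies NAND."""
        layers, prev = [], n
        for _ in range(depth):
            layers.append([(random.randrange(prev), random.randrange(prev))
                           for _ in range(width)])
            prev = width
        return layers

    def truth_table(layers):
        table = []
        for x in inputs:
            values = list(x)
            for layer in layers:
                values = [1 - (values[i] & values[j]) for i, j in layer]   # NAND
            table.append(values[0])                # read the first wire as the output
        return tuple(table)

    for depth in (1, 2, 4, 8):
        tables = Counter(truth_table(random_layered_machine(depth, width=5))
                         for _ in range(10000))
        total = sum(tables.values())
        entropy = -sum((c / total) * math.log2(c / total) for c in tables.values())
        print(f"depth {depth}: {len(tables)} distinct functions, entropy ~ {entropy:.2f} bits")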

Learning through atypical "phase transitions" in overparameterized neural networks

October 2, 2021

88% Match
Carlo Baldassi, Clarissa Lauditi, Enrico M. Malatesta, Rosalba Pacelli, ... , Riccardo Zecchina
Machine Learning
Disordered Systems and Neural Networks
Probability
Machine Learning

Current deep neural networks are highly overparameterized (up to billions of connection weights) and nonlinear. Yet they can fit data almost perfectly through variants of gradient descent algorithms and achieve unexpected levels of prediction accuracy without overfitting. These are formidable results that defy predictions of statistical learning and pose conceptual challenges for non-convex optimization. In this paper, we use methods from statistical physics of disordered sys...


Entropy-based Guidance of Deep Neural Networks for Accelerated Convergence and Improved Performance

August 29, 2023

88% Match
Mackenzie J. Meni, Ryan T. White, ... , Kevin Pilkiewicz
Computer Vision and Pattern Recognition
Machine Learning
Image and Video Processing

Neural networks have dramatically increased our capacity to learn from large, high-dimensional datasets across innumerable disciplines. However, their decisions are not easily interpretable, their computational costs are high, and building and training them are uncertain processes. To add structure to these efforts, we derive new mathematical results to efficiently measure the changes in entropy as fully-connected and convolutional neural networks process data, and introduce ...
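
The cited work derives its own entropy measures; the sketch below is only a generic stand-in that tracks a crude, histogram-binned, per-unit entropy of hidden activations layer by layer in a small fully-connected ReLU network, to show the kind of layer-wise quantity involved. The architecture, binning and data here are assumptions:

    # Push toy data through a small fully-connected ReLU network and report a
    # simple per-layer entropy estimate of the hidden activations.
    import numpy as np

    rng = np.random.default_rng(1)

    def binned_entropy(values, bins=32):
        """Shannon entropy (bits) of a 1-D sample under a fixed histogram binning."""
        counts, _ = np.histogram(values, bins=bins)
        p = counts[counts > 0] / counts.sum()
        return float(-(p * np.log2(p)).sum())

    X = rng.standard_normal((1024, 64))                    # toy input batch
    dims = [64, 128, 64, 32, 10]                           # toy layer widths
    weights = [rng.standard_normal((a, b)) / np.sqrt(a) for a, b in zip(dims[:-1], dims[1:])]

    h = X
    for layer_idx, W in enumerate(weights, start=1):
        h = np.maximum(h @ W, 0.0)                         # ReLU layer
        mean_entropy = np.mean([binned_entropy(h[:, j]) for j in range(h.shape[1])])
        print(f"layer {layer_idx}: mean per-unit activation entropy ~ {mean_entropy:.2f} bits")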


Exact Phase Transitions in Deep Learning

May 25, 2022

88% Match
Liu Ziyin, Masahito Ueda
Machine Learning
Disordered Systems and Neural Networks
Applied Physics

This work reports deep-learning-unique first-order and second-order phase transitions, whose phenomenology closely follows that in statistical physics. In particular, we prove that the competition between prediction error and model complexity in the training loss leads to the second-order phase transition for nets with one hidden layer and the first-order phase transition for nets with more than one hidden layer. The proposed theory is directly relevant to the optimization of...
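
A toy, one-parameter picture of the error-complexity competition described above (an illustrative assumption, not the theorem proved in the paper): the global minimiser of a scalar loss combining a fit-error term with a complexity penalty jumps discontinuously as the penalty weight crosses a threshold, mimicking a first-order transition:

    # A scalar "training loss" = fit error + lambda * complexity. The fit term is
    # small only near w = 2; the complexity term favours w = 0. As lambda grows,
    # the global minimiser jumps discontinuously from the fitting branch to ~0.
    import numpy as np

    w = np.linspace(-4.0, 4.0, 4001)

    def loss(w, lam):
        fit_error = 1.0 - np.exp(-(w - 2.0) ** 2)          # low error only near w = 2
        complexity = lam * w ** 2 / 4.0                    # penalty favouring w = 0
        return fit_error + complexity

    for lam in (0.5, 1.0, 1.3, 1.4, 1.5, 2.0):
        w_star = w[np.argmin(loss(w, lam))]
        print(f"lambda = {lam:4.2f}: global minimiser w* ~ {w_star:+.2f}")
    # w* shrinks smoothly along the fitting branch (~1.8 down to ~1.3) and then jumps
    # to ~0.2 between lambda = 1.3 and 1.4: a discontinuous, first-order-like change.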


Deep vs. shallow networks : An approximation theory perspective

August 10, 2016

88% Match
Hrushikesh Mhaskar, Tomaso Poggio
Machine Learning
Functional Analysis

The paper briefly reviews several recent results on hierarchical architectures for learning from examples that may formally explain the conditions under which Deep Convolutional Neural Networks perform much better in function approximation problems than shallow, one-hidden-layer architectures. The paper announces new results for a non-smooth activation function - the ReLU function - used in present-day neural networks, as well as for Gaussian networks. We propose a new de...
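
A standard toy illustration of the depth-versus-width gap (an assumed example, not a result quoted from the paper): composing a ReLU "hat" (tent) map with itself d times uses only two ReLU units per layer, yet produces roughly 2^d linear pieces, which a one-hidden-layer ReLU network would need roughly 2^d hidden units to reproduce:

    # Build the tent map from two ReLU units, compose it with itself, and count
    # the linear pieces of the resulting function on [0, 1].
    import numpy as np

    def relu(x):
        return np.maximum(x, 0.0)

    def hat(x):
        # 2x for x <= 1/2 and 2 - 2x for x > 1/2, built from two ReLU units.
        return 2.0 * relu(x) - 4.0 * relu(x - 0.5)

    x = np.linspace(0.0, 1.0, 2 ** 12 + 1)    # dyadic grid so breakpoints land on grid points
    y = x
    for depth in range(1, 9):
        y = hat(y)                            # one more composed layer (2 ReLU units)
        slopes = np.diff(y) / np.diff(x)
        pieces = 1 + np.count_nonzero(np.abs(np.diff(slopes)) > 1e-6)
        print(f"depth {depth}: {pieces} linear pieces from {2 * depth} ReLU units in depth")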


A Study of the Mathematics of Deep Learning

April 28, 2021

88% Match
Anirbit Mukherjee
Machine Learning
Optimization and Control
Applications
Machine Learning

"Deep Learning"/"Deep Neural Nets" is a technological marvel that is now increasingly deployed at the cutting-edge of artificial intelligence tasks. This dramatic success of deep learning in the last few years has been hinged on an enormous amount of heuristics and it has turned out to be a serious mathematical challenge to be able to rigorously explain them. In this thesis, submitted to the Department of Applied Mathematics and Statistics, Johns Hopkins University we take se...


Perspective: A Phase Diagram for Deep Learning unifying Jamming, Feature Learning and Lazy Training

December 30, 2020

87% Match
Mario Geiger, Leonardo Petrini, Matthieu Wyart
Machine Learning

Deep learning algorithms are responsible for a technological revolution in a variety of tasks including image recognition or Go playing. Yet, why they work is not understood. Ultimately, they manage to classify data lying in high dimension -- a feat generically impossible due to the geometry of high dimensional space and the associated curse of dimensionality. Understanding what kind of structure, symmetry or invariance makes data such as images learnable is a fundamental cha...


The Unreasonable Effectiveness of Deep Learning in Artificial Intelligence

February 12, 2020

87% Match
Terrence J. Sejnowski
Neurons and Cognition
Artificial Intelligence
Machine Learning
Neural and Evolutionary Computing

Deep learning networks have been trained to recognize speech, caption photographs and translate text between languages at high levels of performance. Although applications of deep learning networks to real world problems have become ubiquitous, our understanding of why they are so effective is lacking. These empirical results should not be possible according to sample complexity in statistics and non-convex optimization theory. However, paradoxes in the training and effective...


Energy-entropy competition and the effectiveness of stochastic gradient descent in machine learning

March 5, 2018

87% Match
Yao Zhang, Andrew M. Saxe, ... , Alpha A. Lee
Machine Learning
Statistical Mechanics
Machine Learning

Finding parameters that minimise a loss function is at the core of many machine learning methods. The Stochastic Gradient Descent algorithm is widely used and delivers state of the art results for many problems. Nonetheless, Stochastic Gradient Descent typically cannot find the global minimum, thus its empirical effectiveness is hitherto mysterious. We derive a correspondence between parameter inference and free energy minimisation in statistical physics. The degree of unders...
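
A toy sketch of the energy-entropy competition (an illustrative assumption, not the paper's derivation): a one-dimensional loss with a deep-but-narrow minimum and a shallow-but-wide one; treating the noise level as a temperature T and comparing basin free energies F = -T log Z shows the low-energy basin preferred at small T and the high-entropy (wide) basin taking over once T is large enough:

    # Two basins: a deep, narrow minimum near w = -2 and a shallow, wide one near
    # w = +2. Compare their Boltzmann free energies F = -T * log(Z_basin).
    import numpy as np

    def loss(w):
        narrow = 0.0 + 200.0 * (w + 2.0) ** 2      # deep but narrow (low energy, low entropy)
        wide = 0.5 + 0.5 * (w - 2.0) ** 2          # shallow but wide (higher energy, high entropy)
        return np.minimum(narrow, wide)

    w = np.linspace(-6.0, 6.0, 20001)
    dw = w[1] - w[0]

    for T in (0.05, 0.2, 0.5, 1.0):
        boltz = np.exp(-loss(w) / T)
        F_narrow = -T * np.log(np.sum(boltz[w < 0]) * dw)   # left-basin free energy
        F_wide = -T * np.log(np.sum(boltz[w >= 0]) * dw)    # right-basin free energy
        winner = "narrow/deep" if F_narrow < F_wide else "wide/shallow"
        print(f"T = {T:4.2f}: F_narrow = {F_narrow:+.3f}, F_wide = {F_wide:+.3f} -> {winner} basin preferred")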
