Exploring the Function Space of Deep-Learning Machines

August 4, 2017

Bo Li, David Saad

Condensed Matter

Computer Science

Disordered Systems and Neura...

Machine Learning

The function space of deep-learning machines is investigated by studying growth in the entropy of functions of a given error with respect to a reference function, realized by a deep-learning machine. Using physics-inspired methods we study both sparsely and densely-connected architectures to discover a layer-wise convergence of candidate functions, marked by a corresponding reduction in entropy when approaching the reference function, gain insight into the importance of having a large number of layers, and observe phase transitions as the error increases.

The Modern Mathematics of Deep Learning

May 9, 2021

88% Match

Julius Berner, Philipp Grohs, ..., Philipp Petersen

Machine Learning

We describe the new field of mathematical analysis of deep learning. This field emerged around a list of research questions that were not answered within the classical framework of learning theory. These questions concern: the outstanding generalization power of overparametrized neural networks, the role of depth in deep architectures, the apparent absence of the curse of dimensionality, the surprisingly successful optimization performance despite the non-convexity of the pro...


Space of Functions Computed by Deep-Layered Machines

April 19, 2020

88% Match

Alexander Mozeika, Bo Li, David Saad

Machine Learning

Disordered Systems and Neura...

Machine Learning

We study the space of functions computed by random-layered machines, including deep neural networks and Boolean circuits. Investigating the distribution of Boolean functions computed on the recurrent and layer-dependent architectures, we find that it is the same in both models. Depending on the initial conditions and computing elements used, we characterize the space of functions computed at the large depth limit and show that the macroscopic entropy of Boolean functions is e...


Learning through atypical "phase transitions" in overparameterized neural networks

October 2, 2021

88% Match

Carlo Baldassi, Clarissa Lauditi, Enrico M. Malatesta, Rosalba Pacelli, ..., Riccardo Zecchina

Machine Learning

Disordered Systems and Neura...

Probability

Machine Learning

Current deep neural networks are highly overparameterized (up to billions of connection weights) and nonlinear. Yet they can fit data almost perfectly through variants of gradient descent algorithms and achieve unexpected levels of prediction accuracy without overfitting. These are formidable results that defy predictions of statistical learning and pose conceptual challenges for non-convex optimization. In this paper, we use methods from statistical physics of disordered sys...


Entropy-based Guidance of Deep Neural Networks for Accelerated Convergence and Improved Performance

August 29, 2023

88% Match

Mackenzie J. Meni, Ryan T. White, ..., Kevin Pilkiewicz

Computer Vision and Pattern ...

Machine Learning

Image and Video Processing

Neural networks have dramatically increased our capacity to learn from large, high-dimensional datasets across innumerable disciplines. However, their decisions are not easily interpretable, their computational costs are high, and building and training them are uncertain processes. To add structure to these efforts, we derive new mathematical results to efficiently measure the changes in entropy as fully-connected and convolutional neural networks process data, and introduce ...


Exact Phase Transitions in Deep Learning

May 25, 2022

88% Match

Liu Ziyin, Masahito Ueda

Machine Learning

Disordered Systems and Neura...

Applied Physics

This work reports deep-learning-unique first-order and second-order phase transitions, whose phenomenology closely follows that in statistical physics. In particular, we prove that the competition between prediction error and model complexity in the training loss leads to the second-order phase transition for nets with one hidden layer and the first-order phase transition for nets with more than one hidden layer. The proposed theory is directly relevant to the optimization of...


Deep vs. shallow networks: An approximation theory perspective

August 10, 2016

88% Match

Hrushikesh Mhaskar, Tomaso Poggio

Machine Learning

Functional Analysis

The paper briefly reviews several recent results on hierarchical architectures for learning from examples, that may formally explain the conditions under which Deep Convolutional Neural Networks perform much better in function approximation problems than shallow, one-hidden layer architectures. The paper announces new results for a non-smooth activation function - the ReLU function - used in present-day neural networks, as well as for the Gaussian networks. We propose a new de...


A Study of the Mathematics of Deep Learning

April 28, 2021

88% Match

Anirbit Mukherjee

Machine Learning

Optimization and Control

Applications

Machine Learning

"Deep Learning"/"Deep Neural Nets" is a technological marvel that is now increasingly deployed at the cutting-edge of artificial intelligence tasks. This dramatic success of deep learning in the last few years has hinged on an enormous amount of heuristics, and it has turned out to be a serious mathematical challenge to be able to rigorously explain them. In this thesis, submitted to the Department of Applied Mathematics and Statistics, Johns Hopkins University, we take se...


Perspective: A Phase Diagram for Deep Learning unifying Jamming, Feature Learning and Lazy Training

December 30, 2020

87% Match

Mario Geiger, Leonardo Petrini, Matthieu Wyart

Machine Learning

Deep learning algorithms are responsible for a technological revolution in a variety of tasks including image recognition or Go playing. Yet, why they work is not understood. Ultimately, they manage to classify data lying in high dimension -- a feat generically impossible due to the geometry of high dimensional space and the associated curse of dimensionality. Understanding what kind of structure, symmetry or invariance makes data such as images learnable is a fundamental cha...


The Unreasonable Effectiveness of Deep Learning in Artificial Intelligence

February 12, 2020

87% Match

Terrence J. Sejnowski

Neurons and Cognition

Artificial Intelligence

Machine Learning

Neural and Evolutionary Comp...

Deep learning networks have been trained to recognize speech, caption photographs and translate text between languages at high levels of performance. Although applications of deep learning networks to real world problems have become ubiquitous, our understanding of why they are so effective is lacking. These empirical results should not be possible according to sample complexity in statistics and non-convex optimization theory. However, paradoxes in the training and effective...


Energy-entropy competition and the effectiveness of stochastic gradient descent in machine learning

March 5, 2018

87% Match

Yao Zhang, Andrew M. Saxe, ..., Alpha A. Lee

Machine Learning

Statistical Mechanics

Machine Learning

Finding parameters that minimise a loss function is at the core of many machine learning methods. The Stochastic Gradient Descent algorithm is widely used and delivers state of the art results for many problems. Nonetheless, Stochastic Gradient Descent typically cannot find the global minimum, thus its empirical effectiveness is hitherto mysterious. We derive a correspondence between parameter inference and free energy minimisation in statistical physics. The degree of unders...
