From complex to simple: hierarchical free-energy landscape renormalized in deep neural networks

October 22, 2019

Hajime Yoshino

Condensed Matter

Computer Science

Statistics

Disordered Systems and Neura...

Statistical Mechanics

Machine Learning

We develop a statistical mechanical approach based on the replica method to study the design space of deep and wide neural networks constrained to fit a large number of training data. Specifically, we analyze the configuration space of the synaptic weights and neurons in the hidden layers of a simple feed-forward perceptron network for two scenarios: a setting with random inputs/outputs and a teacher-student setting. As the strength of the constraints increases, i.e. as the number of training data increases, successive second-order glass transitions (random inputs/outputs) or second-order crystalline transitions (teacher-student setting) take place layer by layer, starting next to the input/output boundaries and going deeper into the bulk, with the thickness of the solid phase growing logarithmically with the data size. This implies that the typical storage capacity of the network grows exponentially fast with the depth. In a deep enough network, the central part remains in the liquid phase. We argue that in systems of finite width $N$, a weak bias field can remain in the center and play the role of a symmetry-breaking field that connects the opposite sides of the system. The successive glass transitions bring about a hierarchical free-energy landscape with ultrametricity, which evolves in space: it is most complex close to the boundaries but becomes renormalized into progressively simpler ones in deeper layers. These observations provide clues to understanding why deep neural networks operate efficiently. Finally, we present some numerical simulations of learning which reveal spatially heterogeneous glassy dynamics truncated by a finite-width $N$ effect.
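
The step from "solid thickness grows logarithmically with the data size" to "storage capacity grows exponentially with the depth" can be illustrated with a rough scaling sketch. The symbols below are assumptions introduced here for illustration and are not taken from the abstract: $\alpha$ denotes the number of training data per unit width, $\xi(\alpha)$ the thickness of the solid region next to each boundary, $L$ the depth, and $c$ an unspecified positive constant.

```latex
% Illustrative scaling argument only; \alpha, \xi, L and c are assumed notation.
\begin{align}
  \xi(\alpha) &\simeq c \,\ln \alpha
    && \text{(solid thickness grows logarithmically with the data size)} \\
  2\,\xi(\alpha^{*}) &\simeq L
    && \text{(solid regions from both boundaries meet in the center)} \\
  \alpha^{*} &\simeq \exp\!\left(\frac{L}{2c}\right)
    && \text{(data size needed to solidify the whole network)}
\end{align}
```

Under these assumptions, a network with $\alpha < \alpha^{*}$ still has a liquid central region, which is consistent with the statement that the center of a deep enough network remains in the liquid phase.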

Comparing Dynamics: Deep Neural Networks versus Glassy Systems

March 19, 2018

92% Match

M. Baity-Jesi, L. Sagun, M. Geiger, S. Spigler, G. Ben Arous, C. Cammarota, Y. LeCun, ... , G. Biroli

Machine Learning

Disordered Systems and Neura...

Machine Learning

We analyze numerically the training dynamics of deep neural networks (DNN) by using methods developed in statistical physics of glassy systems. The two main issues we address are (1) the complexity of the loss landscape and of the dynamics within it, and (2) to what extent DNNs share similarities with glassy systems. Our findings, obtained for different architectures and datasets, suggest that during the training process the dynamics slows down because of an increasingly larg...

Glassy dynamics in deep neural networks: A structural comparison

May 21, 2024

91% Match

Max Kerr Winter, Liesbeth M. C. Janssen

Computational Physics

Disordered Systems and Neura...

Statistical Mechanics

Deep Neural Networks (DNNs) share important similarities with structural glasses. Both have many degrees of freedom, and their dynamics are governed by a high-dimensional, non-convex landscape representing either the loss or energy, respectively. Furthermore, both experience gradient descent dynamics subject to noise. In this work we investigate, by performing quantitative measurements on realistic networks trained on the MNIST and CIFAR-10 datasets, the extent to which this ...

Statistical physics and practical training of soft-committee machines

December 11, 1998

91% Match

Martin Ahr, Michael Biehl, Robert Urbanczik

Disordered Systems and Neura...

Statistical Mechanics

Equilibrium states of large layered neural networks with differentiable activation function and a single, linear output unit are investigated using the replica formalism. The quenched free energy of a student network with a very large number of hidden units learning a rule of perfectly matching complexity is calculated analytically. The system undergoes a first order phase transition from unspecialized to specialized student configurations at a critical size of the training s...

Learning through atypical "phase transitions" in overparameterized neural networks

October 2, 2021

91% Match

Carlo Baldassi, Clarissa Lauditi, Enrico M. Malatesta, Rosalba Pacelli, ... , Riccardo Zecchina

Machine Learning

Disordered Systems and Neura...

Probability

Machine Learning

Current deep neural networks are highly overparameterized (up to billions of connection weights) and nonlinear. Yet they can fit data almost perfectly through variants of gradient descent algorithms and achieve unexpected levels of prediction accuracy without overfitting. These are formidable results that defy predictions of statistical learning and pose conceptual challenges for non-convex optimization. In this paper, we use methods from statistical physics of disordered sys...

Exploring Loss Landscapes through the Lens of Spin Glass Theory

July 30, 2024

91% Match

Hao Liao, Wei Zhang, Zhanyi Huang, Zexiao Long, Mingyang Zhou, Xiaoqun Wu, ... , Chi Ho Yeung

Disordered Systems and Neura...

Artificial Intelligence

In the past decade, significant strides in deep learning have led to numerous groundbreaking applications. Despite these advancements, the understanding of the high generalizability of deep learning, especially in such an over-parametrized space, remains limited. Successful applications are often considered as empirical rather than scientific achievements. For instance, deep neural networks' (DNNs) internal representations, decision-making mechanism, absence of overfitting in...

Data-driven effective model shows a liquid-like deep learning

July 16, 2020

90% Match

Wenxuan Zou, Haiping Huang

Machine Learning

Disordered Systems and Neura...

Statistical Mechanics

Machine Learning

The geometric structure of an optimization landscape is argued to be fundamentally important to support the success of deep neural network learning. A direct computation of the landscape beyond two layers is hard. Therefore, to capture the global view of the landscape, an interpretable model of the network-parameter (or weight) space must be established. However, the model is lacking so far. Furthermore, it remains unknown what the landscape looks like for deep networks of bi...

Dense Hebbian neural networks: a replica symmetric picture of supervised learning

November 25, 2022

90% Match

Elena Agliari, Linda Albanese, Francesco Alemanno, Andrea Alessandrelli, Adriano Barra, Fosca Giannotti, ... , Dino Pedreschi

Disordered Systems and Neura...

Machine Learning

We consider dense, associative neural networks trained by a teacher (i.e., with supervision) and we investigate their computational capabilities analytically, via the statistical mechanics of spin glasses, and numerically, via Monte Carlo simulations. In particular, we obtain a phase diagram summarizing their performance as a function of control parameters such as the quality and quantity of the training dataset, network storage and noise, that is valid in the limit of large netw...

Replica symmetry breaking in dense neural networks

November 25, 2021

90% Match

Linda Albanese, Francesco Alemanno, ... , Adriano Barra

Disordered Systems and Neura...

Mathematical Physics

Understanding the glassy nature of neural networks is pivotal both for theoretical and computational advances in Machine Learning and Theoretical Artificial Intelligence. Keeping the focus on dense associative Hebbian neural networks, the purpose of this paper is two-fold: first we develop rigorous mathematical approaches to properly address a statistical mechanical picture of the phenomenon of replica symmetry breaking (RSB) in these networks, then -- deepening resu...

High-dimensional manifold of solutions in neural networks: insights from statistical physics

September 17, 2023

90% Match

Enrico M. Malatesta

Disordered Systems and Neura...

Machine Learning

Probability

Statistics Theory

In these pedagogic notes I review the statistical mechanics approach to neural networks, focusing on the paradigmatic example of the perceptron architecture with binary and continuous weights, in the classification setting. I review Gardner's approach based on the replica method and the derivation of the SAT/UNSAT transition in the storage setting. Then, I discuss some recent works that unveiled how the zero training error configurations are geometrically arranged, and ho...

Neural networks: from the perceptron to deep nets

April 13, 2023

90% Match

Marylou Gabrié, Surya Ganguli, ... , Riccardo Zecchina

Disordered Systems and Neura...

Statistical Mechanics

Artificial networks have been studied through the prism of statistical mechanics as disordered systems since the 80s, starting from the simple models of Hopfield's associative memory and the single-neuron perceptron classifier. Assuming data is generated by a teacher model, asymptotic generalisation predictions were originally derived using the replica method and the online learning dynamics has been described in the large system limit. In this chapter, we review the key orig...
