August 20, 2017
We review Boltzmann machines and energy-based models. A Boltzmann machine defines a probability distribution over binary-valued patterns. One can learn the parameters of a Boltzmann machine via gradient-based approaches that increase the log-likelihood of the data. The gradient and Hessian of a Boltzmann machine admit beautiful mathematical representations, although computing them is in general intractable. This intractability motivates approximate methods, including Gibbs sampling...
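As a concrete illustration of the gradient estimate mentioned above, here is a minimal sketch, not taken from the paper: the log-likelihood gradient of a tiny Boltzmann machine, with the intractable model expectation approximated by Gibbs sampling. All names and sizes are illustrative assumptions.

```python
# Minimal sketch (assumed setup): log-likelihood gradient of a Boltzmann
# machine, d log L / d W ~ <s s^T>_data - <s s^T>_model, with the model
# expectation estimated by Gibbs sampling.
import numpy as np

rng = np.random.default_rng(0)
n = 4                                  # number of binary units (toy size)
W = rng.normal(scale=0.1, size=(n, n))
W = (W + W.T) / 2
np.fill_diagonal(W, 0.0)               # no self-connections
b = np.zeros(n)

def gibbs_sweep(s):
    """One full sweep of single-site Gibbs updates."""
    for i in range(n):
        p_on = 1.0 / (1.0 + np.exp(-(W[i] @ s + b[i])))
        s[i] = float(rng.random() < p_on)
    return s

# Positive phase: correlations clamped to (toy) data.
data = np.array([[1, 1, 0, 0], [0, 0, 1, 1]], dtype=float)
pos = np.mean([np.outer(v, v) for v in data], axis=0)

# Negative phase: correlations under the model, via a Gibbs chain.
s = rng.integers(0, 2, size=n).astype(float)
samples = []
for t in range(1200):
    s = gibbs_sweep(s)
    if t >= 200:                       # discard burn-in
        samples.append(np.outer(s, s))
neg = np.mean(samples, axis=0)

grad_W = pos - neg                     # ascend this to raise the log-likelihood
np.fill_diagonal(grad_W, 0.0)          # diagonal is unused (no self-connections)
```

The bias gradient is analogous, <s>_data - <s>_model; exact computation of the negative phase would require summing over all 2^n states, which is exactly the intractability the abstract refers to.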
May 24, 2016
Probabilistic models can be defined by an energy function, where the probability of each state is proportional to the exponential of the state's negative energy. This paper considers a generalization of energy-based models in which the probability of a state is proportional to an arbitrary positive, strictly decreasing, and twice differentiable function of the state's energy. The precise shape of the nonlinear map from energies to unnormalized probabilities has to be learned ...
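A toy sketch of this generalization, under assumed notation: on a state space small enough to enumerate, the usual Boltzmann map exp(-E) is replaced by an arbitrary positive, strictly decreasing, twice differentiable f. The specific power-law f below is an illustrative assumption, not the paper's learned map.

```python
# Toy sketch: p(s) proportional to f(E(s)) for a hand-picked monotone f,
# instead of the standard exp(-E(s)). Exact normalization is feasible only
# because the state space here is tiny.
import itertools
import numpy as np

def energy(s, W, b):
    return -0.5 * s @ W @ s - b @ s

def f(E, alpha=2.0):
    # Positive, strictly decreasing, twice differentiable for E > -alpha.
    return (1.0 + E / alpha) ** (-alpha)

n = 3
rng = np.random.default_rng(1)
W = rng.normal(scale=0.1, size=(n, n))
W = (W + W.T) / 2
np.fill_diagonal(W, 0.0)
b = np.zeros(n)

states = np.array(list(itertools.product([0.0, 1.0], repeat=n)))
unnorm = np.array([f(energy(s, W, b)) for s in states])
probs = unnorm / unnorm.sum()          # recovers exp(-E) as alpha -> infinity
```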
December 9, 2019
Focusing on the grand-canonical extension of the ordinary restricted Boltzmann machine, we suggest an energy-based model for feature extraction that uses a layer of hidden units of varying size. With an appropriate choice of the chemical potential and a sufficiently large number of hidden resources, the generative model can efficiently deduce the optimal number of hidden units required to learn the target data with exceedingly small generalization error. The forma...
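One plausible reading of such a grand-canonical energy, sketched below with assumed details: a chemical potential mu penalizes each active hidden unit, so only as many hidden resources switch on as the data actually requires.

```python
# Heavily hedged sketch (assumed form, not the paper's exact model): an RBM
# energy with a chemical-potential term mu * sum_j h_j that taxes each
# active hidden unit.
import numpy as np

rng = np.random.default_rng(2)
n_v, n_h = 6, 20                       # many hidden "resources" on offer
W = rng.normal(scale=0.1, size=(n_v, n_h))
a = np.zeros(n_v)
b = np.zeros(n_h)
mu = 1.5                               # chemical potential (assumed value)

def energy(v, h):
    # E(v, h) = -v^T W h - a^T v - b^T h + mu * sum_j h_j
    return -v @ W @ h - a @ v - b @ h + mu * h.sum()

def sample_hidden(v):
    # P(h_j = 1 | v) = sigmoid(v^T W_:j + b_j - mu): mu raises the bar for
    # recruiting a hidden unit, so only "useful" units tend to switch on.
    p = 1.0 / (1.0 + np.exp(-(v @ W + b - mu)))
    return (rng.random(n_h) < p).astype(float), p

v = rng.integers(0, 2, size=n_v).astype(float)
h, p = sample_hidden(v)
print("expected number of active hidden units:", p.sum())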
December 12, 2017
We compare and contrast the statistical physics and quantum physics inspired approaches for unsupervised generative modeling of classical data. The two approaches represent probabilities of observed data using energy-based models and quantum states, respectively. Classical and quantum information patterns of the target datasets therefore provide principled guidelines for structural design and learning in these two approaches. Taking the restricted Boltzmann machine (RBM) as an...
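To make the contrast concrete, a small illustrative sketch with all parameters assumed: an RBM represents p(v) by marginalizing a Boltzmann weight over hidden units, while a Born machine represents p(v) as a squared amplitude.

```python
# Illustrative contrast: energy-based probability (RBM marginal) versus
# quantum-state probability (Born rule). Tiny sizes so everything is exact.
import itertools
import numpy as np

rng = np.random.default_rng(3)
n_v, n_h = 3, 2
W = rng.normal(scale=0.5, size=(n_v, n_h))
a, b = np.zeros(n_v), np.zeros(n_h)

def rbm_unnorm(v):
    # p(v) ~ exp(a^T v) * prod_j (1 + exp(v^T W_:j + b_j)): hidden units summed out.
    return np.exp(a @ v) * np.prod(1.0 + np.exp(v @ W + b))

psi = rng.normal(size=2 ** n_v)        # Born machine: one (real) amplitude per state

states = list(itertools.product([0.0, 1.0], repeat=n_v))
p_rbm = np.array([rbm_unnorm(np.array(v)) for v in states])
p_rbm /= p_rbm.sum()
p_born = psi ** 2 / np.sum(psi ** 2)   # Born rule: probability = |amplitude|^2
```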
February 7, 2020
A theory explaining how deep learning works is yet to be developed. Previous work suggests that deep learning performs a coarse graining, similar in spirit to the renormalization group (RG). This idea has been explored in the setting of a local (nearest-neighbor interactions) Ising spin lattice. We extend the discussion to the setting of a long-range spin lattice. Markov chain Monte Carlo (MCMC) simulations determine both the critical temperature and scaling dimensions of the...
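A minimal sketch of the kind of simulation involved, with an assumed coupling J_ij ~ |i-j|^(-(1+sigma)): Metropolis MCMC on a 1D long-range Ising chain, with the magnetization as an order-parameter estimate.

```python
# Sketch (assumed details): Metropolis sampling of a 1D Ising chain whose
# couplings decay as a power law in distance, i.e. long-range interactions.
import numpy as np

rng = np.random.default_rng(4)
N, sigma, T = 64, 0.5, 2.0
idx = np.arange(N)
dist = np.abs(idx[:, None] - idx[None, :])
with np.errstate(divide="ignore"):
    J = np.where(dist > 0, dist.astype(float) ** -(1.0 + sigma), 0.0)

spins = rng.choice([-1, 1], size=N)

def metropolis_sweep(spins, T):
    for _ in range(N):
        k = rng.integers(N)
        dE = 2.0 * spins[k] * (J[k] @ spins)   # energy cost of flipping spin k
        if dE <= 0 or rng.random() < np.exp(-dE / T):
            spins[k] = -spins[k]
    return spins

for _ in range(500):
    spins = metropolis_sweep(spins, T)
m = abs(spins.mean())                  # magnetization near the transition
```

Sweeping T and measuring such observables is how the critical temperature and scaling dimensions mentioned in the abstract would be located.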
November 6, 2017
Deep learning has achieved great success in many areas, from computer vision to natural language processing to game playing, and much more. Yet what deep learning is really doing remains an open question, and there has been much work in this direction. For example, [5] tried to explain deep learning via the renormalization group, and [6] tried to explain it from the viewpoint of function approximation. In order to address this crucial question, here we see deep learn...
February 26, 2017
Motivated by the idea that criticality and universality of phase transitions might play a crucial role in achieving and sustaining learning and intelligent behaviour in biological and artificial networks, we analyse a theoretical and a pragmatic experimental setup for critical phenomena in deep learning. On the theoretical side, we use results from statistical physics to carry out critical-point calculations in feed-forward/fully connected networks, while on the experimental...
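One standard calculation of this sort, included as an assumed illustration rather than the paper's own method: locating the mean-field order-to-chaos transition of a random tanh network, where the Jacobian gain chi = 1 marks the critical point.

```python
# Assumed illustration: mean-field criticality of a random tanh network.
# The pre-activation variance q flows to a fixed point, and the gain
# chi = sigma_w^2 * E[phi'(sqrt(q) z)^2] equals 1 exactly at criticality.
import numpy as np

z = np.random.default_rng(5).normal(size=200_000)   # samples of the Gaussian measure

def fixed_point_q(sw2, sb2=0.05, iters=200):
    q = 1.0
    for _ in range(iters):
        q = sw2 * np.mean(np.tanh(np.sqrt(q) * z) ** 2) + sb2
    return q

def chi(sw2, sb2=0.05):
    q = fixed_point_q(sw2, sb2)
    # tanh'(x) = sech(x)^2, so phi'(x)^2 = sech(x)^4
    return sw2 * np.mean(1.0 / np.cosh(np.sqrt(q) * z) ** 4)

for sw2 in [0.5, 1.0, 1.5, 2.0]:
    print(f"sigma_w^2 = {sw2:.1f}  chi = {chi(sw2):.3f}")  # chi = 1: critical point
```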
May 14, 2022
Restricted Boltzmann machines are energy-based models made of a visible and a hidden layer. We identify an effective energy function describing the zero-temperature landscape on the visible units that depends only on the tail behaviour of the hidden-layer prior distribution. Studying the location of the local minima of this energy function, we show that the ability of a restricted Boltzmann machine to reconstruct a random pattern indeed depends only on the tail of the hidden pr...
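The standard marginalization behind this kind of claim can be written down directly (notation assumed):

```latex
% Integrating out hidden units with prior p(h) yields an effective visible energy:
\[
  p(v) \propto e^{-E_{\mathrm{eff}}(v)},
  \qquad
  E_{\mathrm{eff}}(v) = -\sum_{j=1}^{n_h} \log \int \! dh_j \, p(h_j)\,
      e^{\,h_j\,(w_j \cdot v)} .
\]
% For large overlaps $w_j \cdot v$ the integral is dominated by the large-$h$
% tail of $p(h)$, which is why only the tail behaviour of the hidden prior
% survives in the zero-temperature landscape.
```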
May 11, 2015
We present a layered Boltzmann machine (BM) that can better exploit the advantages of a distributed representation. It is widely believed that deep BMs (DBMs) have far greater representational power than their shallow counterparts, restricted Boltzmann machines (RBMs). However, this expectation of the supremacy of DBMs over RBMs has never been validated in a theoretical fashion. In this paper, we provide both theoretical and empirical evidence that the representational power...
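For reference, the textbook energy functions being compared (standard definitions, not anything specific to this paper):

```latex
% RBM: one hidden layer; DBM: hidden layers stacked, coupled only to neighbors.
\begin{align*}
  E_{\mathrm{RBM}}(v, h) &= -v^\top W h - a^\top v - b^\top h, \\
  E_{\mathrm{DBM}}(v, h^{(1)}, h^{(2)})
      &= -v^\top W^{(1)} h^{(1)} - {h^{(1)}}^{\top} W^{(2)} h^{(2)}
         - a^\top v - b_1^\top h^{(1)} - b_2^\top h^{(2)}.
\end{align*}
% The only couplings in either case are between adjacent layers; the question is
% whether the extra layer buys genuinely greater representational power.
```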
April 27, 2017
In this work, we analyze the nonequilibrium thermodynamics of a class of neural networks known as Restricted Boltzmann Machines (RBMs) in the context of unsupervised learning. We show how the network can be described as a discrete Markov process and how the detailed balance condition and the Maxwell-Boltzmann equilibrium distribution are sufficient conditions for a complete thermodynamic description, including nonequilibrium fluctuation theorems. Numerical simulations in a fully...
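The detailed balance property referred to here can be checked numerically on a toy RBM. The sketch below, with assumed sizes and parameters, builds the exact visible-to-visible Gibbs kernel and verifies p(v) T(v'|v) = p(v') T(v|v').

```python
# Numerical sketch (toy sizes): the visible-to-visible Gibbs kernel
# T(v'|v) = sum_h p(v'|h) p(h|v) of an RBM satisfies detailed balance with
# respect to the equilibrium marginal p(v).
import itertools
import numpy as np

rng = np.random.default_rng(6)
n_v, n_h = 3, 2
W = rng.normal(scale=0.5, size=(n_v, n_h))
a, b = np.zeros(n_v), np.zeros(n_h)

V = np.array(list(itertools.product([0.0, 1.0], repeat=n_v)))
H = np.array(list(itertools.product([0.0, 1.0], repeat=n_h)))

# Exact joint p(v, h) ~ exp(-E) with E = -v^T W h - a^T v - b^T h.
joint = np.array([[np.exp(v @ W @ h + a @ v + b @ h) for h in H] for v in V])
joint /= joint.sum()
p_v = joint.sum(axis=1)                    # marginal p(v)
p_h_given_v = joint / p_v[:, None]         # p(h | v)
p_v_given_h = joint / joint.sum(axis=0)    # p(v | h)

T = p_h_given_v @ p_v_given_h.T            # T[v, v'] = sum_h p(h|v) p(v'|h)
flux = p_v[:, None] * T                    # p(v) T(v'|v)
print("detailed balance holds:", np.allclose(flux, flux.T))
```

The symmetry of the probability flux matrix is exactly the detailed balance condition the abstract invokes as a sufficient condition for the thermodynamic description.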