ID: 1909.11522

Neural networks are a priori biased towards Boolean functions with low entropy

September 25, 2019

Chris Mingard, Joar Skalse, Guillermo Valle-Pérez, David Martínez-Rubio, Vladimir Mikulik, Ard A. Louis
Computer Science
Statistics
Machine Learning
Machine Learning

Understanding the inductive bias of neural networks is critical to explaining their ability to generalise. Here, for one of the simplest neural networks -- a single-layer perceptron with $n$ input neurons, one output neuron, and no threshold bias term -- we prove that upon random initialisation of weights, the a priori probability $P(t)$ that it represents a Boolean function that classifies $t$ points in $\{0,1\}^n$ as 1 has a remarkably simple form: $P(t) = 2^{-n}$ for $0 \leq t < 2^n$. Since a perceptron can express far fewer Boolean functions with small or large values of $t$ (low entropy) than with intermediate values of $t$ (high entropy), there is, on average, a strong intrinsic a priori bias towards individual functions with low entropy. Furthermore, within a class of functions with fixed $t$, we often observe a further intrinsic bias towards functions of lower complexity. Finally, we prove that, regardless of the distribution of inputs, the bias towards low entropy becomes monotonically stronger upon adding ReLU layers, and empirically show that increasing the variance of the bias term has a similar effect.
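The uniform law $P(t) = 2^{-n}$ is straightforward to probe numerically. The following minimal sketch (not taken from the paper) samples i.i.d. Gaussian weights for a bias-free perceptron, counts how many of the $2^n$ inputs in $\{0,1\}^n$ it maps to 1, and checks that the resulting histogram over $t$ is approximately flat; the choice $n = 7$, the Gaussian initialisation, and the sign-of-preactivation classification rule are illustrative assumptions.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(0)
n = 7                                                         # number of input neurons (illustrative choice)
X = np.array(list(product([0, 1], repeat=n)), dtype=float)    # all 2^n Boolean inputs
n_samples = 100_000

W = rng.standard_normal((n_samples, n))       # each row: i.i.d. Gaussian weights, no bias term (assumption)
T = (X @ W.T > 0).sum(axis=0)                 # t = number of inputs the sampled perceptron maps to 1

# If P(t) = 2^{-n} for 0 <= t < 2^n, the histogram over t should be approximately flat.
freqs = np.bincount(T, minlength=2 ** n) / n_samples
print(f"empirical P(t) ranges over [{freqs.min():.5f}, {freqs.max():.5f}]")
print(f"uniform prediction 2^-n = {2.0 ** -n:.5f}")
```

Since the classes with small or large $t$ contain far fewer Boolean functions than the intermediate-$t$ classes, a flat $P(t)$ concentrates probability mass on individual low-entropy functions, which is the bias described above.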

Similar papers 1

Random deep neural networks are biased towards simple functions

December 25, 2018

88% Match
Giacomo De Palma, Bobak Toussi Kiani, Seth Lloyd
Machine Learning
Disordered Systems and Neural Networks
Machine Learning
Mathematical Physics

We prove that the binary classifiers of bit strings generated by random wide deep neural networks with ReLU activation function are biased towards simple functions. The simplicity is captured by the following two properties. For any given input bit string, the average Hamming distance of the closest input bit string with a different classification is at least $\sqrt{n / (2\pi \log n)}$, where $n$ is the length of the string. Moreover, if the bits of the initial string are flipped...
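As a rough numerical companion (not from the paper), the sketch below enumerates all bit strings for a small $n$, classifies them with a randomly initialised single-hidden-layer ReLU network, and compares the average Hamming distance to the nearest differently-classified string with $\sqrt{n / (2\pi \log n)}$. The depth, width, He-style initialisation, and the choice $n = 10$ are assumptions for illustration; the paper's bound concerns wide deep networks and large $n$.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(0)
n, width, n_nets = 10, 512, 20        # bit-string length, hidden width, nets to average over (illustrative)

X = np.array(list(product([0, 1], repeat=n)), dtype=float)   # all 2^n bit strings
H = (X[:, None, :] != X[None, :, :]).sum(-1)                 # pairwise Hamming distances (2^n x 2^n)

dists = []
for _ in range(n_nets):
    # One random ReLU hidden layer with a linear readout (He-style scaling, an assumption).
    W1 = rng.standard_normal((n, width)) * np.sqrt(2.0 / n)
    b1 = 0.1 * rng.standard_normal(width)
    w2 = rng.standard_normal(width) * np.sqrt(2.0 / width)
    labels = np.maximum(X @ W1 + b1, 0.0) @ w2 > 0           # binary label of every string

    if labels.all() or not labels.any():
        continue                                             # constant classifier: no opposite class
    opposite = labels[:, None] != labels[None, :]            # pairs with different labels
    nearest = np.where(opposite, H, n + 1).min(axis=1)       # distance to closest oppositely-labelled string
    dists.append(nearest.mean())

print("mean Hamming distance to the nearest differently-classified string:", np.mean(dists))
print("lower bound sqrt(n / (2*pi*log n)):", np.sqrt(n / (2 * np.pi * np.log(n))))
```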


Stably unactivated neurons in ReLU neural networks

December 6, 2024

87% Match
Natalie Brownlowe, Christopher R. Cornwell, Ethan Montes, Gabriel Quijano, ... , Na Zhang
Machine Learning
Probability
Machine Learning

The choice of architecture of a neural network influences which functions will be realizable by that neural network and, as a result, studying the expressiveness of a chosen architecture has received much attention. In ReLU neural networks, the presence of stably unactivated neurons can reduce the network's expressiveness. In this work, we investigate the probability of a neuron in the second hidden layer of such neural networks being stably unactivated when the weights and b...
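One way to see how such neurons arise, sketched below under illustrative assumptions (i.i.d. standard Gaussian weights and biases, first hidden layer of width 8): because first-layer ReLU outputs are non-negative, a second-layer neuron whose incoming weights and bias are all non-positive has a non-positive pre-activation for every input and is therefore stably unactivated. The code estimates how often this particular sufficient condition holds under random initialisation; the paper's analysis of the full probability is finer than this.

```python
import numpy as np

rng = np.random.default_rng(0)
n1 = 8               # width of the first hidden layer (illustrative choice)
n2 = 64              # width of the second hidden layer
n_trials = 20_000    # random initialisations

count = 0
for _ in range(n_trials):
    W2 = rng.standard_normal((n2, n1))   # weights into the second hidden layer (i.i.d. Gaussian, an assumption)
    b2 = rng.standard_normal(n2)         # biases of the second hidden layer
    # First-layer ReLU outputs are always >= 0, so a second-layer neuron whose incoming
    # weights and bias are all <= 0 has a non-positive pre-activation for every input:
    # it is stably unactivated.  (This is only a sufficient condition.)
    count += ((W2 <= 0).all(axis=1) & (b2 <= 0)).sum()

empirical = count / (n_trials * n2)
print("fraction of second-layer neurons satisfying the all-non-positive condition:", empirical)
print("prediction 2^-(n1+1) for symmetric i.i.d. initialisation:", 2.0 ** -(n1 + 1))
```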

Alexander Mozeika, Bo Li, David Saad
Machine Learning
Disordered Systems and Neural Networks
Machine Learning

We study the space of functions computed by random-layered machines, including deep neural networks and Boolean circuits. Investigating the distribution of Boolean functions computed on the recurrent and layer-dependent architectures, we find that it is the same in both models. Depending on the initial conditions and computing elements used, we characterize the space of functions computed at the large depth limit and show that the macroscopic entropy of Boolean functions is e...

Lower bounds over Boolean inputs for deep neural networks with ReLU gates

November 8, 2017

85% Match
Anirbit Mukherjee, Amitabh Basu
Computational Complexity
Machine Learning
Neural and Evolutionary Computation
Machine Learning

Motivated by the resurgence of neural networks in being able to solve complex learning tasks, we undertake a study of high-depth networks using ReLU gates, which implement the function $x \mapsto \max\{0,x\}$. We try to understand the role of depth in such neural networks by showing size lower bounds against such network architectures in parameter regimes hitherto unexplored. In particular, we show the following two main results about neural nets computing Boolean functions of in...


Exploiting the equivalence between quantum neural networks and perceptrons

July 5, 2024

85% Match
Chris Mingard, Jessica Pointing, Charles London, ... , Ard A. Louis
Artificial Intelligence

Quantum machine learning models based on parametrized quantum circuits, also called quantum neural networks (QNNs), are considered to be among the most promising candidates for applications on near-term quantum devices. Here we explore the expressivity and inductive bias of QNNs by exploiting an exact mapping from QNNs with inputs $x$ to classical perceptrons acting on $x \otimes x$ (generalised to complex inputs). The simplicity of the perceptron architecture allows us to pr...
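To make the flavour of that mapping concrete, here is a small real-valued sketch (not the paper's exact construction, which handles complex inputs and the specific QNN parametrisation): a classical perceptron acting on the Kronecker product $x \otimes x$ is simply a quadratic classifier in $x$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4                                    # input dimension (illustrative choice)

# A perceptron acting on the tensor product x (x) x is a quadratic classifier in x:
# its decision function equals x^T W x for some weight matrix W.
w = rng.standard_normal(n * n)           # perceptron weights on the n^2 product features

def classify(x):
    phi = np.kron(x, x)                  # feature map x -> x (x) x
    return int(phi @ w > 0)

for _ in range(5):
    x = rng.standard_normal(n)
    # Equivalent quadratic form: reshaping w into an n x n matrix gives the same score.
    assert np.isclose(np.kron(x, x) @ w, x @ w.reshape(n, n) @ x)
    print(x.round(2), "->", classify(x))
```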


Do deep neural networks have an inbuilt Occam's razor?

April 13, 2023

85% Match
Chris Mingard, Henry Rees, ... , Ard A. Louis
Machine Learning
Artificial Intelligence
Machine Learning

The remarkable performance of overparameterized deep neural networks (DNNs) must arise from an interplay between network architecture, training algorithms, and structure in the data. To disentangle these three components, we apply a Bayesian picture, based on the functions expressed by a DNN, to supervised learning. The prior over functions is determined by the network, and is varied by exploiting a transition between ordered and chaotic regimes. For Boolean function classifi...
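The prior over functions referred to here can be estimated by direct sampling. The sketch below (an illustration, not the paper's experimental setup) draws random parameters for a small tanh network at two weight scales, records the Boolean function computed on all of $\{0,1\}^5$, and compares the entropy of the sampled function distribution; the architecture, the tanh nonlinearity, the weight scales standing in for the ordered and chaotic regimes, and the naive entropy estimate are all assumptions.

```python
import numpy as np
from itertools import product
from collections import Counter

rng = np.random.default_rng(0)
n, width, depth = 5, 32, 4          # small input space so functions can be recorded exactly
X = np.array(list(product([0, 1], repeat=n)), dtype=float)
n_samples = 20_000

def sample_function(sigma_w):
    """Boolean function on {0,1}^n computed by a random tanh network with weight scale sigma_w."""
    a = X
    for _ in range(depth):
        W = rng.standard_normal((a.shape[1], width)) * sigma_w / np.sqrt(a.shape[1])
        a = np.tanh(a @ W)
    w_out = rng.standard_normal(width) / np.sqrt(width)
    return tuple((a @ w_out > 0).astype(int))

for sigma_w in [0.5, 4.0]:          # weight scales standing in for "ordered" vs "chaotic" (illustrative)
    counts = Counter(sample_function(sigma_w) for _ in range(n_samples))
    p = np.array(list(counts.values())) / n_samples
    entropy = -(p * np.log2(p)).sum()    # naive plug-in estimate from the samples
    print(f"sigma_w = {sigma_w}: {len(counts)} distinct functions sampled, "
          f"entropy of sampled prior = {entropy:.2f} bits")
```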


Deep learning generalizes because the parameter-function map is biased towards simple functions

May 22, 2018

85% Match
Guillermo Valle-Pérez, Chico Q. Camargo, Ard A. Louis
Machine Learning
Artificial Intelligence
Machine Learning
Neural and Evolutionary Computation

Deep neural networks (DNNs) generalize remarkably well without explicit regularization even in the strongly over-parametrized regime where classical learning theory would instead predict that they would severely overfit. While many proposals for some kind of implicit regularization have been made to rationalise this success, there is no consensus for the fundamental reason why DNNs do not strongly overfit. In this paper, we provide a new explanation. By applying a very genera...


The empirical size of trained neural networks

November 29, 2016

84% Match
Kevin K. Chen, Anthony Gamst, Alden Walker
Machine Learning
Machine Learning

ReLU neural networks define piecewise linear functions of their inputs. However, initializing and training a neural network is very different from fitting a linear spline. In this paper, we expand empirically upon previous theoretical work to demonstrate features of trained neural networks. Standard network initialization and training produce networks vastly simpler than a naive parameter count would suggest and can impart odd features to the trained network. However, we also...
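A minimal sketch of the underlying object (the untrained network, single hidden layer, and 1-D slice through input space are all simplifying assumptions; the paper's measurements concern trained, deeper networks): the linear pieces of a ReLU network along a line can be counted by tracking where its activation pattern changes.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, width, n_points = 2, 64, 10_000

# Random single-hidden-layer ReLU network, evaluated along a 1-D line through input space.
W1 = rng.standard_normal((d_in, width)) / np.sqrt(d_in)
b1 = rng.standard_normal(width)

direction = rng.standard_normal(d_in)
ts = np.linspace(-5, 5, n_points)
inputs = ts[:, None] * direction[None, :]          # points on the line

patterns = (inputs @ W1 + b1) > 0                  # ReLU activation pattern at each point

# Each distinct activation pattern corresponds to one linear piece of the network
# restricted to this line; count how often the pattern changes along the line.
changes = (patterns[1:] != patterns[:-1]).any(axis=1).sum()
print("linear pieces along the line:", changes + 1)
print("naive upper bound (one kink per hidden unit along a line):", width + 1)
```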


Bayesian Perceptron: Towards fully Bayesian Neural Networks

September 3, 2020

84% Match
Marco F. Huber
Machine Learning
Machine Learning

Artificial neural networks (NNs) have become the de facto standard in machine learning. They allow learning highly nonlinear transformations in a plethora of applications. However, NNs usually only provide point estimates without systematically quantifying corresponding uncertainties. In this paper a novel approach towards fully Bayesian NNs is proposed, where training and predictions of a perceptron are performed within the Bayesian inference framework in closed form. The we...


On the Spectral Bias of Neural Networks

June 22, 2018

84% Match
Nasim Rahaman, Aristide Baratin, Devansh Arpit, Felix Draxler, Min Lin, Fred A. Hamprecht, ... , Aaron Courville
Machine Learning
Machine Learning

Neural networks are known to be a class of highly expressive functions able to fit even random input-output mappings with $100\%$ accuracy. In this work, we present properties of neural networks that complement this aspect of expressivity. By using tools from Fourier analysis, we show that deep ReLU networks are biased towards low frequency functions, meaning that they cannot have local fluctuations without affecting their global behavior. Intuitively, this property is in lin...
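As a loose illustration of low-frequency dominance (at initialisation only, whereas the paper's analysis concerns which frequencies are fitted during training), the sketch below evaluates a randomly initialised deep ReLU network on a 1-D grid and measures how much of its output's spectral power sits at low frequencies; the depth, width, and the cut-off frequency of 10 are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
width, depth, n_points = 256, 3, 4096

# Evaluate a randomly initialised deep ReLU network on a 1-D grid.
x = np.linspace(0.0, 1.0, n_points, endpoint=False)
a = x[:, None]
for _ in range(depth):
    W = rng.standard_normal((a.shape[1], width)) * np.sqrt(2.0 / a.shape[1])
    b = 0.1 * rng.standard_normal(width)
    a = np.maximum(a @ W + b, 0.0)
f = a @ rng.standard_normal(width) / np.sqrt(width)

# Fourier spectrum of the network output over the unit interval.
spectrum = np.abs(np.fft.rfft(f - f.mean())) ** 2
freqs = np.fft.rfftfreq(n_points, d=x[1] - x[0])
low = spectrum[freqs <= 10].sum() / spectrum.sum()
print(f"fraction of spectral power at frequencies <= 10: {low:.3f}")
```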
