November 8, 2017
Motivated by the resurgence of neural networks in being able to solve complex learning tasks we undertake a study of high depth networks using ReLU gates which implement the function $x \mapsto \max\{0,x\}$. We try to understand the role of depth in such neural networks by showing size lowerbounds against such network architectures in parameter regimes hitherto unexplored. In particular we show the following two main results about neural nets computing Boolean functions of input dimension $n$, 1. We use the method of random restrictions to show almost linear, $\Omega(\epsilon^{2(1-\delta)}n^{1-\delta})$, lower bound for completely weight unrestricted LTF-of-ReLU circuits to match the Andreev function on at least $\frac{1}{2} +\epsilon$ fraction of the inputs for $\epsilon > \sqrt{2\frac{\log^{\frac {2}{2-\delta}}(n)}{n}}$ for any $\delta \in (0,\frac 1 2)$ 2. We use the method of sign-rank to show exponential in dimension lower bounds for ReLU circuits ending in a LTF gate and of depths upto $O(n^{\xi})$ with $\xi < \frac{1}{8}$ with some restrictions on the weights in the bottom most layer. All other weights in these circuits are kept unrestricted. This in turns also implies the same lowerbounds for LTF circuits with the same architecture and the same weight restrictions on their bottom most layer. Along the way we also show that there exists a $\mathbb{R}^ n\rightarrow \mathbb{R}$ Sum-of-ReLU-of-ReLU function which Sum-of-ReLU neural nets can never represent no matter how large they are allowed to be.
Similar papers 1
February 26, 2018
We consider the problem of representing Boolean functions exactly by "sparse" linear combinations (over $\mathbb{R}$) of functions from some "simple" class ${\cal C}$. In particular, given ${\cal C}$ we are interested in finding low-complexity functions lacking sparse representations. When ${\cal C}$ is the set of PARITY functions or the set of conjunctions, this sort of problem has a well-understood answer, the problem becomes interesting when ${\cal C}$ is "overcomplete" an...
July 1, 2021
In this paper, we investigate computational power of threshold circuits and other theoretical models of neural networks in terms of the following four complexity measures: size (the number of gates), depth, weight and energy. Here the energy complexity of a circuit measures sparsity of their computation, and is defined as the maximum number of gates outputting non-zero values taken over all the input assignments. As our main result, we prove that any threshold circuit $C$ of ...
January 30, 2021
When studying the expressive power of neural networks, a main challenge is to understand how the size and depth of the network affect its ability to approximate real functions. However, not all functions are interesting from a practical viewpoint: functions of interest usually have a polynomially-bounded Lipschitz constant, and can be computed efficiently. We call functions that satisfy these conditions "benign", and explore the benefits of size and depth for approximation of...
February 10, 2018
Compression is a key step to deploy large neural networks on resource-constrained platforms. As a popular compression technique, quantization constrains the number of distinct weight values and thus reducing the number of bits required to represent and store each weight. In this paper, we study the representation power of quantized neural networks. First, we prove the universal approximability of quantized ReLU networks on a wide class of functions. Then we provide upper boun...
November 4, 2016
In this paper we investigate the family of functions representable by deep neural networks (DNN) with rectified linear units (ReLU). We give an algorithm to train a ReLU DNN with one hidden layer to *global optimality* with runtime polynomial in the data size albeit exponential in the input dimension. Further, we improve on the known lower bounds on size (from exponential to super exponential) for approximating a ReLU deep net function by a shallower ReLU net. Our gap theorem...
May 31, 2021
We contribute to a better understanding of the class of functions that can be represented by a neural network with ReLU activations and a given architecture. Using techniques from mixed-integer optimization, polyhedral theory, and tropical geometry, we provide a mathematical counterbalance to the universal approximation theorems which suggest that a single hidden layer is sufficient for learning any function. In particular, we investigate whether the class of exactly represen...
November 24, 2015
In order to formally understand the power of neural computing, we first need to crack the frontier of threshold circuits with two and three layers, a regime that has been surprisingly intractable to analyze. We prove the first super-linear gate lower bounds and the first super-quadratic wire lower bounds for depth-two linear threshold circuits with arbitrary weights, and depth-three majority circuits computing an explicit function. $\bullet$ We prove that for all $\epsilon\...
April 19, 2020
We study the space of functions computed by random-layered machines, including deep neural networks and Boolean circuits. Investigating the distribution of Boolean functions computed on the recurrent and layer-dependent architectures, we find that it is the same in both models. Depending on the initial conditions and computing elements used, we characterize the space of functions computed at the large depth limit and show that the macroscopic entropy of Boolean functions is e...
August 9, 2017
This article concerns the expressive power of depth in neural nets with ReLU activations and bounded width. We are particularly interested in the following questions: what is the minimal width $w_{\text{min}}(d)$ so that ReLU nets of width $w_{\text{min}}(d)$ (and arbitrary depth) can approximate any continuous function on the unit cube $[0,1]^d$ aribitrarily well? For ReLU nets near this minimal width, what can one say about the depth necessary to approximate a given functio...
July 2, 2013
Polynomial threshold gates are basic processing units of an artificial neural network. When the input vectors are binary vectors, these gates correspond to Boolean functions and can be analyzed via their polynomial representations. In practical applications, it is desirable to find a polynomial representation with the smallest number of terms possible, in order to use the least possible number of input lines to the unit under consideration. For this purpose, instead of an exa...