Lower bounds over Boolean inputs for deep neural networks with ReLU gates

November 8, 2017

View on ArXiv

Anirbit Mukherjee, Amitabh Basu

Computer Science

Statistics

Computational Complexity

Machine Learning

Neural and Evolutionary Comp...

Machine Learning

Motivated by the resurgence of neural networks in being able to solve complex learning tasks we undertake a study of high depth networks using ReLU gates which implement the function $x \mapsto \max\{0,x\}$. We try to understand the role of depth in such neural networks by showing size lowerbounds against such network architectures in parameter regimes hitherto unexplored. In particular we show the following two main results about neural nets computing Boolean functions of input dimension $n$, 1. We use the method of random restrictions to show almost linear, $\Omega(\epsilon^{2(1-\delta)}n^{1-\delta})$, lower bound for completely weight unrestricted LTF-of-ReLU circuits to match the Andreev function on at least $\frac{1}{2} +\epsilon$ fraction of the inputs for $\epsilon > \sqrt{2\frac{\log^{\frac {2}{2-\delta}}(n)}{n}}$ for any $\delta \in (0,\frac 1 2)$ 2. We use the method of sign-rank to show exponential in dimension lower bounds for ReLU circuits ending in a LTF gate and of depths upto $O(n^{\xi})$ with $\xi < \frac{1}{8}$ with some restrictions on the weights in the bottom most layer. All other weights in these circuits are kept unrestricted. This in turns also implies the same lowerbounds for LTF circuits with the same architecture and the same weight restrictions on their bottom most layer. Along the way we also show that there exists a $\mathbb{R}^ n\rightarrow \mathbb{R}$ Sum-of-ReLU-of-ReLU function which Sum-of-ReLU neural nets can never represent no matter how large they are allowed to be.

Limits on representing Boolean functions by linear combinations of simple functions: thresholds, ReLUs, and low-degree polynomials

February 26, 2018

89% Match

R. Ryan Williams

Computational Complexity

Discrete Mathematics

Neural and Evolutionary Comp...

We consider the problem of representing Boolean functions exactly by "sparse" linear combinations (over $\mathbb{R}$) of functions from some "simple" class ${\cal C}$. In particular, given ${\cal C}$ we are interested in finding low-complexity functions lacking sparse representations. When ${\cal C}$ is the set of PARITY functions or the set of conjunctions, this sort of problem has a well-understood answer, the problem becomes interesting when ${\cal C}$ is "overcomplete" an...

Find SimilarView on arXiv

Exponential Lower Bounds for Threshold Circuits of Sub-Linear Depth and Energy

July 1, 2021

88% Match

Kei Uchizawa, Haruki Abe

Computational Complexity

Computer Vision and Pattern ...

Machine Learning

Neural and Evolutionary Comp...

In this paper, we investigate computational power of threshold circuits and other theoretical models of neural networks in terms of the following four complexity measures: size (the number of gates), depth, weight and energy. Here the energy complexity of a circuit measures sparsity of their computation, and is defined as the maximum number of gates outputting non-zero values taken over all the input assignments. As our main result, we prove that any threshold circuit $C$ of ...

Find SimilarView on arXiv

Size and Depth Separation in Approximating Benign Functions with Neural Networks

January 30, 2021

88% Match

Gal Vardi, Daniel Reichman, ... , Shamir Ohad

Machine Learning

Computational Complexity

Machine Learning

When studying the expressive power of neural networks, a main challenge is to understand how the size and depth of the network affect its ability to approximate real functions. However, not all functions are interesting from a practical viewpoint: functions of interest usually have a polynomially-bounded Lipschitz constant, and can be computed efficiently. We call functions that satisfy these conditions "benign", and explore the benefits of size and depth for approximation of...

Find SimilarView on arXiv

On the Universal Approximability and Complexity Bounds of Quantized ReLU Neural Networks

February 10, 2018

87% Match

Yukun Ding, Jinglan Liu, ... , Shi Yiyu

Machine Learning

Computer Vision and Pattern ...

Compression is a key step to deploy large neural networks on resource-constrained platforms. As a popular compression technique, quantization constrains the number of distinct weight values and thus reducing the number of bits required to represent and store each weight. In this paper, we study the representation power of quantized neural networks. First, we prove the universal approximability of quantized ReLU networks on a wide class of functions. Then we provide upper boun...

Find SimilarView on arXiv

Understanding Deep Neural Networks with Rectified Linear Units

November 4, 2016

87% Match

Raman Arora, Amitabh Basu, ... , Mukherjee Anirbit

Machine Learning

Disordered Systems and Neura...

Artificial Intelligence

Computational Complexity

Machine Learning

In this paper we investigate the family of functions representable by deep neural networks (DNN) with rectified linear units (ReLU). We give an algorithm to train a ReLU DNN with one hidden layer to *global optimality* with runtime polynomial in the data size albeit exponential in the input dimension. Further, we improve on the known lower bounds on size (from exponential to super exponential) for approximating a ReLU deep net function by a shallower ReLU net. Our gap theorem...

Find SimilarView on arXiv

Towards Lower Bounds on the Depth of ReLU Neural Networks

May 31, 2021

87% Match

Christoph Hertrich, Amitabh Basu, ... , Skutella Martin

Machine Learning

Discrete Mathematics

Neural and Evolutionary Comp...

Combinatorics

Machine Learning

We contribute to a better understanding of the class of functions that can be represented by a neural network with ReLU activations and a given architecture. Using techniques from mixed-integer optimization, polyhedral theory, and tropical geometry, we provide a mathematical counterbalance to the universal approximation theorems which suggest that a single hidden layer is sufficient for learning any function. In particular, we investigate whether the class of exactly represen...

Find SimilarView on arXiv

Super-Linear Gate and Super-Quadratic Wire Lower Bounds for Depth-Two and Depth-Three Threshold Circuits

November 24, 2015

87% Match

Daniel M. Kane, Ryan Williams

Computational Complexity

Neural and Evolutionary Comp...

In order to formally understand the power of neural computing, we first need to crack the frontier of threshold circuits with two and three layers, a regime that has been surprisingly intractable to analyze. We prove the first super-linear gate lower bounds and the first super-quadratic wire lower bounds for depth-two linear threshold circuits with arbitrary weights, and depth-three majority circuits computing an explicit function. $\bullet$ We prove that for all $\epsilon\...

Find SimilarView on arXiv

Space of Functions Computed by Deep-Layered Machines

April 19, 2020

86% Match

Alexander Mozeika, Bo Li, David Saad

Machine Learning

Disordered Systems and Neura...

Machine Learning

We study the space of functions computed by random-layered machines, including deep neural networks and Boolean circuits. Investigating the distribution of Boolean functions computed on the recurrent and layer-dependent architectures, we find that it is the same in both models. Depending on the initial conditions and computing elements used, we characterize the space of functions computed at the large depth limit and show that the macroscopic entropy of Boolean functions is e...

Find Similar View on arXiv

Lower Bounds on the Depth of Integral ReLU Neural Networks via Lattice Polytopes

February 24, 2023

86% Match

Christian Haase, Christoph Hertrich, Georg Loho

Machine Learning

Discrete Mathematics

Neural and Evolutionary Comp...

Combinatorics

Machine Learning

We prove that the set of functions representable by ReLU neural networks with integer weights strictly increases with the network depth while allowing arbitrary width. More precisely, we show that $\lceil\log_2(n)\rceil$ hidden layers are indeed necessary to compute the maximum of $n$ numbers, matching known upper bounds. Our results are based on the known duality between neural networks and Newton polytopes via tropical geometry. The integrality assumption implies that these...

Find SimilarView on arXiv

Learning Deep ReLU Networks Is Fixed-Parameter Tractable

September 28, 2020

86% Match

Sitan Chen, Adam R. Klivans, Raghu Meka

Machine Learning

Data Structures and Algorith...

Machine Learning

We consider the problem of learning an unknown ReLU network with respect to Gaussian inputs and obtain the first nontrivial results for networks of depth more than two. We give an algorithm whose running time is a fixed polynomial in the ambient dimension and some (exponentially large) function of only the network's parameters. Our bounds depend on the number of hidden units, depth, spectral norm of the weight matrices, and Lipschitz constant of the overall network (we show...

Find SimilarView on arXiv