ID: 1711.03073

Lower bounds over Boolean inputs for deep neural networks with ReLU gates

November 8, 2017

Anirbit Mukherjee, Amitabh Basu
Computer Science
Statistics
Computational Complexity
Machine Learning
Neural and Evolutionary Computing
Machine Learning

Motivated by the resurgence of neural networks in solving complex learning tasks, we undertake a study of high-depth networks using ReLU gates, which implement the function $x \mapsto \max\{0,x\}$. We try to understand the role of depth in such neural networks by showing size lower bounds against such network architectures in parameter regimes hitherto unexplored. In particular, we show the following two main results about neural nets computing Boolean functions of input dimension $n$:
1. We use the method of random restrictions to show an almost linear, $\Omega(\epsilon^{2(1-\delta)}n^{1-\delta})$, lower bound for completely weight-unrestricted LTF-of-ReLU circuits to match the Andreev function on at least a $\frac{1}{2} + \epsilon$ fraction of the inputs, for $\epsilon > \sqrt{2\frac{\log^{\frac{2}{2-\delta}}(n)}{n}}$ and any $\delta \in (0,\frac{1}{2})$.
2. We use the method of sign-rank to show lower bounds exponential in the dimension for ReLU circuits ending in an LTF gate, of depth up to $O(n^{\xi})$ with $\xi < \frac{1}{8}$, with some restrictions on the weights in the bottommost layer; all other weights in these circuits are kept unrestricted. This in turn also implies the same lower bounds for LTF circuits with the same architecture and the same weight restrictions on their bottommost layer.
Along the way we also show that there exists an $\mathbb{R}^n \rightarrow \mathbb{R}$ Sum-of-ReLU-of-ReLU function which Sum-of-ReLU neural nets can never represent, no matter how large they are allowed to be.
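
As a concrete illustration of the circuit model studied here, the following is a minimal Python sketch (with made-up weights and helper names, not the authors' code) of an LTF-of-ReLU circuit on Boolean inputs: a layer of ReLU gates, each computing $\max\{0, w \cdot x + b\}$, feeding a single linear threshold (LTF) output gate.

```python
import numpy as np

def relu_gate(w, b, x):
    """ReLU gate: max{0, w.x + b}, using the abstract's x -> max{0, x} nonlinearity."""
    return max(0.0, float(np.dot(w, x) + b))

def ltf_gate(w, b, z):
    """Linear threshold (LTF) gate: outputs 1 iff w.z + b >= 0, else 0."""
    return 1 if np.dot(w, z) + b >= 0 else 0

def ltf_of_relu(relu_params, ltf_params, x):
    """Depth-2 LTF-of-ReLU circuit evaluated on a Boolean input x in {0,1}^n."""
    hidden = np.array([relu_gate(w, b, x) for (w, b) in relu_params])
    w_out, b_out = ltf_params
    return ltf_gate(w_out, b_out, hidden)

# Tiny illustration on n = 3 Boolean inputs with 2 ReLU gates (arbitrary weights).
relu_params = [(np.array([1.0, -1.0, 0.5]), 0.0),
               (np.array([-0.5, 1.0, 1.0]), -0.5)]
ltf_params = (np.array([1.0, -1.0]), 0.0)
for x in [(0, 0, 0), (1, 0, 1), (1, 1, 1)]:
    print(x, ltf_of_relu(relu_params, ltf_params, np.array(x, dtype=float)))
```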

Similar papers

Limits on representing Boolean functions by linear combinations of simple functions: thresholds, ReLUs, and low-degree polynomials

February 26, 2018

89% Match
R. Ryan Williams
Computational Complexity
Discrete Mathematics
Neural and Evolutionary Computing

We consider the problem of representing Boolean functions exactly by "sparse" linear combinations (over $\mathbb{R}$) of functions from some "simple" class ${\cal C}$. In particular, given ${\cal C}$ we are interested in finding low-complexity functions lacking sparse representations. When ${\cal C}$ is the set of PARITY functions or the set of conjunctions, this sort of problem has a well-understood answer; the problem becomes interesting when ${\cal C}$ is "overcomplete" an...
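
To make the representation question concrete, here is a small worked example under the standard $\{-1,1\}$ encoding (an illustration chosen here, not taken from the truncated abstract): the two-bit OR, with $-1$ encoding True, is exactly a linear combination of four PARITY functions, $\mathrm{OR}(x_1,x_2) = \tfrac{1}{2}(-1 + x_1 + x_2 + x_1 x_2)$, which the brute-force check below verifies.

```python
from itertools import product

# Every Boolean function f: {-1,1}^n -> {-1,1} is a real linear combination of
# PARITY functions (monomials over subsets of the variables); "sparse" means
# few nonzero coefficients. Worked 2-bit example (with -1 encoding True):
#   OR(x1, x2) = 1/2 * (-1 + x1 + x2 + x1*x2).
def or_pm(x1, x2):
    return -1 if (x1 == -1 or x2 == -1) else 1

for x1, x2 in product([-1, 1], repeat=2):
    combo = 0.5 * (-1 + x1 + x2 + x1 * x2)
    assert combo == or_pm(x1, x2)
print("OR is an exact linear combination of 4 parity functions.")
```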


Exponential Lower Bounds for Threshold Circuits of Sub-Linear Depth and Energy

July 1, 2021

88% Match
Kei Uchizawa, Haruki Abe
Computational Complexity
Computer Vision and Pattern Recognition
Machine Learning
Neural and Evolutionary Computing

In this paper, we investigate the computational power of threshold circuits and other theoretical models of neural networks in terms of the following four complexity measures: size (the number of gates), depth, weight and energy. Here the energy complexity of a circuit measures the sparsity of its computation, and is defined as the maximum number of gates outputting non-zero values taken over all the input assignments. As our main result, we prove that any threshold circuit $C$ of ...
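
A minimal brute-force sketch of the energy measure as defined above (toy layered circuit and hypothetical parameters, not the paper's construction): enumerate all $2^n$ Boolean inputs and count, for each, how many gates output a non-zero value.

```python
from itertools import product
import numpy as np

def threshold_gate(w, t, z):
    """Threshold gate: outputs 1 iff w.z >= t, else 0."""
    return 1 if np.dot(w, z) >= t else 0

def energy(circuit, n):
    """Energy as defined above: the maximum, over all 2^n Boolean inputs, of the
    number of gates outputting a non-zero value. `circuit` is a list of layers;
    each layer is a list of (weights, threshold) pairs reading the previous
    layer's outputs (a toy convention for this sketch)."""
    best = 0
    for x in product([0, 1], repeat=n):
        values, fired = np.array(x, dtype=float), 0
        for layer in circuit:
            outs = [threshold_gate(w, t, values) for (w, t) in layer]
            fired += sum(outs)
            values = np.array(outs, dtype=float)
        best = max(best, fired)
    return best

# Tiny 2-layer example on 3 inputs.
circuit = [[(np.array([1, 1, 1]), 2), (np.array([1, -1, 1]), 1)],
           [(np.array([1, 1]), 2)]]
print("energy =", energy(circuit, 3))
```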


Size and Depth Separation in Approximating Benign Functions with Neural Networks

January 30, 2021

88% Match
Gal Vardi, Daniel Reichman, ... , Ohad Shamir
Machine Learning
Computational Complexity
Machine Learning

When studying the expressive power of neural networks, a main challenge is to understand how the size and depth of the network affect its ability to approximate real functions. However, not all functions are interesting from a practical viewpoint: functions of interest usually have a polynomially-bounded Lipschitz constant, and can be computed efficiently. We call functions that satisfy these conditions "benign", and explore the benefits of size and depth for approximation of...


On the Universal Approximability and Complexity Bounds of Quantized ReLU Neural Networks

February 10, 2018

87% Match
Yukun Ding, Jinglan Liu, ... , Yiyu Shi
Machine Learning
Computer Vision and Pattern Recognition

Compression is a key step to deploy large neural networks on resource-constrained platforms. As a popular compression technique, quantization constrains the number of distinct weight values, thus reducing the number of bits required to represent and store each weight. In this paper, we study the representation power of quantized neural networks. First, we prove the universal approximability of quantized ReLU networks on a wide class of functions. Then we provide upper boun...
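
A small sketch of the kind of weight quantization described here (illustrative codebook and helper names, not the paper's scheme): each weight is snapped to its nearest value in a small codebook, so only a few distinct values remain and each weight can be stored as a short index.

```python
import math
import numpy as np

def quantize(weights, codebook):
    """Map each weight to its nearest codebook value, so only len(codebook)
    distinct values remain; each weight then needs only
    ceil(log2(len(codebook))) bits of index to store."""
    codebook = np.asarray(codebook)
    idx = np.argmin(np.abs(weights[:, None] - codebook[None, :]), axis=1)
    return codebook[idx], math.ceil(math.log2(len(codebook)))

w = np.array([0.73, -0.12, 0.41, -0.88, 0.05])
q, bits_per_weight = quantize(w, codebook=[-1.0, -0.5, 0.0, 0.5, 1.0])
# 3 index bits per weight instead of e.g. a 32-bit float in this toy example.
print(q, f"({bits_per_weight} bits per weight)")
```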


Understanding Deep Neural Networks with Rectified Linear Units

November 4, 2016

87% Match
Raman Arora, Amitabh Basu, ... , Anirbit Mukherjee
Machine Learning
Disordered Systems and Neural Networks
Artificial Intelligence
Computational Complexity
Machine Learning

In this paper we investigate the family of functions representable by deep neural networks (DNN) with rectified linear units (ReLU). We give an algorithm to train a ReLU DNN with one hidden layer to *global optimality* with runtime polynomial in the data size albeit exponential in the input dimension. Further, we improve on the known lower bounds on size (from exponential to super exponential) for approximating a ReLU deep net function by a shallower ReLU net. Our gap theorem...


Towards Lower Bounds on the Depth of ReLU Neural Networks

May 31, 2021

87% Match
Christoph Hertrich, Amitabh Basu, ... , Martin Skutella
Machine Learning
Discrete Mathematics
Neural and Evolutionary Computing
Combinatorics
Machine Learning

We contribute to a better understanding of the class of functions that can be represented by a neural network with ReLU activations and a given architecture. Using techniques from mixed-integer optimization, polyhedral theory, and tropical geometry, we provide a mathematical counterbalance to the universal approximation theorems which suggest that a single hidden layer is sufficient for learning any function. In particular, we investigate whether the class of exactly represen...


Super-Linear Gate and Super-Quadratic Wire Lower Bounds for Depth-Two and Depth-Three Threshold Circuits

November 24, 2015

87% Match
Daniel M. Kane, Ryan Williams
Computational Complexity
Neural and Evolutionary Computing

In order to formally understand the power of neural computing, we first need to crack the frontier of threshold circuits with two and three layers, a regime that has been surprisingly intractable to analyze. We prove the first super-linear gate lower bounds and the first super-quadratic wire lower bounds for depth-two linear threshold circuits with arbitrary weights, and depth-three majority circuits computing an explicit function. $\bullet$ We prove that for all $\epsilon\...

Alexander Mozeika, Bo Li, David Saad
Machine Learning
Disordered Systems and Neural Networks
Machine Learning

We study the space of functions computed by random-layered machines, including deep neural networks and Boolean circuits. Investigating the distribution of Boolean functions computed on the recurrent and layer-dependent architectures, we find that it is the same in both models. Depending on the initial conditions and computing elements used, we characterize the space of functions computed at the large depth limit and show that the macroscopic entropy of Boolean functions is e...

Universal Function Approximation by Deep Neural Nets with Bounded Width and ReLU Activations

August 9, 2017

86% Match
Boris Hanin
Machine Learning
Computational Geometry
Machine Learning
Functional Analysis
Statistics Theory
Statistics Theory

This article concerns the expressive power of depth in neural nets with ReLU activations and bounded width. We are particularly interested in the following questions: what is the minimal width $w_{\text{min}}(d)$ so that ReLU nets of width $w_{\text{min}}(d)$ (and arbitrary depth) can approximate any continuous function on the unit cube $[0,1]^d$ arbitrarily well? For ReLU nets near this minimal width, what can one say about the depth necessary to approximate a given functio...


Representing Boolean Functions Using Polynomials: More Can Offer Less

July 2, 2013

86% Match
Yi Ming Zou
Computational Complexity
Combinatorics
Number Theory

Polynomial threshold gates are basic processing units of an artificial neural network. When the input vectors are binary vectors, these gates correspond to Boolean functions and can be analyzed via their polynomial representations. In practical applications, it is desirable to find a polynomial representation with the smallest number of terms possible, in order to use the least possible number of input lines to the unit under consideration. For this purpose, instead of an exa...
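
As a tiny worked instance of such polynomial representations over binary inputs (an illustration, not the paper's construction): XOR of two bits is the three-term polynomial $x_1 + x_2 - 2x_1x_2$, which the check below verifies on all four inputs.

```python
from itertools import product

# Over binary inputs, every Boolean function has a multilinear polynomial
# representation; fewer terms means fewer input lines feeding the gate.
# Worked example: XOR(x1, x2) = x1 + x2 - 2*x1*x2 (three terms).
for x1, x2 in product([0, 1], repeat=2):
    assert x1 + x2 - 2 * x1 * x2 == (x1 ^ x2)
print("XOR(x1, x2) = x1 + x2 - 2*x1*x2 on all binary inputs.")
```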
