November 8, 2017
Similar papers 3
September 8, 2017
The expressive power of neural networks is important for understanding deep learning. Most existing works consider this problem from the view of the depth of a network. In this paper, we study how width affects the expressiveness of neural networks. Classical results state that depth-bounded (e.g. depth-$2$) networks with suitable activation functions are universal approximators. We show a universal approximation theorem for width-bounded ReLU networks: width-$(n+4)$ ReLU net...
April 28, 2021
"Deep Learning"/"Deep Neural Nets" is a technological marvel that is now increasingly deployed at the cutting-edge of artificial intelligence tasks. This dramatic success of deep learning in the last few years has been hinged on an enormous amount of heuristics and it has turned out to be a serious mathematical challenge to be able to rigorously explain them. In this thesis, submitted to the Department of Applied Mathematics and Statistics, Johns Hopkins University we take se...
November 15, 2021
One of the arguments to explain the success of deep learning is the powerful approximation capacity of deep neural networks. Such capacity is generally accompanied by the explosive growth of the number of parameters, which, in turn, leads to high computational costs. It is of great interest to ask whether we can achieve successful deep learning with a small number of learnable parameters adapting to the target function. From an approximation perspective, this paper shows that...
February 23, 2024
A characterization of the representability of neural networks is relevant to comprehend their success in artificial intelligence. This study investigate two topics on ReLU neural network expressivity and their connection with a conjecture related to the minimum depth required for representing any continuous piecewise linear function (CPWL). The topics are the minimal depth representation of the sum and max operations, as well as the exploration of polytope neural networks. Fo...
November 25, 2019
This paper considers the growth in the length of one-dimensional trajectories as they are passed through deep ReLU neural networks, which, among other things, is one measure of the expressivity of deep networks. We generalise existing results, providing an alternative, simpler method for lower bounding expected trajectory growth through random networks, for a more general class of weights distributions, including sparsely connected networks. We illustrate this approach by der...
May 10, 2024
In this work, we consider the approximation of a large class of bounded functions, with minimal regularity assumptions, by ReLU neural networks. We show that the approximation error can be bounded from above by a quantity proportional to the uniform norm of the target function and inversely proportional to the product of network width and depth. We inherit this approximation error bound from Fourier features residual networks, a type of neural network that uses complex expone...
February 13, 2024
It is well known that neural networks with many more parameters than training examples do not overfit. Implicit regularization phenomena, which are still not well understood, occur during optimization and 'good' networks are favored. Thus the number of parameters is not an adequate measure of complexity if we do not consider all possible networks but only the 'good' ones. To better understand which networks are favored during optimization, we study the geometry of the output ...
January 30, 2022
We study the conjectured relationship between the implicit regularization in neural networks, trained with gradient-based methods, and rank minimization of their weight matrices. Previously, it was proved that for linear networks (of depth 2 and vector-valued outputs), gradient flow (GF) w.r.t. the square loss acts as a rank minimization heuristic. However, understanding to what extent this generalizes to nonlinear networks is an open problem. In this paper, we focus on nonli...
July 13, 2020
Neuron death is a complex phenomenon with implications for model trainability: the deeper the network, the lower the probability of finding a valid initialization. In this work, we derive both upper and lower bounds on the probability that a ReLU network is initialized to a trainable point, as a function of model hyperparameters. We show that it is possible to increase the depth of a network indefinitely, so long as the width increases as well. Furthermore, our bounds are asy...
October 25, 2019
While on some natural distributions, neural-networks are trained efficiently using gradient-based algorithms, it is known that learning them is computationally hard in the worst-case. To separate hard from easy to learn distributions, we observe the property of local correlation: correlation between local patterns of the input and the target label. We focus on learning deep neural-networks using a gradient-based algorithm, when the target function is a tree-structured Boolean...