October 25, 2019
While on some natural distributions, neural-networks are trained efficiently using gradient-based algorithms, it is known that learning them is computationally hard in the worst-case. To separate hard from easy to learn distributions, we observe the property of local correlation: correlation between local patterns of the input and the target label. We focus on learning deep neural-networks using a gradient-based algorithm, when the target function is a tree-structured Boolean circuit. We show that in this case, the existence of correlation between the gates of the circuit and the target label determines whether the optimization succeeds or fails. Using this result, we show that neural-networks can learn the (log n)-parity problem for most product distributions. These results hint that local correlation may play an important role in separating easy/hard to learn distributions. We also obtain a novel depth separation result, in which we show that a shallow network cannot express some functions, while there exists an efficient gradient-based algorithm that can learn the very same functions using a deep network. The negative expressivity result for shallow networks is obtained by a reduction from results in communication complexity, that may be of independent interest.
Similar papers 1
August 18, 2020
A supervised learning algorithm has access to a distribution of labeled examples, and needs to return a function (hypothesis) that correctly labels the examples. The hypothesis of the learner is taken from some fixed class of functions (e.g., linear classifiers, neural networks etc.). A failure of the learning algorithm can occur due to two possible reasons: wrong choice of hypothesis class (hardness of approximation), or failure to find the best function within the hypothesi...
February 18, 2020
In recent years we see a rapidly growing line of research which shows learnability of various models via common neural network algorithms. Yet, besides a very few outliers, these results show learnability of models that can be learned using linear methods. Namely, such results show that learning neural-networks with gradient-descent is competitive with learning a linear classifier on top of a data-independent representation of the examples. This leaves much to be desired, as ...
September 5, 2016
Although neural networks are routinely and successfully trained in practice using simple gradient-based methods, most existing theoretical results are negative, showing that learning such networks is difficult, in a worst-case sense over all data distributions. In this paper, we take a more nuanced view, and consider whether specific assumptions on the "niceness" of the input distribution, or "niceness" of the target function (e.g. in terms of smoothness, non-degeneracy, inco...
January 20, 2021
We prove hardness-of-learning results under a well-studied assumption on the existence of local pseudorandom generators. As we show, this assumption allows us to surpass the current state of the art, and prove hardness of various basic problems, with no hardness results to date. Our results include: hardness of learning shallow ReLU neural networks under the Gaussian distribution and other distributions; hardness of learning intersections of $\omega(1)$ halfspaces, DNF form...
February 3, 2025
This paper investigates the learnability of the nonlinearity property of Boolean functions using neural networks. We train encoder style deep neural networks to learn to predict the nonlinearity of Boolean functions from examples of functions in the form of a truth table and their corresponding nonlinearity values. We report empirical results to show that deep neural networks are able to learn to predict the property for functions in 4 and 5 variables with an accuracy above 9...
May 7, 2019
We consider efficiency in the implementation of deep neural networks. Hardware accelerators are gaining interest as machine learning becomes one of the drivers of high-performance computing. In these accelerators, the directed graph describing a neural network can be implemented as a directed graph describing a Boolean circuit. We make this observation precise, leading naturally to an understanding of practical neural networks as discrete functions, and show that so-called bi...
January 1, 2025
Learning parity functions is a canonical problem in learning theory, which although computationally tractable, is not amenable to standard learning algorithms such as gradient-based methods. This hardness is usually explained via statistical query lower bounds [Kearns, 1998]. However, these bounds only imply that for any given algorithm, there is some worst-case parity function that will be hard to learn. Thus, they do not explain why fixed parities - say, the full parity fun...
March 16, 2022
Balancing model complexity against the information contained in observed data is the central challenge to learning. In order for complexity-efficient models to exist and be discoverable in high dimensions, we require a computational framework that relates a credible notion of complexity to simple parameter representations. Further, this framework must allow excess complexity to be gradually removed via gradient-based optimization. Our n-ary, or n-argument, activation function...
December 16, 2018
As the success of deep learning reaches more grounds, one would like to also envision the potential limits of deep learning. This paper gives a first set of results proving that certain deep learning algorithms fail at learning certain efficiently learnable functions. The results put forward a notion of cross-predictability that characterizes when such failures take place. Parity functions provide an extreme example with a cross-predictability that decays exponentially, while...
September 14, 2014
In this technical report we presented a novel approach to machine learning. Once the new framework is presented, we will provide a simple and yet very powerful learning algorithm which will be benchmark on various dataset. The framework we proposed is based on booleen circuits; more specifically the classifier produced by our algorithm have that form. Using bits and boolean gates instead of real numbers and multiplication enable the the learning algorithm and classifier to ...