November 16, 2021
Driven by growing computational power and algorithmic developments, machine learning methods have become valuable tools for analyzing vast amounts of data. Simultaneously, the fast technological progress of quantum information processing suggests employing quantum hardware for machine learning purposes. Recent works discuss different architectures of quantum perceptrons, but the abilities of such quantum devices remain debated. Here, we investigate the storage capacity of a p...
July 26, 2023
In this work, we bound a machine's ability to learn based on computational limitations implied by physicality. We start by considering the information processing capacity (IPC), a normalized measure of the expected squared error made when a collection of signals is used to reconstruct a complete basis of functions. We use the IPC to measure how the performance of reservoir computers, a particular kind of recurrent network, degrades under noise when constrained by physical considerations. First, w...
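As a rough illustration of the kind of quantity the IPC measures, the sketch below builds a small echo-state reservoir (a simple reservoir computer), fits a linear readout to a delayed copy of the input, and reports a capacity of the form 1 minus the normalized squared error as observation noise on the states grows. All names (`reservoir_states`, `capacity`) and parameter choices are illustrative assumptions, not the construction analyzed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def reservoir_states(u, n_nodes=50, rho=0.9):
    """Run a simple echo-state reservoir on the input sequence u and return its states."""
    W = rng.normal(size=(n_nodes, n_nodes))
    W *= rho / max(abs(np.linalg.eigvals(W)))      # scale to the desired spectral radius
    w_in = rng.normal(size=n_nodes)
    x = np.zeros(n_nodes)
    X = np.empty((len(u), n_nodes))
    for t, u_t in enumerate(u):
        x = np.tanh(W @ x + w_in * u_t)
        X[t] = x
    return X

def capacity(X, z):
    """IPC-style capacity: 1 - (min MSE of a linear readout) / var(z), clipped at 0."""
    w, *_ = np.linalg.lstsq(X, z, rcond=None)
    mse = np.mean((X @ w - z) ** 2)
    return max(0.0, 1.0 - mse / np.var(z))

T, washout = 5000, 100
u = rng.uniform(-1, 1, T)
z = np.roll(u, 3)                                  # target: the input delayed by 3 steps
X_clean = reservoir_states(u)
for sigma in (0.0, 0.1, 0.5):                      # additive observation noise on the states
    X = X_clean + sigma * rng.normal(size=X_clean.shape)
    print(sigma, capacity(X[washout:], z[washout:]))
```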
February 18, 2015
We first consider the problem of learning $k$-parities in the on-line mistake-bound model: given a hidden vector $x \in \{0,1\}^n$ with $|x|=k$ and a sequence of "questions" $a_1, a_2, \ldots \in \{0,1\}^n$, where the algorithm must reply to each question with $\langle a_i, x \rangle \pmod 2$, what is the best tradeoff between the number of mistakes made by the algorithm and its time complexity? We improve the previous best result of Buhrman et al. by an $\exp(k)$ factor in the time complexit...
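The on-line setting described above can be made concrete with a minimal sketch: a hidden $k$-sparse $x$, random questions $a_t$, and a learner that answers with any hypothesis consistent with its past mistakes, recomputed by Gaussian elimination over GF(2). This baseline makes at most $n$ mistakes but ignores the sparsity $k$ entirely; it is only the trivial reference point, not the improved time/mistake tradeoff of the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 20, 3

# Hidden k-sparse vector x in {0,1}^n.
x = np.zeros(n, dtype=np.uint8)
x[rng.choice(n, size=k, replace=False)] = 1

def gf2_solve(A, b):
    """Return any solution of A v = b over GF(2) (the system is always consistent here)."""
    M = np.column_stack([np.array(A, dtype=np.uint8), np.array(b, dtype=np.uint8)])
    piv_cols, r = [], 0
    for c in range(n):
        rows = [i for i in range(r, len(M)) if M[i, c]]
        if not rows:
            continue
        M[[r, rows[0]]] = M[[rows[0], r]]          # swap a pivot row into place
        for i in range(len(M)):
            if i != r and M[i, c]:
                M[i] ^= M[r]                       # eliminate column c elsewhere
        piv_cols.append(c)
        r += 1
    v = np.zeros(n, dtype=np.uint8)
    for i, c in enumerate(piv_cols):
        v[c] = M[i, n]                             # free variables set to 0
    return v

# On-line loop: predict with any hypothesis consistent with past mistakes.
# Each mistake adds a linearly independent constraint, so mistakes <= n.
A, b, guess, mistakes = [], [], np.zeros(n, dtype=np.uint8), 0
for t in range(500):
    a = rng.integers(0, 2, n, dtype=np.uint8)      # the question a_t
    truth = int(a @ x) % 2                         # <a_t, x> mod 2
    if int(a @ guess) % 2 != truth:
        mistakes += 1
        A.append(a)
        b.append(truth)
        guess = gf2_solve(A, b)
print("mistakes:", mistakes, "<= n =", n)
```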
October 17, 1997
We use a binary search tree and the simplex algorithm to measure the fraction of patterns that can be stored by an Ising perceptron. The algorithm is much faster than exhaustive search and allows us to obtain accurate statistics up to a system size of N=42. The results show that the finite-size scaling ansatz suggested by Nadler and Fink in [1] cannot be applied to accurately estimate the storage capacity from small systems. [1] W. Nadler and W. Fink, Phys. Rev. Lett. 78, 555 (1997).
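The quantity being measured here, the fraction of random pattern sets that some vector of ±1 couplings can store, can be illustrated by brute force for very small N. The paper's binary-search-tree/simplex algorithm is what makes N = 42 reachable; the exhaustive sketch below is a stand-in that is only feasible for N around 10 to 15, with all names and parameters chosen for illustration.

```python
import itertools
import numpy as np

rng = np.random.default_rng(2)

def storable(patterns, labels):
    """Exhaustively check whether some Ising (+/-1) weight vector stores all patterns,
    i.e. sign(w . xi_mu) = sigma_mu for every pattern mu (a zero field counts as an error)."""
    n = patterns.shape[1]
    for w in itertools.product((-1, 1), repeat=n):
        fields = patterns @ np.array(w)
        if np.all(fields * labels > 0):
            return True
    return False

def storage_fraction(n, alpha, trials=200):
    """Fraction of random pattern sets with P = alpha*N patterns that can be stored."""
    p = int(round(alpha * n))
    hits = 0
    for _ in range(trials):
        xi = rng.choice((-1, 1), size=(p, n))
        sigma = rng.choice((-1, 1), size=p)
        hits += storable(xi, sigma)
    return hits / trials

# The storable fraction drops near the Ising-perceptron critical capacity (about 0.83).
for alpha in (0.5, 0.8, 1.0, 1.2):
    print(alpha, storage_fraction(n=10, alpha=alpha))
```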
August 14, 1997
We investigate the generalization ability of a simple perceptron trained in the off-line and on-line supervised modes. Examples are generated by a teacher that is a non-monotonic perceptron. For this system, the difficulty of training can be controlled continuously by changing a parameter of the teacher. We train the student with several learning strategies in order to obtain theoretical lower bounds on the generalization error under various conditions. Asymptotic behavior of...
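A minimal sketch of the setup, under the common "reversed-wedge" reading of a non-monotonic teacher with width parameter a: a simple perceptron student is trained on-line with the classical perceptron rule against such a teacher, and the generalization error is estimated on fresh examples. The teacher form, the learning rule, and all names here are assumptions; the paper studies several strategies and derives its bounds analytically.

```python
import numpy as np

rng = np.random.default_rng(3)
N, a = 100, 1.0                                  # input dimension, teacher parameter

B = rng.normal(size=N)
B /= np.linalg.norm(B)                           # unit teacher vector, so B.x ~ N(0, 1)

def teacher(X):
    """Reversed-wedge teacher: sign(B.x), flipped where |B.x| < a (non-monotonic)."""
    h = X @ B
    return np.where(np.abs(h) < a, -np.sign(h), np.sign(h))

def gen_error(J, samples=2000):
    """Estimate the generalization error of student weights J on fresh Gaussian inputs."""
    X = rng.normal(size=(samples, N))
    return np.mean(np.sign(X @ J) != teacher(X))

# On-line training with the perceptron rule.  A monotonic student cannot imitate the
# non-monotonic teacher perfectly, so the error saturates at a value that grows with a.
J = np.zeros(N)
for t in range(1, 20001):
    x = rng.normal(size=N)
    s = teacher(x[None, :])[0]
    if np.sign(J @ x) != s:
        J += s * x / np.sqrt(N)
    if t % 5000 == 0:
        print(t, gen_error(J))
```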
April 20, 2024
The storage capacity of a binary classification model is the maximum number of random input-output pairs per parameter that the model can learn. It is one of the indicators of the expressive power of machine learning models and is important for comparing the performance of various models. In this study, we analyze the structure of the solution space and the storage capacity of fully connected two-layer neural networks with general activation functions using the replica method...
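For the simplest case, the single-layer perceptron, the definition of storage capacity can be probed numerically: draw P = αN random input-output pairs, test whether a separating weight vector exists (a linear-programming feasibility problem), and watch the storable fraction drop from 1 to 0 near the known critical value α_c = 2. The sketch below assumes scipy is available and only illustrates the definition; it is not the paper's replica analysis of two-layer networks.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(4)

def separable(X, y):
    """Check whether some w satisfies y_mu * (w . x_mu) >= 1 for all mu,
    phrased as a linear-programming feasibility problem."""
    P, N = X.shape
    A_ub = -(y[:, None] * X)                     # -y_mu x_mu . w <= -1
    b_ub = -np.ones(P)
    res = linprog(c=np.zeros(N), A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * N, method="highs")
    return res.success

def storable_fraction(N, alpha, trials=50):
    """Fraction of random pattern sets with P = alpha*N examples that are separable."""
    P = int(round(alpha * N))
    hits = 0
    for _ in range(trials):
        X = rng.choice((-1.0, 1.0), size=(P, N))
        y = rng.choice((-1.0, 1.0), size=P)
        hits += separable(X, y)
    return hits / trials

# The fraction falls from 1 to 0 around alpha = P/N = 2 (Cover's classical result).
for alpha in (1.0, 1.5, 2.0, 2.5, 3.0):
    print(alpha, storable_fraction(N=50, alpha=alpha))
```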
July 22, 2020
The expressive power of artificial neural networks crucially depends on the nonlinearity of their activation functions. Though a wide variety of nonlinear activation functions have been proposed for use in artificial neural networks, a detailed understanding of their role in determining the expressive power of a network has not emerged. Here, we study how activation functions affect the storage capacity of treelike two-layer networks. We relate the boundedness or divergence o...
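For reference, a treelike two-layer network is one whose hidden units see disjoint blocks of the input and feed a fixed output sum. The sketch below is a minimal, assumption-based rendering of that architecture with the activation function left as a parameter, which is where the bounded (e.g. tanh) versus unbounded (e.g. ReLU) distinction the abstract refers to enters; it does not reproduce the capacity analysis.

```python
import numpy as np

rng = np.random.default_rng(5)

def tree_two_layer(x, W, activation):
    """Treelike committee machine: K hidden units, each seeing its own block of M inputs.

    The output is the sign of the unweighted sum of the hidden activations."""
    K, M = W.shape
    blocks = x.reshape(K, M)                      # non-overlapping receptive fields
    hidden = activation(np.sum(W * blocks, axis=1) / np.sqrt(M))
    return np.sign(hidden.sum())

K, M = 5, 20
W = rng.normal(size=(K, M))
x = rng.normal(size=K * M)

relu = lambda h: np.maximum(h, 0.0)               # unbounded activation
print(tree_two_layer(x, W, np.tanh), tree_two_layer(x, W, relu))
```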
October 14, 2020
Hyperdimensional (HD) computing is a set of neurally inspired methods for obtaining high-dimensional, low-precision, distributed representations of data. These representations can be combined with simple, neurally plausible algorithms to effect a variety of information processing tasks. HD computing has recently garnered significant interest from the computer hardware community as an energy-efficient, low-latency, and noise-robust tool for solving learning problems. In this r...
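The basic HD-computing operations such a review covers can be sketched with bipolar hypervectors: binding by elementwise multiplication, bundling by a per-coordinate majority, and a normalized dot product for similarity. The tiny key-value example below (the names and the D = 10,000 dimensionality are illustrative choices) shows how a bound-and-bundled record can be queried by unbinding a role vector.

```python
import numpy as np

rng = np.random.default_rng(6)
D = 10_000                                       # hypervector dimensionality

def hv():
    """Random bipolar hypervector; independent draws are nearly orthogonal."""
    return rng.choice((-1, 1), size=D)

def bind(a, b):
    """Bind role and filler; the elementwise product is its own inverse for +/-1 vectors."""
    return a * b

def bundle(*vs):
    """Superpose several hypervectors (per-coordinate majority; ties become 0)."""
    return np.sign(np.sum(vs, axis=0))

def sim(a, b):
    """Normalized dot-product similarity in [-1, 1]; unrelated vectors score near 0."""
    return a @ b / D

# Encode a record {color: red, shape: circle} as a single hypervector.
color, shape, red, circle = hv(), hv(), hv(), hv()
record = bundle(bind(color, red), bind(shape, circle))

# Query: unbind the 'color' role and compare with candidate fillers.
query = bind(record, color)
print("red:", sim(query, red), "circle:", sim(query, circle))
```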
July 17, 1997
Random input patterns induce a partition of the coupling space of a perceptron into cells labeled by their output sequences. Learning some data with a maximal error rate leads to clusters of neighboring cells. By analyzing the internal structure of these clusters with the formalism of multifractals, we can handle different storage and generalization tasks for lazy students and absent-minded teachers within one unified approach. The results also allow some conclusions on the s...
March 27, 2016
There exists a theory that a single general-purpose learning algorithm could explain the principles of the brain's operation. This theory assumes that the brain has some initial rough architecture and a small library of simple innate circuits that are prewired at birth, and it proposes that all significant mental algorithms can be learned. Given current understanding and observations, this paper reviews and lists the ingredients of such an algorithm from both architectural and function...