November 4, 2019
Empirical analysis is often the first step towards the birth of a conjecture. This is the case for the Birch-Swinnerton-Dyer (BSD) Conjecture, which describes the rational points on an elliptic curve and is one of the most celebrated unsolved problems in mathematics. Here we extend the original empirical approach to the analysis of the Cremona database of quantities relevant to BSD, inspecting more than 2.5 million elliptic curves by means of the latest techniques in data science, machine learning and topological data analysis. Key quantities such as rank, Weierstrass coefficients, period, conductor, Tamagawa number, regulator and order of the Tate-Shafarevich group give rise to a high-dimensional point cloud whose statistical properties we investigate. We reveal patterns and distributions in the rank versus Weierstrass coefficients, as well as the Beta distribution of the BSD ratio of the quantities. Machine learning, via gradient boosted trees, is applied to find inter-correlations amongst the various quantities. We anticipate that our approach will spark further research on the statistical properties of large datasets in number theory and, more generally, in pure mathematics.
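For orientation, the quantities listed in the abstract are tied together by the standard conjectural BSD formula, sketched here in one common normalisation. The symbols are assumptions introduced for illustration and do not come from the abstract: $r$ is the rank, $\Omega_E$ the period, $\mathrm{Reg}(E)$ the regulator, $c_p$ the Tamagawa numbers, $\text{Ш}(E)$ the Tate-Shafarevich group, and $E(\mathbb{Q})_{\mathrm{tors}}$ the torsion subgroup (the last is not named in the abstract). Conventions for $\Omega_E$ vary between sources, and the abstract does not say precisely how its "BSD ratio" is defined.

$$
\frac{L^{(r)}(E,1)}{r!} \;=\; \frac{\Omega_E \cdot \mathrm{Reg}(E) \cdot \#\text{Ш}(E) \cdot \prod_{p} c_p}{\left(\#E(\mathbb{Q})_{\mathrm{tors}}\right)^{2}}
$$

The conductor does not appear in this identity directly; it enters through the definition of the $L$-function $L(E,s)$.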
Similar papers
December 7, 2020
We show that standard machine-learning algorithms may be trained to predict certain invariants of low genus arithmetic curves. Using datasets of size around one hundred thousand, we demonstrate the utility of machine-learning in classification problems pertaining to the BSD invariants of an elliptic curve (including its rank and torsion subgroup), and the analogous invariants of a genus 2 curve. Our results show that a trained machine can efficiently classify curves according...
December 24, 2024
We train machine learning models to predict the order of the Shafarevich-Tate group of an elliptic curve over $\mathbb{Q}$. Building on earlier work of He, Lee, and Oliver, we show that a feed-forward neural network classifier trained on subsets of the invariants arising in the Birch--Swinnerton-Dyer conjectural formula yields higher accuracies ($> 0.9$) than any model previously studied. In addition, we develop a regression model that may be used to predict orders of this gr...
July 14, 2022
Determining the rank of an elliptic curve E/Q is a hard problem, and in some applications (e.g. when searching for curves of high rank) one has to rely on heuristics aimed at estimating the analytic rank (which is equal to the rank under the Birch and Swinnerton-Dyer conjecture). In this paper, we develop rank classification heuristics modeled by deep convolutional neural networks (CNN). Similarly to widely used Mestre-Nagao sums, it takes as an input the conductor of E and...
November 7, 2016
In this article, we propose a new probabilistic model for the distribution of ranks of elliptic curves in families of fixed Selmer rank, and compare the predictions with previous results, and with the databases of curves over the rationals that we have at our disposal. In addition, we document a phenomenon we refer to as Selmer bias that seems to play an important role in the data and in our models.
October 2, 2020
We apply some of the latest techniques from machine learning to the arithmetic of hyperelliptic curves. More precisely, we show that, with impressive accuracy and confidence (between 99 and 100 percent precision), and in a very short time (a matter of seconds on an ordinary laptop), a Bayesian classifier can distinguish between Sato-Tate groups given a small number of Euler factors for the L-function. Our observations are in keeping with the Sato-Tate conjecture for curves of low ...
November 28, 2017
This is an introduction to a probabilistic model for the arithmetic of elliptic curves, a model developed in a series of articles of the author with Bhargava, Kane, Lenstra, Park, Rains, Voight, and Wood. We discuss the theoretical evidence for the model, and we make predictions about elliptic curves based on corresponding theorems proved about the model. In particular, the model suggests that all but finitely many elliptic curves over $\mathbb{Q}$ have rank $\le 21$, which w...
February 12, 2025
Can machine learning help discover new mathematical structures? In this article we discuss an approach to doing this which one can call "mathematical data science". In this paradigm, one studies mathematical objects collectively rather than individually, by creating datasets and doing machine learning experiments and interpretations. After an overview, we present two case studies: murmurations in number theory and loadings of partitions related to Kronecker coefficients in re...
March 22, 2023
We survey some recent applications of machine learning to problems in geometry and theoretical physics. Pure mathematical data has been compiled over the last few decades by the community and experiments in supervised, semi-supervised and unsupervised machine learning have found surprising success. We thus advocate the programme of machine learning mathematical structures, and formulating conjectures via pattern recognition, in other words using artificial intelligence to hel...
January 15, 2021
We review, for a general audience, a variety of recent experiments on extracting structure from machine-learning mathematical data that have been compiled over the years. Focusing on supervised machine-learning on labeled data from different fields ranging from geometry to representation theory, from combinatorics to number theory, we present a comparative study of the accuracies on different problems. The paradigm should be useful for conjecture formulation, finding more eff...
February 14, 2025
In this paper, we study the vanishing order of rational $L$-functions from a data scientific perspective. Each $L$-function is represented in our data by finitely many Dirichlet coefficients, the normalisation of which depends on the context. We observe murmuration-like patterns in averages across our dataset, find that PCA clusters rational $L$-functions by their vanishing order, and record that LDA and neural networks may accurately predict this quantity.