May 11, 2022
As datasets used in scientific applications become more complex, studying the geometry and topology of data has become an increasingly prevalent part of the data analysis process. This can be seen for example with the growing interest in topological tools such as persistent homology. However, on the one hand, topological tools are inherently limited to providing only coarse information about the underlying space of the data. On the other hand, more geometric approaches rely predominately on the manifold hypothesis, which asserts that the underlying space is a smooth manifold. This assumption fails for many physical models where the underlying space contains singularities. In this paper we develop a machine learning pipeline that captures fine-grain geometric information without having to rely on any smoothness assumptions. Our approach involves working within the scope of algebraic geometry and algebraic varieties instead of differential geometry and smooth manifolds. In the setting of the variety hypothesis, the learning problem becomes to find the underlying variety using sample data. We cast this learning problem into a Maximum A Posteriori optimization problem which we solve in terms of an eigenvalue computation. Having found the underlying variety, we explore the use of Gr\"obner bases and numerical methods to reveal information about its geometry. In particular, we propose a heuristic for numerically detecting points lying near the singular locus of the underlying variety.
Similar papers 1
April 21, 2022
We review some recent applications of machine learning to algebraic geometry and physics. Since problems in algebraic geometry can typically be reformulated as mappings between tensors, this makes them particularly amenable to supervised learning. Additionally, unsupervised methods can provide insight into the structure of such geometrical data. At the heart of this programme is the question of how geometry can be machine learned, and indeed how AI helps one to do mathematics...
June 5, 2020
A primary hypothesis that drives scientific and engineering studies is that data has structure. The dominant paradigms for describing such structure are statistics (e.g., moments, correlation functions) and signal processing (e.g., convolutional neural nets, Fourier series). Topological Data Analysis (TDA) is a field of mathematics that analyzes data from a fundamentally different perspective. TDA represents datasets as geometric objects and provides dimensionality reduction ...
March 22, 2023
We survey some recent applications of machine learning to problems in geometry and theoretical physics. Pure mathematical data has been compiled over the last few decades by the community and experiments in supervised, semi-supervised and unsupervised machine learning have found surprising success. We thus advocate the programme of machine learning mathematical structures, and formulating conjectures via pattern recognition, in other words using artificial intelligence to hel...
December 3, 2019
Recently, machine learning (ML) has established itself in various worldwide benchmarking competitions in computational biology, including Critical Assessment of Structure Prediction (CASP) and Drug Design Data Resource (D3R) Grand Challenges. However, the intricate structural complexity and high ML dimensionality of biomolecular datasets obstruct the efficient application of ML algorithms in the field. In addition to data and algorithm, an efficient ML machinery for biomolecu...
July 12, 2024
The enduring legacy of Euclidean geometry underpins classical machine learning, which, for decades, has been primarily developed for data lying in Euclidean space. Yet, modern machine learning increasingly encounters richly structured data that is inherently nonEuclidean. This data can exhibit intricate geometric, topological and algebraic structure: from the geometry of the curvature of space-time, to topologically complex interactions between neurons in the brain, to the al...
May 11, 2022
Despite the fundamental progress in autonomous molecular and materials discovery, data scarcity throughout chemical compound space still severely hampers the use of modern ready-made machine learning models as they rely heavily on the paradigm, 'the bigger the data the better'. Presenting similarity based machine learning (SML), we show an approach to select data and train a model on-the-fly for specific queries, enabling decision making in data scarce scenarios in chemistry....
November 22, 2019
Rational design of compounds with specific properties requires conceptual understanding and fast evaluation of molecular properties throughout chemical compound space (CCS) -- the huge set of all potentially stable molecules. Recent advances in combining quantum mechanical (QM) calculations with machine learning (ML) provide powerful tools for exploring wide swaths of CCS. We present our perspective on this exciting and quickly developing field by discussing key advances in t...
October 26, 2015
We review recent studies dealing with the generation of machine learning models of molecular and solid properties. The models are trained and validated using standard quantum chemistry results obtained for organic molecules and materials selected from chemical space at random.
January 25, 2022
It is well-documented how artificial intelligence can have (and already is having) a big impact on chemical engineering. But classical machine learning approaches may be weak for many chemical engineering applications. This review discusses how challenging data characteristics arise in chemical engineering applications. We identify four characteristics of data arising in chemical engineering applications that make applying classical artificial intelligence approaches difficul...
May 31, 2023
We survey recent applications of topology and singularity theory in the study of the algebraic complexity of concrete optimization problems in applied algebraic geometry and algebraic statistics.