Statistical Mechanics of High-Dimensional Inference

January 18, 2016

An equivalence between high dimensional Bayes optimal inference and M-estimation

September 22, 2016

Madhu Advani, Surya Ganguli

Machine Learning

Disordered Systems and Neura...

Statistics Theory

Neurons and Cognition

When recovering an unknown signal from noisy measurements, the computational difficulty of performing optimal Bayesian MMSE (minimum mean squared error) inference often necessitates the use of maximum a posteriori (MAP) inference, a special case of regularized M-estimation, as a surrogate. However, MAP is suboptimal in high dimensions, when the number of unknown signal components is similar to the number of measurements. In this work we demonstrate, when the signal distributi...

Phase transitions and optimal algorithms in high-dimensional Gaussian mixture clustering

October 10, 2016

Thibault Lesieur, Caterina De Bacco, Jess Banks, Florent Krzakala, ... , Lenka Zdeborová

Machine Learning

Disordered Systems and Neura...

Information Theory

We consider the problem of Gaussian mixture clustering in the high-dimensional limit where the data consists of $m$ points in $n$ dimensions, $n,m \rightarrow \infty$ and $\alpha = m/n$ stays finite. Using exact but non-rigorous methods from statistical physics, we determine the critical value of $\alpha$ and the distance between the clusters at which it becomes information-theoretically possible to reconstruct the membership into clusters better than chance. We also determin...

High dimensionality: The latest challenge to data analysis

February 12, 2019

A. M. Pires, J. A. Branco

Methodology

The advent of modern technology, permitting the measurement of thousands of characteristics simultaneously, has given rise to floods of data characterized by many large or even huge datasets. This new paradigm presents extraordinary challenges to data analysis and the question arises: how can conventional data analysis methods, devised for moderate or small datasets, cope with the complexities of modern data? The case of high dimensional data is particularly revealing of some...

Information-Theoretic Limits for the Matrix Tensor Product

May 22, 2020

Galen Reeves

Information Theory

Probability

Machine Learning

This paper studies a high-dimensional inference problem involving the matrix tensor product of random matrices. This problem generalizes a number of contemporary data science problems including the spiked matrix models used in sparse principal component analysis and covariance estimation and the stochastic block model used in network analysis. The main results are single-letter formulas (i.e., analytical expressions that can be approximated numerically) for the mutual informa...

Statistical mechanical analysis of sparse linear regression as a variable selection problem

May 29, 2018

Tomoyuki Obuchi, Yoshinori Nakanishi-Ohno, ... , Yoshiyuki Kabashima

Disordered Systems and Neura...

Information Theory

Machine Learning

An algorithmic limit of compressed sensing or related variable-selection problems is analytically evaluated when a design matrix is given by an overcomplete random matrix. The replica method from statistical mechanics is employed to derive the result. The analysis is conducted through evaluation of the entropy, an exponential rate of the number of combinations of variables giving a specific value of fit error to given data which is assumed to be generated from a linear proces...

Four lectures on probabilistic methods for data science

December 20, 2016

Roman Vershynin

math.PR

cs.DS

cs.IT

math.IT

math.ST

stat.TH

Methods of high-dimensional probability play a central role in applications for statistics, signal processing, theoretical computer science, and related fields. These lectures present a sample of particularly useful tools of high-dimensional probability, focusing on the classical and matrix Bernstein inequalities and the uniform matrix deviation inequality. We illustrate these tools with applications for dimension reduction, network analysis, covariance estimation, matrix compl...

High dimensional statistical inference: theoretical development to data analytics

August 19, 2019

Deepak Nag Ayyala

Statistics Theory

Methodology

This article is due to appear in the Handbook of Statistics, Vol. 43, Elsevier/North-Holland, Amsterdam, edited by Arni S. R. Srinivasa Rao and C. R. Rao. In modern-day analytics, there is an ever-growing need to develop statistical models to study high-dimensional data. Several approaches have been developed so far, including dimension reduction, asymptotics-driven methods, and random-projection-based methods. For high dimensional parametric models, estimation and hypothesis tes...

Fundamental Limits of Ridge-Regularized Empirical Risk Minimization in High Dimensions

June 16, 2020

Hossein Taheri, Ramtin Pedarsani, Christos Thrampoulidis

Machine Learning

Information Theory

Signal Processing

Empirical Risk Minimization (ERM) algorithms are widely used in a variety of estimation and prediction tasks in signal-processing and machine learning applications. Despite their popularity, a theory that explains their statistical properties in modern regimes, where both the number of measurements and the number of unknown parameters are large, is only recently emerging. In this paper, we characterize for the first time the fundamental limits on the statistical accuracy of conv...

Optimal Shrinkage Estimator for High-Dimensional Mean Vector

October 28, 2016

Taras Bodnar, Ostap Okhrin, Nestor Parolya

Statistics Theory

Statistical Finance

In this paper we derive the optimal linear shrinkage estimator for the high-dimensional mean vector using random matrix theory. The results are obtained under the assumption that both the dimension $p$ and the sample size $n$ tend to infinity in such a way that $p/n \to c\in(0,\infty)$. Under weak conditions imposed on the underlying data generating mechanism, we find the asymptotic equivalents to the optimal shrinkage intensities and estimate them consistently. The proposed ...

Inference in High-dimensional Linear Regression

June 22, 2021

Heather S. Battey, Nancy Reid

Methodology

Statistics Theory

This paper develops an approach to inference in a linear regression model when the number of potential explanatory variables is larger than the sample size. The approach treats each regression coefficient in turn as the interest parameter, the remaining coefficients being nuisance parameters, and seeks an optimal interest-respecting transformation, inducing sparsity on the relevant blocks of the notional Fisher information matrix. The induced sparsity is exploited through a m...
