Similar papers
March 8, 2022
In this paper we study asymptotic normality in high-dimensional linear regression. We focus on the case where the covariance matrix of the regression variables has a KMS structure, in asymptotic settings where the number of predictors, $p$, is proportional to the number of observations, $n$. The main result of the paper is the derivation of the exact asymptotic distribution for the suitably centered and normalized squared norm of the product between the predictor matrix, $\ma...
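For orientation, "KMS structure" usually refers to the Kac-Murdock-Szegő form, in which correlations decay geometrically away from the diagonal; the sketch below assumes this standard parameterization, which may differ in detail from the paper's.

```latex
% Standard Kac-Murdock-Szego (KMS) covariance matrix (assumed form):
% entries decay geometrically with distance from the diagonal.
\[
\Sigma(\rho) \;=\; \bigl(\rho^{\,|i-j|}\bigr)_{i,j=1}^{p}
\;=\;
\begin{pmatrix}
1 & \rho & \rho^{2} & \cdots & \rho^{p-1}\\
\rho & 1 & \rho & \cdots & \rho^{p-2}\\
\vdots & & \ddots & & \vdots\\
\rho^{p-1} & \rho^{p-2} & \cdots & \rho & 1
\end{pmatrix},
\qquad |\rho| < 1 .
\]
```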
January 25, 2023
This paper presents a selective survey of recent developments in statistical inference and multiple testing for high-dimensional regression models, including linear and logistic regression. We examine the construction of confidence intervals and hypothesis tests for various low-dimensional objectives such as regression coefficients and linear and quadratic functionals. The key technique is to generate debiased and desparsified estimators for the targeted low-dimensional objec...
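As a reminder of the general recipe behind such debiased/desparsified estimators (illustrative notation, not the survey's own): starting from an initial lasso estimate $\hat{\beta}$, one adds a one-step correction

```latex
\[
\hat{\beta}^{\mathrm{d}}
\;=\; \hat{\beta} \;+\; \frac{1}{n}\, M X^{\top}\!\bigl(y - X\hat{\beta}\bigr),
\]
```

where $M$ estimates the precision matrix $\Sigma^{-1}$ of the covariates; the correction removes the first-order bias of the $\ell_1$ penalty, so that $\sqrt{n}\,(\hat{\beta}^{\mathrm{d}}_j - \beta_j)$ is asymptotically normal and standard confidence intervals and p-values follow.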
February 11, 2022
We study full Bayesian procedures for high-dimensional linear regression. We adopt the data-dependent empirical priors introduced in [1], which were shown there to have nice posterior contraction properties and to be easy to compute. Our paper extends their theoretical results to the case of unknown error variance. Under a proper sparsity assumption, we achieve model selection consistency, posterior contraction rates, as well as a Bernstein-von Mises theorem by analyzing multivariate...
August 14, 2020
This paper investigates the finite-sample prediction risk of the high-dimensional least squares estimator. We derive a central limit theorem for the prediction risk when both the sample size and the number of features tend to infinity. Furthermore, the finite-sample distribution and the confidence interval of the prediction risk are provided. Our theoretical results demonstrate the sample-wise nonmonotonicity of the prediction risk and confirm the "more data hurt" phenomenon.
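The sample-wise nonmonotonicity is easy to reproduce in a toy experiment. The sketch below (an illustration of the phenomenon, not the paper's construction) tracks the excess risk of minimum-norm least squares as $n$ crosses the interpolation threshold $n = p$:

```python
# Hedged illustration: empirical excess prediction risk of minimum-norm
# least squares as the sample size n crosses the number of features p,
# showing that more data can hurt.
import numpy as np

rng = np.random.default_rng(0)
p, sigma, n_trials = 50, 1.0, 200
beta = rng.standard_normal(p)
beta /= np.linalg.norm(beta)          # fixed true signal, ||beta|| = 1

for n in [10, 25, 40, 48, 50, 52, 60, 100, 200]:
    risks = []
    for _ in range(n_trials):
        X = rng.standard_normal((n, p))
        y = X @ beta + sigma * rng.standard_normal(n)
        # lstsq returns the minimum-norm solution when n < p
        beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
        # with isotropic Gaussian features, the excess prediction risk
        # equals the squared estimation error ||beta_hat - beta||^2
        risks.append(np.sum((beta_hat - beta) ** 2))
    print(f"n = {n:4d}   excess risk ≈ {np.mean(risks):9.3g}")
```

The printout makes the nonmonotonicity in $n$ directly visible: the risk blows up near $n \approx p$ and only then decreases.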
May 31, 2019
This article provides, through theoretical analysis, an in-depth understanding of the classification performance of the empirical risk minimization framework, in both ridge-regularized and unregularized cases, when high dimensional data are considered. Focusing on the fundamental problem of separating a two-class Gaussian mixture, the proposed analysis allows for a precise prediction of the classification error for a large set of data vectors $\mathbf{x} \in \mathbb R^p$ o...
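For reference, the natural baseline for this two-class problem is the Bayes error of the Gaussian mixture; for symmetric classes $\mathcal{N}(\pm\mu, I_p)$ it has the closed form below (a standard fact, stated here for context rather than taken from the paper):

```latex
\[
x \mid y = \pm 1 \;\sim\; \mathcal{N}(\pm\mu,\, I_p),
\qquad
\min_{\hat y}\ \Pr\bigl(\hat y(x) \neq y\bigr) \;=\; Q\bigl(\|\mu\|_2\bigr),
\qquad
Q(t) = \int_{t}^{\infty} \frac{e^{-s^{2}/2}}{\sqrt{2\pi}}\, ds,
\]
```

attained by the linear rule $\hat y(x) = \operatorname{sign}(\mu^{\top}x)$; an analysis of this kind quantifies how far finite-sample ERM classifiers sit above this baseline.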
August 7, 2023
In these lecture notes we present different methods and concepts developed in statistical physics to analyze gradient descent dynamics in high-dimensional non-convex landscapes. Our aim is to show how approaches developed in physics, mainly statistical physics of disordered systems, can be used to tackle open questions on high-dimensional dynamics in Machine Learning.
March 29, 2018
In these notes we describe heuristics to predict computational-to-statistical gaps in certain statistical problems. These are regimes in which the underlying statistical problem is information-theoretically possible although no efficient algorithm exists, rendering the problem essentially unsolvable for large instances. The methods we describe here are based on mature, albeit non-rigorous, tools from statistical physics. These notes are based on a lecture series given by th...
August 8, 2022
Avoiding overfitting is a central challenge in machine learning, yet many large neural networks readily achieve zero training loss. This puzzling contradiction necessitates new approaches to the study of overfitting. Here we quantify overfitting via residual information, defined as the bits in fitted models that encode noise in training data. Information-efficient learning algorithms minimize residual information while maximizing the relevant bits, which are predictive of the...
May 20, 2014
This tutorial provides an exposition of a flexible geometric framework for high dimensional estimation problems with constraints. The tutorial develops geometric intuition about high dimensional sets, justifies it with some results of asymptotic convex geometry, and demonstrates connections between geometric results and estimation problems. The theory is illustrated with applications to sparse recovery, matrix completion, quantization, linear and logistic regression and gener...
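A representative result of this type (paraphrased; the tutorial's exact statements and constants differ) bounds the error of least squares constrained to a set $K \subset \mathbb{R}^p$ by the Gaussian mean width of $K$:

```latex
\[
\hat{x} \in \operatorname*{arg\,min}_{x' \in K} \|y - A x'\|_2,
\qquad
\mathbb{E}\,\|\hat{x} - x\|_2 \;\lesssim\; \frac{w(K)}{\sqrt{n}},
\qquad
w(K) := \mathbb{E} \sup_{u \in K - K} \langle g, u \rangle,
\quad g \sim \mathcal{N}(0, I_p),
\]
```

so that sparse recovery, matrix completion, and the other listed applications reduce to computing the mean width of the relevant constraint set.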
March 19, 2018
Every student in statistics or data science learns early on that when the sample size largely exceeds the number of variables, fitting a logistic model produces estimates that are approximately unbiased. Every student also learns that there are formulas to predict the variability of these estimates which are used for the purpose of statistical inference; for instance, to produce p-values for testing the significance of regression coefficients. Although these formulas come fro...
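The breakdown of the classical picture is easy to see in simulation. The sketch below (an illustration in the spirit of this line of work, not the paper's own experiment; the dimensions and signal strength are arbitrary choices) fits an unpenalized logistic MLE at $p/n = 0.2$ and compares the fitted nonzero coefficients with the truth. Classical theory predicts a ratio near 1; in this regime the literature reports an inflation factor of roughly 1.5:

```python
# Hedged illustration: bias of the logistic-regression MLE when p/n
# is not small. Classical theory predicts beta_hat ~ beta on average;
# here the fitted nonzero coefficients are visibly inflated.
import numpy as np

rng = np.random.default_rng(1)
n, p = 2000, 400                       # p/n = 0.2, far from p/n -> 0
k = p // 2                             # half the coefficients are nonzero
c = np.sqrt(5.0 / k)                   # scaled so var(x' beta) = 5
beta = np.zeros(p)
beta[:k] = c

X = rng.standard_normal((n, p))
prob = 1.0 / (1.0 + np.exp(-X @ beta))
y = rng.binomial(1, prob).astype(float)

# Plain Newton-Raphson for the unpenalized logistic MLE.
beta_hat = np.zeros(p)
for _ in range(30):
    mu = 1.0 / (1.0 + np.exp(-X @ beta_hat))
    grad = X.T @ (y - mu)
    W = mu * (1.0 - mu)
    hess = X.T @ (X * W[:, None]) + 1e-10 * np.eye(p)  # tiny jitter
    beta_hat += np.linalg.solve(hess, grad)

ratio = beta_hat[:k].mean() / c
print(f"mean fitted / true coefficient: {ratio:.3f}")  # well above 1.0
```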