ID: 2009.13229

Exact results on high-dimensional linear regression via statistical physics

September 28, 2020


Similar papers 2

Performance of Bayesian linear regression in a model with mismatch

July 14, 2021

88% Match
Jean Barbier, Wei-Kuo Chen, ... , Manuel Sáenz
math.PR
cond-mat.dis-nn
cs.IT
cs.LG
math.IT
math.MP
math.ST
stat.TH

In this paper we analyze, for a model of linear regression with Gaussian covariates, the performance of a Bayesian estimator given by the mean of a log-concave posterior distribution with Gaussian prior, in the high-dimensional limit where the number of samples and the covariates' dimension are large and proportional. Although the high-dimensional analysis of Bayesian estimators has been previously studied for Bayesian-optimal linear regression where the correct posterior is ...


The geometry of least squares in the 21st century

September 30, 2013

88% Match
Jonathan Taylor
Statistics Theory

It has been over 200 years since Gauss's and Legendre's famous priority dispute on who discovered the method of least squares. Nevertheless, we argue that the normal equations are still relevant in many facets of modern statistics, particularly in the domain of high-dimensional inference. Even today, we are still learning new things about the law of large numbers, first described in Bernoulli's Ars Conjectandi 300 years ago, as it applies to high dimensional inference. The ot...


High-dimensional regression with noisy and missing data: Provable guarantees with nonconvexity

September 16, 2011

88% Match
Po-Ling Loh, Martin J. Wainwright
Statistics Theory
Information Theory
Machine Learning

Although the standard formulations of prediction problems involve fully-observed and noiseless data drawn in an i.i.d. manner, many applications involve noisy and/or missing data, possibly involving dependence, as well. We study these issues in the context of high-dimensional sparse linear regression, and propose novel estimators for the cases of noisy, missing and/or dependent data. Many standard approaches to noisy or missing data, such as those using the EM algorithm, lead...


Confidence Intervals for Low-Dimensional Parameters in High-Dimensional Linear Models

October 12, 2011

88% Match
Cun-Hui Zhang, Stephanie S. Zhang
Methodology

The purpose of this paper is to propose methodologies for statistical inference of low-dimensional parameters with high-dimensional data. We focus on constructing confidence intervals for individual coefficients and linear combinations of several of them in a linear regression model, although our ideas are applicable in a much broader context. The theoretical results presented here provide sufficient conditions for the asymptotic normality of the proposed estimators along with ...


Finite- and Large- Sample Inference for Model and Coefficients in High-dimensional Linear Regression with Repro Samples

September 19, 2022

88% Match
Peng Wang, Min-Ge Xie, Linjun Zhang
Methodology
Statistics Theory
Computation
Other Statistics

In this paper, we present a new and effective simulation-based approach to conduct both finite- and large-sample inference for high-dimensional linear regression models. This approach is developed under the so-called repro samples framework, in which we conduct statistical inference by creating and studying the behavior of artificial samples that are obtained by mimicking the sampling mechanism of the data. We obtain confidence sets for (a) the true model corresponding to the...


High Dimensional Time Series Regression Models: Applications to Statistical Learning Methods

August 27, 2023

88% Match
Christis Katsouris
Econometrics
Machine Learning

These lecture notes provide an overview of existing methodologies and recent developments for estimation and inference with high dimensional time series regression models. First, we present main limit theory results for high dimensional dependent data, which are relevant to covariance matrix structures as well as to dependent time series sequences. Second, we present main aspects of the asymptotic theory related to time series regression models with many covariates. Third, we d...


Regularization in High-Dimensional Regression and Classification via Random Matrix Theory

March 30, 2020

88% Match
Panagiotis Lolas
Statistics Theory

We study general singular value shrinkage estimators in high-dimensional regression and classification, when the number of features and the sample size both grow proportionally to infinity. We allow models with general covariance matrices that include a large class of data generating distributions. As far as the implications of our results are concerned, we find exact asymptotic formulas for both the training and test errors in regression models fitted by gradient descent, wh...


All of Linear Regression

October 14, 2019

87% Match
Arun K. Kuchibhotla, Lawrence D. Brown, ... , Junhui Cai
Statistics Theory
Methodology

Least squares linear regression is one of the oldest and most widely used data analysis tools. Although the theoretical analysis of the ordinary least squares (OLS) estimator is just as old, several fundamental questions are yet to be answered. Suppose regression observations $(X_1,Y_1),\ldots,(X_n,Y_n)\in\mathbb{R}^d\times\mathbb{R}$ (not necessarily independent) are available. Some of the questions we deal with are as follows: under what conditions, does the OLS estimator converge an...


High-dimensional inference in misspecified linear models

March 22, 2015

87% Match
Peter Bühlmann, Sara van de Geer
Methodology

We consider high-dimensional inference when the assumed linear model is misspecified. We describe some correct interpretations and corresponding sufficient assumptions for valid asymptotic inference of the model parameters, which still have a useful meaning when the model is misspecified. We largely focus on the de-sparsified Lasso procedure but we also indicate some implications for (multiple) sample splitting techniques. In view of available methods and software, our result...


Scalable simultaneous inference in high-dimensional linear regression models

March 9, 2017

87% Match
Tom Boot, Didier Nibbering
Statistics Theory

The computational complexity of simultaneous inference methods in high-dimensional linear regression models quickly increases with the number of variables. This paper proposes a computationally efficient method based on the Moore-Penrose pseudoinverse. Under a symmetry assumption on the available regressors, the estimators are normally distributed and accompanied by a closed-form expression for the standard errors that is free of tuning parameters. We study the numerical perform...
