Inference in High-dimensional Linear Regression

June 22, 2021

Heather S. Battey, Nancy Reid

Statistics

Mathematics

Methodology

Statistics Theory

This paper develops an approach to inference in a linear regression model when the number of potential explanatory variables is larger than the sample size. The approach treats each regression coefficient in turn as the interest parameter, the remaining coefficients being nuisance parameters, and seeks an optimal interest-respecting transformation, inducing sparsity on the relevant blocks of the notional Fisher information matrix. The induced sparsity is exploited through a marginal least squares analysis for each variable, as in a factorial experiment, thereby avoiding penalization. One parameterization of the problem is found to be particularly convenient, both computationally and mathematically. In particular, it permits an analytic solution to the optimal transformation problem, facilitating theoretical analysis and comparison to other work. In contrast to regularized regression such as the lasso and its extensions, neither adjustment for selection nor rescaling of the explanatory variables is needed, ensuring the physical interpretation of regression coefficients is retained. Recommended usage is within a broader set of inferential statements, so as to reflect uncertainty over the model as well as over the parameters. The considerations involved in extending the work to other regression models are briefly discussed.

Statistical Inference and Large-scale Multiple Testing for High-dimensional Regression Models

January 25, 2023

92% Match

T. Tony Cai, Zijian Guo, Yin Xia

Methodology

Statistics Theory

This paper presents a selective survey of recent developments in statistical inference and multiple testing for high-dimensional regression models, including linear and logistic regression. We examine the construction of confidence intervals and hypothesis tests for various low-dimensional objectives such as regression coefficients and linear and quadratic functionals. The key technique is to generate debiased and desparsified estimators for the targeted low-dimensional objec...

Find SimilarView on arXiv

Confidence Intervals for Low-Dimensional Parameters in High-Dimensional Linear Models

October 12, 2011

91% Match

Cun-Hui Zhang, Stephanie S. Zhang

Methodology

The purpose of this paper is to propose methodologies for statistical inference of low-dimensional parameters with high-dimensional data. We focus on constructing confidence intervals for individual coefficients and linear combinations of several of them in a linear regression model, although our ideas are applicable in a much broad context. The theoretical results presented here provide sufficient conditions for the asymptotic normality of the proposed estimators along with ...

Find SimilarView on arXiv

Generalized Fiducial Inference for Ultrahigh Dimensional Regression

April 30, 2013

91% Match

Randy C. S. Lai, Jan Hannig, Thomas C. M. Lee

Methodology

In recent years the ultrahigh dimensional linear regression problem has attracted enormous attentions from the research community. Under the sparsity assumption most of the published work is devoted to the selection and estimation of the significant predictor variables. This paper studies a different but fundamentally important aspect of this problem: uncertainty quantification for parameter estimates and model choices. To be more specific, this paper proposes methods for der...

Find SimilarView on arXiv

Bayesian High-dimensional Linear Regression with Sparse Projection-posterior

October 22, 2024

91% Match

Samhita Pal, Subhashis Ghoshal

Methodology

Statistics Theory

We consider a novel Bayesian approach to estimation, uncertainty quantification, and variable selection for a high-dimensional linear regression model under sparsity. The number of predictors can be nearly exponentially large relative to the sample size. We put a conjugate normal prior initially disregarding sparsity, but for making an inference, instead of the original multivariate normal posterior, we use the posterior distribution induced by a map transforming the vector o...

Find SimilarView on arXiv

Estimation in high-dimensional linear models with deterministic design matrices

June 5, 2012

91% Match

Jun Shao, Xinwei Deng

Statistics Theory

Because of the advance in technologies, modern statistical studies often encounter linear models with the number of explanatory variables much larger than the sample size. Estimation and variable selection in these high-dimensional problems with deterministic design points is very different from those in the case of random covariates, due to the identifiability of the high-dimensional regression parameter vector. We show that a reasonable approach is to focus on the projectio...

Find SimilarView on arXiv

High-dimensional inference in misspecified linear models

March 22, 2015

91% Match

Peter Bühlmann, de Geer Sara van

Methodology

We consider high-dimensional inference when the assumed linear model is misspecified. We describe some correct interpretations and corresponding sufficient assumptions for valid asymptotic inference of the model parameters, which still have a useful meaning when the model is misspecified. We largely focus on the de-sparsified Lasso procedure but we also indicate some implications for (multiple) sample splitting techniques. In view of available methods and software, our result...

Find SimilarView on arXiv

Finite- and Large- Sample Inference for Model and Coefficients in High-dimensional Linear Regression with Repro Samples

September 19, 2022

91% Match

Peng Wang, Min-Ge Xie, Linjun Zhang

Methodology

Statistics Theory

Computation

Other Statistics

Statistics Theory

In this paper, we present a new and effective simulation-based approach to conduct both finite- and large-sample inference for high-dimensional linear regression models. This approach is developed under the so-called repro samples framework, in which we conduct statistical inference by creating and studying the behavior of artificial samples that are obtained by mimicking the sampling mechanism of the data. We obtain confidence sets for (a) the true model corresponding to the...

Find SimilarView on arXiv

Invariance-based Inference in High-Dimensional Regression with Finite-Sample Guarantees

December 22, 2023

90% Match

Wenxuan Guo, Panos Toulis

Methodology

Statistics Theory

In this paper, we develop invariance-based procedures for testing and inference in high-dimensional regression models. These procedures, also known as randomization tests, provide several important advantages. First, for the global null hypothesis of significance, our test is valid in finite samples. It is also simple to implement and comes with finite-sample guarantees on statistical power. Remarkably, despite its simplicity, this testing idea has escaped the attention of ea...

Find SimilarView on arXiv

Sparsified Simultaneous Confidence Intervals for High-Dimensional Linear Models

July 14, 2023

90% Match

Xiaorui Zhu, Yichen Qin, Peng Wang

Methodology

Econometrics

Machine Learning

Statistical inference of the high-dimensional regression coefficients is challenging because the uncertainty introduced by the model selection procedure is hard to account for. A critical question remains unsettled; that is, is it possible and how to embed the inference of the model into the simultaneous inference of the coefficients? To this end, we propose a notion of simultaneous confidence intervals called the sparsified simultaneous confidence intervals. Our intervals ar...

Find SimilarView on arXiv

Inference for High-Dimensional Sparse Econometric Models

December 31, 2011

90% Match

Alexandre Belloni, Victor Chernozhukov, Christian Hansen

Methodology

Econometrics

Applications

This article is about estimation and inference methods for high dimensional sparse (HDS) regression models in econometrics. High dimensional sparse models arise in situations where many regressors (or series terms) are available and the regression function is well-approximated by a parsimonious, yet unknown set of regressors. The latter condition makes it possible to estimate the entire regression function effectively by searching for approximately the right set of regressors...

Find SimilarView on arXiv