Invariance-based Inference in High-Dimensional Regression with Finite-Sample Guarantees

December 22, 2023

Wenxuan Guo, Panos Toulis

Statistics

Mathematics

Methodology

Statistics Theory

In this paper, we develop invariance-based procedures for testing and inference in high-dimensional regression models. These procedures, also known as randomization tests, provide several important advantages. First, for the global null hypothesis of significance, our test is valid in finite samples. It is also simple to implement and comes with finite-sample guarantees on statistical power. Remarkably, despite its simplicity, this testing idea has escaped the attention of earlier analytical work, which mainly concentrated on complex high-dimensional asymptotic methods. Under an additional assumption of Gaussian design, we show that this test also achieves the minimax optimal rate against certain nonsparse alternatives, a type of result that is rare in the literature. Second, for partial null hypotheses, we propose residual-based tests and derive theoretical conditions for their validity. These tests can be made powerful by constructing the test statistic in a way that, first, selects the important covariates (e.g., through Lasso) and then orthogonalizes the nuisance parameters. We illustrate our results through extensive simulations and applied examples. One consistent finding is that the strong finite-sample guarantees associated with our procedures result in added robustness when it comes to handling multicollinearity and heavy-tailed covariates.
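As a rough illustration of the kind of randomization test the abstract describes for the global null, the sketch below computes a permutation p-value in plain Python. The abstract does not say which invariance the authors use, so the exchangeability assumption on the errors, the max-absolute-inner-product statistic, and the name `global_null_randomization_test` are all illustrative assumptions rather than the paper's procedure.

```python
import random


def global_null_randomization_test(X, y, num_draws=999, seed=0):
    """Permutation p-value for H0: every regression coefficient is zero.

    Assumption (not stated in the abstract): the errors are exchangeable,
    so under H0 the centered response is invariant to permuting its entries.
    X is a list of n rows with p entries each; y is a list of n responses.
    """
    rng = random.Random(seed)
    n, p = len(X), len(X[0])

    # Center the response so an unknown intercept does not affect the test.
    y_mean = sum(y) / n
    y_centered = [value - y_mean for value in y]

    def statistic(response):
        # Illustrative statistic: largest absolute inner product between
        # a covariate column and the (centered) response.
        return max(
            abs(sum(X[i][j] * response[i] for i in range(n)))
            for j in range(p)
        )

    observed = statistic(y_centered)

    # Recompute the statistic on randomly permuted responses.
    exceed = 0
    permuted = y_centered[:]
    for _ in range(num_draws):
        rng.shuffle(permuted)
        if statistic(permuted) >= observed:
            exceed += 1

    # Adding one to numerator and denominator keeps the p-value valid
    # in finite samples under the assumed invariance.
    return (1 + exceed) / (1 + num_draws)
```

Nothing in this sketch requires the number of covariates to be smaller than the sample size, which is why a test of this form can be applied directly in high-dimensional settings. Rejecting when the returned p-value is at most the chosen level gives a finite-sample valid test under the stated exchangeability assumption.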

Robust Inference for High-Dimensional Linear Models via Residual Randomization

June 14, 2021

93% Match

Y. Samuel Wang, Si Kai Lee, ..., Mladen Kolar

Methodology

Machine Learning

We propose a residual randomization procedure designed for robust Lasso-based inference in the high-dimensional setting. Compared to earlier work that focuses on sub-Gaussian errors, the proposed procedure is designed to work robustly in settings that also include heavy-tailed covariates and errors. Moreover, our procedure can be valid under clustered errors, which is important in practice, but has been largely overlooked by earlier work. Through extensive simulations, we ill...

Hypothesis Testing in High-Dimensional Regression under the Gaussian Random Design Model: Asymptotic Theory

January 17, 2013

91% Match

Adel Javanmard, Andrea Montanari

stat.ME

cs.IT

math.IT

math.ST

stat.ML

stat.TH

We consider linear regression in the high-dimensional regime where the number of observations $n$ is smaller than the number of parameters $p$. A very successful approach in this setting uses $\ell_1$-penalized least squares (a.k.a. the Lasso) to search for a subset of $s_0< n$ parameters that best explain the data, while setting the other parameters to zero. A considerable amount of work has been devoted to characterizing the estimation and model selection problems within this...

Randomized tests for high-dimensional regression: A more efficient and powerful solution

October 3, 2020

91% Match

Yue Li, Ilmun Kim, Yuting Wei

Methodology

We investigate the problem of testing the global null in high-dimensional regression models when the feature dimension $p$ grows proportionally to the number of observations $n$. Despite a number of prior works studying this problem, whether there exists a test that is model-agnostic, efficient to compute, and enjoys high power remains unsettled. In this paper, we answer this question in the affirmative by leveraging random projection techniques, and propose a te...

In Defense of the Indefensible: A Very Naive Approach to High-Dimensional Inference

May 16, 2017

91% Match

Sen Zhao, Daniela Witten, Ali Shojaie

Methodology

Statistics Theory

Machine Learning

A great deal of interest has recently focused on conducting inference on the parameters in a high-dimensional linear model. In this paper, we consider a simple and very naïve two-step procedure for this task, in which we (i) fit a lasso model in order to obtain a subset of the variables, and (ii) fit a least squares model on the lasso-selected set. Conventional statistical wisdom tells us that we cannot make use of the standard statistical inference tools for the result...

Inference in High-dimensional Linear Regression

June 22, 2021

90% Match

Heather S. Battey, Nancy Reid

Methodology

Statistics Theory

This paper develops an approach to inference in a linear regression model when the number of potential explanatory variables is larger than the sample size. The approach treats each regression coefficient in turn as the interest parameter, the remaining coefficients being nuisance parameters, and seeks an optimal interest-respecting transformation, inducing sparsity on the relevant blocks of the notional Fisher information matrix. The induced sparsity is exploited through a m...

Goodness of fit tests for high-dimensional linear models

November 10, 2015

90% Match

Rajen D. Shah, Peter Bühlmann

Methodology

Statistics Theory

In this work we propose a framework for constructing goodness of fit tests in both low and high-dimensional linear models. We advocate applying regression methods to the scaled residuals following either an ordinary least squares or Lasso fit to the data, and using some proxy for prediction error as the final test statistic. We call this family Residual Prediction (RP) tests. We show that simulation can be used to obtain the critical values for such tests in the low-dimension...

Finite- and Large- Sample Inference for Model and Coefficients in High-dimensional Linear Regression with Repro Samples

September 19, 2022

90% Match

Peng Wang, Min-Ge Xie, Linjun Zhang

Methodology

Statistics Theory

Computation

Other Statistics

In this paper, we present a new and effective simulation-based approach to conduct both finite- and large-sample inference for high-dimensional linear regression models. This approach is developed under the so-called repro samples framework, in which we conduct statistical inference by creating and studying the behavior of artificial samples that are obtained by mimicking the sampling mechanism of the data. We obtain confidence sets for (a) the true model corresponding to the...

Residual Permutation Test for High-Dimensional Regression Coefficient Testing

November 29, 2022

90% Match

Kaiyue Wen, Tengyao Wang, Yuhao Wang

Statistics Theory

Methodology

We consider the problem of testing whether a single coefficient is equal to zero in fixed-design linear models under a moderately high-dimensional regime, where the dimension of covariates $p$ is allowed to be in the same order of magnitude as sample size $n$. In this regime, to achieve finite-population validity, existing methods usually require strong distributional assumptions on the noise vector (such as Gaussian or rotationally invariant), which limits their applications...

High-dimensional inference in misspecified linear models

March 22, 2015

90% Match

Peter Bühlmann, Sara van de Geer

Methodology

We consider high-dimensional inference when the assumed linear model is misspecified. We describe some correct interpretations and corresponding sufficient assumptions for valid asymptotic inference of the model parameters, which still have a useful meaning when the model is misspecified. We largely focus on the de-sparsified Lasso procedure but we also indicate some implications for (multiple) sample splitting techniques. In view of available methods and software, our result...

Post-Lasso Inference for High-Dimensional Regression

June 16, 2018

89% Match

X. Jessie Jeng, Huimin Peng, Wenbin Lu

Methodology

Among the most popular variable selection procedures in high-dimensional regression, Lasso provides a solution path to rank the variables and determines a cut-off position on the path to select variables and estimate coefficients. In this paper, we consider variable selection from a new perspective motivated by the frequently occurring phenomenon that relevant variables are not completely distinguishable from noise variables on the solution path. We propose to characterize the...
