September 28, 2020
Similar papers 2
July 14, 2021
In this paper we analyze, for a model of linear regression with gaussian covariates, the performance of a Bayesian estimator given by the mean of a log-concave posterior distribution with gaussian prior, in the high-dimensional limit where the number of samples and the covariates' dimension are large and proportional. Although the high-dimensional analysis of Bayesian estimators has been previously studied for Bayesian-optimal linear regression where the correct posterior is ...
September 30, 2013
It has been over 200 years since Gauss's and Legendre's famous priority dispute on who discovered the method of least squares. Nevertheless, we argue that the normal equations are still relevant in many facets of modern statistics, particularly in the domain of high-dimensional inference. Even today, we are still learning new things about the law of large numbers, first described in Bernoulli's Ars Conjectandi 300 years ago, as it applies to high dimensional inference. The ot...
September 16, 2011
Although the standard formulations of prediction problems involve fully-observed and noiseless data drawn in an i.i.d. manner, many applications involve noisy and/or missing data, possibly involving dependence, as well. We study these issues in the context of high-dimensional sparse linear regression, and propose novel estimators for the cases of noisy, missing and/or dependent data. Many standard approaches to noisy or missing data, such as those using the EM algorithm, lead...
October 12, 2011
The purpose of this paper is to propose methodologies for statistical inference of low-dimensional parameters with high-dimensional data. We focus on constructing confidence intervals for individual coefficients and linear combinations of several of them in a linear regression model, although our ideas are applicable in a much broad context. The theoretical results presented here provide sufficient conditions for the asymptotic normality of the proposed estimators along with ...
September 19, 2022
In this paper, we present a new and effective simulation-based approach to conduct both finite- and large-sample inference for high-dimensional linear regression models. This approach is developed under the so-called repro samples framework, in which we conduct statistical inference by creating and studying the behavior of artificial samples that are obtained by mimicking the sampling mechanism of the data. We obtain confidence sets for (a) the true model corresponding to the...
August 27, 2023
These lecture notes provide an overview of existing methodologies and recent developments for estimation and inference with high dimensional time series regression models. First, we present main limit theory results for high dimensional dependent data which is relevant to covariance matrix structures as well as to dependent time series sequences. Second, we present main aspects of the asymptotic theory related to time series regression models with many covariates. Third, we d...
March 30, 2020
We study general singular value shrinkage estimators in high-dimensional regression and classification, when the number of features and the sample size both grow proportionally to infinity. We allow models with general covariance matrices that include a large class of data generating distributions. As far as the implications of our results are concerned, we find exact asymptotic formulas for both the training and test errors in regression models fitted by gradient descent, wh...
October 14, 2019
Least squares linear regression is one of the oldest and widely used data analysis tools. Although the theoretical analysis of the ordinary least squares (OLS) estimator is as old, several fundamental questions are yet to be answered. Suppose regression observations $(X_1,Y_1),\ldots,(X_n,Y_n)\in\mathbb{R}^d\times\mathbb{R}$ (not necessarily independent) are available. Some of the questions we deal with are as follows: under what conditions, does the OLS estimator converge an...
March 22, 2015
We consider high-dimensional inference when the assumed linear model is misspecified. We describe some correct interpretations and corresponding sufficient assumptions for valid asymptotic inference of the model parameters, which still have a useful meaning when the model is misspecified. We largely focus on the de-sparsified Lasso procedure but we also indicate some implications for (multiple) sample splitting techniques. In view of available methods and software, our result...
March 9, 2017
The computational complexity of simultaneous inference methods in high-dimensional linear regression models quickly increases with the number variables. This paper proposes a computationally efficient method based on the Moore-Penrose pseudoinverse. Under a symmetry assumption on the available regressors, the estimators are normally distributed and accompanied by a closed-form expression for the standard errors that is free of tuning parameters. We study the numerical perform...