February 23, 2012
The paper considers linear regression problems where the number of predictor variables is possibly larger than the sample size. The basic motivation of the study is to combine the points of view of model selection and functional regression by using a factor approach: it is assumed that the predictor vector can be decomposed into a sum of two uncorrelated random components reflecting common factors and specific variabilities of the explanatory variables. It is shown that the t...
May 16, 2017
A great deal of interest has recently focused on conducting inference on the parameters in a high-dimensional linear model. In this paper, we consider a simple and very na\"{i}ve two-step procedure for this task, in which we (i) fit a lasso model in order to obtain a subset of the variables, and (ii) fit a least squares model on the lasso-selected set. Conventional statistical wisdom tells us that we cannot make use of the standard statistical inference tools for the result...
January 2, 2013
Traditional statistical inference considers relatively small data sets and the corresponding theoretical analysis focuses on the asymptotic behavior of a statistical estimator when the number of samples approaches infinity. However, many data sets encountered in modern applications have dimensionality significantly larger than the number of training data available, and for such problems the classical statistical tools become inadequate. In order to analyze high-dimensional da...
April 17, 2017
In this paper, we propose a new method for estimation and constructing confidence intervals for low-dimensional components in a high-dimensional model. The proposed estimator, called Constrained Lasso (CLasso) estimator, is obtained by simultaneously solving two estimating equations---one imposing a zero-bias constraint for the low-dimensional parameter and the other forming an $\ell_1$-penalized procedure for the high-dimensional nuisance parameter. By carefully choosing the...
November 22, 2013
In ultrahigh dimensional setting, independence screening has been both theoretically and empirically proved a useful variable selection framework with low computation cost. In this work, we propose a two-step framework by using marginal information in a different perspective from independence screening. In particular, we retain significant variables rather than screening out irrelevant ones. The new method is shown to be model selection consistent in the ultrahigh dimensional...
May 2, 2017
This article develops a framework for testing general hypothesis in high-dimensional models where the number of variables may far exceed the number of observations. Existing literature has considered less than a handful of hypotheses, such as testing individual coordinates of the model parameter. However, the problem of testing general and complex hypotheses remains widely open. We propose a new inference method developed around the hypothesis adaptive projection pursuit fram...
August 21, 2021
This paper proposes a new method of inference in high-dimensional regression models and high-dimensional IV regression models. Estimation is based on a combined use of the orthogonal greedy algorithm, high-dimensional Akaike information criterion, and double/debiased machine learning. The method of inference for any low-dimensional subvector of high-dimensional parameters is based on a root-$N$ asymptotic normality, which is shown to hold without requiring the exact sparsity ...
June 13, 2013
Fitting high-dimensional statistical models often requires the use of non-linear parameter estimation procedures. As a consequence, it is generally impossible to obtain an exact characterization of the probability distribution of the parameter estimates. This in turn implies that it is extremely challenging to quantify the \emph{uncertainty} associated with a certain parameter estimate. Concretely, no commonly accepted procedure exists for computing classical measures of unce...
October 17, 2011
We propose dimension reduction methods for sparse, high-dimensional multivariate response regression models. Both the number of responses and that of the predictors may exceed the sample size. Sometimes viewed as complementary, predictor selection and rank reduction are the most popular strategies for obtaining lower-dimensional approximations of the parameter matrix in such models. We show in this article that important gains in prediction accuracy can be obtained by conside...
May 28, 2023
We developed a statistical inference method applicable to a broad range of generalized linear models (GLMs) in high-dimensional settings, where the number of unknown coefficients scales proportionally with the sample size. Although a pioneering inference method has been developed for logistic regression, which is a specific instance of GLMs, it is not feasible to apply this method directly to other GLMs because of unknown hyper-parameters. In this study, we addressed this lim...