Similar papers
March 5, 2021
The lasso and elastic net are popular regularized regression models for supervised learning. Friedman, Hastie, and Tibshirani (2010) introduced a computationally efficient algorithm for computing the elastic net regularization path for ordinary least squares regression, logistic regression and multinomial logistic regression, while Simon, Friedman, Hastie, and Tibshirani (2011) extended this work to Cox models for right-censored data. We further extend the reach of the elasti...
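For orientation, the squared-error case of the elastic net solved along such a regularization path has the standard form (generic notation, not quoted from the paper above):
\[
\hat{\beta}(\lambda) = \operatorname*{arg\,min}_{\beta_0,\,\beta}\; \frac{1}{2n}\sum_{i=1}^{n}\bigl(y_i - \beta_0 - x_i^{\top}\beta\bigr)^2 + \lambda\Bigl(\alpha\,\lVert\beta\rVert_1 + \tfrac{1-\alpha}{2}\,\lVert\beta\rVert_2^2\Bigr),
\]
with $\alpha = 1$ recovering the lasso and $\alpha = 0$ the ridge penalty; the path algorithm computes $\hat{\beta}(\lambda)$ over a decreasing grid of $\lambda$ values, and the logistic, multinomial and Cox cases replace the squared-error loss with the corresponding negative (partial) log-likelihood.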
October 17, 2016
Given data $y(n)$ and $p(n)$ covariates $x(n)$, one problem in linear regression is to decide which, if any, of the covariates to include. There are many articles on this problem, but all are based on a stochastic model for the data. This paper gives what seems to be a new approach that does not require any form of model. It is conceptually and algorithmically simple, and consistency results can be proved under appropriate assumptions.
May 7, 2016
In this paper, we derive non-asymptotic error bounds for the Lasso estimator when the penalty parameter for the estimator is chosen using $K$-fold cross-validation. Our bounds imply that the cross-validated Lasso estimator has nearly optimal rates of convergence in the prediction, $L^2$, and $L^1$ norms. For example, we show that in the model with Gaussian noise and under fairly general assumptions on the candidate set of values of the penalty parameter, the estimation er...
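For context, the tuning rule analyzed here is ordinary $K$-fold cross-validation, sketched below in generic notation ($\Lambda_n$ denotes the candidate set of penalty values):
\[
\hat{\lambda} = \operatorname*{arg\,min}_{\lambda\in\Lambda_n}\; \sum_{k=1}^{K}\sum_{i\in I_k}\bigl(y_i - x_i^{\top}\hat{\beta}_{(-k)}(\lambda)\bigr)^2,
\]
where $I_1,\dots,I_K$ partition the observations and $\hat{\beta}_{(-k)}(\lambda)$ is the lasso estimator computed without fold $k$; the cross-validated Lasso estimator is then the lasso refitted on the full sample at $\hat{\lambda}$.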
January 30, 2014
This work studies the statistical properties of the maximum penalized likelihood approach in a semi-parametric framework. We recall the penalized likelihood approach for estimating a function and review some asymptotic results. We investigate the properties of two estimators of the variance of maximum penalized likelihood estimators: the sandwich estimator and a Bayesian estimator. The coverage rates of confidence intervals based on these estimators are studied through a simulati...
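For reference, a sandwich-type variance estimator for a penalized maximum likelihood estimator $\hat{\theta}$ typically has the generic form below (a sketch in standard notation, not necessarily the exact estimator studied in the paper):
\[
\widehat{\operatorname{Var}}(\hat{\theta}) = \bigl(H(\hat{\theta}) + \lambda\Omega\bigr)^{-1}\, \widehat{I}(\hat{\theta})\, \bigl(H(\hat{\theta}) + \lambda\Omega\bigr)^{-1},
\]
where $H$ is the observed information of the unpenalized log-likelihood, $\lambda\Omega$ is the curvature added by the penalty, and $\widehat{I}$ estimates the variance of the score.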
September 4, 2023
Challenges with data in the big-data era include (i) that the dimension $p$ is often larger than the sample size $n$, and (ii) that outliers or contaminated points are frequently hidden and more difficult to detect. Challenge (i) renders most conventional methods inapplicable and has thus attracted tremendous attention from the statistics, computer science, and biomedical communities. Numerous penalized regression methods have been introduced as modern methods for analyzing high-dimensional data. ...
June 23, 2011
We consider a general high-dimensional additive hazard model in a non-asymptotic setting, including regression for censored data. In this context, we consider a Lasso estimator with a fully data-driven $\ell_1$ penalization, which is tuned for the estimation problem at hand. We prove sharp oracle inequalities for this estimator. Our analysis involves a new "data-driven" Bernstein's inequality, which is of independent interest, where the predictable variation is replaced by the...
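In generic form, a fully data-driven $\ell_1$ penalization of this kind weights each coordinate separately (a sketch only; the actual weights in the paper come out of its Bernstein-type analysis):
\[
\hat{\beta} = \operatorname*{arg\,min}_{\beta}\Bigl\{\, \ell_n(\beta) + \sum_{j=1}^{p}\hat{w}_j\,\lvert\beta_j\rvert \Bigr\},
\]
where $\ell_n$ is the empirical risk of the additive hazards model and the data-driven weights $\hat{w}_j$ calibrate the penalty on each coordinate to its estimated noise level.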
July 18, 2018
We introduce a pliable lasso method for estimation of interaction effects in the Cox proportional hazards model framework. The pliable lasso is a linear model that includes interactions between covariates X and a set of modifying variables Z and assumes sparsity of the main effects and interaction effects. The hierarchical penalty excludes interaction effects when the corresponding main effects are zero: this avoids overfitting and an explosion of model complexity. We extend ...
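For orientation, in the original squared-error pliable lasso the linear predictor with modifying variables Z has roughly the form below (a sketch in standard notation from the pliable lasso literature; the Cox extension replaces the loss with the negative partial log-likelihood):
\[
\eta(x, z) = \beta_0 + z^{\top}\theta_0 + \sum_{j=1}^{p} x_j\bigl(\beta_j + z^{\top}\theta_j\bigr),
\]
with the hierarchical penalty designed so that an interaction vector $\theta_j$ can enter only when the corresponding main effect $\beta_j$ is nonzero.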
October 3, 2022
A prevalent feature of high-dimensional data is the dependence among covariates, and model selection is known to be challenging when covariates are highly correlated. To perform model selection for the high-dimensional Cox proportional hazards model in the presence of correlated covariates with factor structure, we propose a new model, Factor-Augmented Regularized Model for Hazard Regression (FarmHazard), which builds upon latent factors that drive covariate dependence and extend...
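The latent factor structure that such a model exploits is conventionally written as (generic factor-model notation, not the paper's own):
\[
x_i = B f_i + u_i,
\]
where the low-dimensional factors $f_i$ drive the dependence among covariates, $B$ is a loading matrix, and $u_i$ is an idiosyncratic component; hazard regression can then be carried out on estimated factors and idiosyncratic parts rather than on the raw, highly correlated covariates.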
October 25, 2010
High-throughput genetic sequencing arrays with thousands of measurements per sample, together with a wealth of related censored clinical data, have created a pressing need for better measurement-specific model selection. In this paper we establish strong oracle properties of nonconcave penalized methods for non-polynomial (NP) dimensional data with censoring in the framework of Cox's proportional hazards model. A class of folded-concave penalties is employed, and both LASSO and SCA...
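For reference, the canonical folded-concave penalty is SCAD, usually specified through its derivative (standard definition, included here for orientation):
\[
p'_{\lambda}(t) = \lambda\Bigl\{\mathbf{1}(t\le\lambda) + \frac{(a\lambda - t)_{+}}{(a-1)\lambda}\,\mathbf{1}(t>\lambda)\Bigr\}, \qquad t\ge 0,\; a>2,
\]
which penalizes small coefficients like the lasso, tapers off, and leaves coefficients larger than $a\lambda$ unpenalized, thereby reducing the bias on large signals.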
September 9, 2022
In high-dimensional regression, where the number of covariates is of the order of the number of observations, ridge penalization is often used as a remedy against overfitting. Unfortunately, for correlated covariates such regularisation in generalized linear models typically induces not only shrinkage of the estimated parameter vector, but also an unwanted \emph{rotation} relative to the true vector. We show analytically how this problem can be removed by using a generalizati...
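The setting is ridge-penalized maximum likelihood in a generalized linear model, written here in generic notation:
\[
\hat{\beta}_{\lambda} = \operatorname*{arg\,min}_{\beta}\Bigl\{ -\frac{1}{n}\sum_{i=1}^{n}\log p\bigl(y_i \mid x_i^{\top}\beta\bigr) + \frac{\lambda}{2}\,\lVert\beta\rVert_2^2 \Bigr\};
\]
for correlated covariates the minimizer is not only shrunk towards zero but also rotated away from the direction of the true parameter vector, and it is this rotation that the paper's generalisation of ridge penalization is designed to remove.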