May 22, 2024
Nearly all statistical inference methods were developed for the regime where the number $N$ of data samples is much larger than the data dimension $p$. Inference protocols such as maximum likelihood (ML) or maximum a posteriori probability (MAP) are unreliable if $p=O(N)$, due to overfitting. For many disciplines with increasingly high-dimensional data, this limitation has become a serious bottleneck. We recently showed that in Cox regression for time-to-event data the overfit...
December 26, 2012
High-dimensional sparse modeling with censored survival data is of great practical importance, as exemplified by modern applications in high-throughput genomic data analysis and credit risk analysis. In this article, we propose a class of regularization methods for simultaneous variable selection and estimation in the additive hazards model, by combining the nonconcave penalized likelihood approach and the pseudoscore method. In a high-dimensional setting where the dimensiona...
August 4, 2021
We propose a constrained maximum partial likelihood estimator for dimension reduction in integrative (e.g., pan-cancer) survival analysis with high-dimensional covariates. We assume that for each population in the study, the hazard function follows a distinct Cox proportional hazards model. To borrow information across populations, we assume that all of the hazard functions depend only on a small number of linear combinations of the predictors. We estimate these linear combin...
May 24, 2019
Cross-validation is commonly used for selecting tuning parameters in penalized regression, but its use in penalized Cox regression models has received relatively little attention in the literature. Due to its partial likelihood construction, carrying out cross-validation for Cox models is not straightforward, and there are several potential approaches for implementation. Here, we propose two new cross-validation methods for Cox regression and compare them to approaches that h...
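The difficulty mentioned here is that a held-out fold may contain too few events for its own partial likelihood to be informative. One standard workaround is the Verweij-van Houwelingen cross-validated partial likelihood, which evaluates each fold's fitted coefficients on the full-data and training-data likelihoods and takes the difference. A minimal NumPy sketch (function names and the Breslow tie convention are our choices, not the paper's; `fit_fn` stands in for any penalized Cox fitter):

```python
import numpy as np
from scipy.special import logsumexp

def neg_log_partial_lik(beta, X, time, event):
    """Negative Cox log partial likelihood (Breslow convention for ties)."""
    eta = X @ beta
    nll = 0.0
    for i in range(len(time)):
        if event[i]:
            # risk set: all subjects still under observation at this event time
            nll -= eta[i] - logsumexp(eta[time >= time[i]])
    return nll

def cvl(fit_fn, X, time, event, n_folds=5, seed=0):
    """Verweij-van Houwelingen cross-validated partial likelihood:
    CVL = sum_k [ l(beta_{-k}; all data) - l(beta_{-k}; training fold) ],
    avoiding a partial likelihood formed on a tiny held-out risk set.
    """
    rng = np.random.default_rng(seed)
    folds = rng.integers(0, n_folds, size=len(time))
    total = 0.0
    for k in range(n_folds):
        tr = folds != k
        b = fit_fn(X[tr], time[tr], event[tr])   # fit without fold k
        l_full = -neg_log_partial_lik(b, X, time, event)
        l_train = -neg_log_partial_lik(b, X[tr], time[tr], event[tr])
        total += l_full - l_train
    return total
```

In practice one would compute `cvl` over a grid of tuning parameters inside `fit_fn` and pick the maximizer.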
October 10, 2022
With the growing availability of large-scale biomedical data, it is often time-consuming or infeasible to directly perform traditional statistical analysis with relatively limited computing resources at hand. We propose a fast subsampling method to effectively approximate the full data maximum partial likelihood estimator in Cox's model, which largely reduces the computational burden when analyzing massive survival data. We establish consistency and asymptotic normality of a ...
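The idea of approximating the full-data maximum partial likelihood estimator (MPLE) from a subsample can be sketched as follows. This is an illustration under uniform subsampling only; the paper's actual method involves a more refined sampling scheme and variance estimation that are not reproduced here:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp

def neg_log_partial_lik(beta, X, time, event):
    """Negative Cox log partial likelihood."""
    eta = X @ beta
    nll = 0.0
    for i in range(len(time)):
        if event[i]:
            nll -= eta[i] - logsumexp(eta[time >= time[i]])
    return nll

def subsampled_mple(X, time, event, r, seed=0):
    """Approximate the full-data MPLE by fitting Cox's model on a uniform
    subsample of r subjects, cutting the O(n^2) likelihood cost to O(r^2)."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(time), size=r, replace=False)
    res = minimize(neg_log_partial_lik, np.zeros(X.shape[1]),
                   args=(X[idx], time[idx], event[idx]), method="BFGS")
    return res.x

# Synthetic data: one covariate with true log-hazard ratio 1, no censoring.
rng = np.random.default_rng(1)
n = 2000
X = rng.standard_normal((n, 1))
time = rng.exponential(1.0 / np.exp(X[:, 0]))  # hazard proportional to exp(x)
event = np.ones(n, dtype=int)
beta_hat = subsampled_mple(X, time, event, r=300)
```

Even this crude uniform scheme recovers the coefficient to subsampling noise; the paper's contribution is choosing the sampling probabilities so that the approximation error is provably small.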
August 29, 2021
We study the variable selection problem in survival analysis, identifying the most important factors affecting the survival time when prior knowledge indicates that the variables are mutually correlated through a graph structure. We consider the Cox proportional hazards model with a graph-based regularizer for variable selection. A computationally efficient algorithm is developed to solve the graph-regularized maximum likelihood problem by connecting it to the group lasso. We provid...
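The abstract does not spell out the exact regularizer, so as a purely illustrative stand-in, here is one widely used graph-based penalty, the Laplacian quadratic form, which shrinks coefficients of graph-connected covariates toward each other (the helper names are ours):

```python
import numpy as np

def graph_laplacian(n_vars, edges):
    """Combinatorial Laplacian L = D - A of an undirected covariate graph,
    so that beta @ L @ beta = sum over edges (beta_i - beta_j)**2."""
    L = np.zeros((n_vars, n_vars))
    for i, j in edges:
        L[i, i] += 1.0
        L[j, j] += 1.0
        L[i, j] -= 1.0
        L[j, i] -= 1.0
    return L

def graph_penalty(beta, L, lam):
    """Smoothness penalty added to the negative log partial likelihood."""
    return lam * beta @ L @ beta
```

A graph-regularized fit would then minimize the negative Cox log partial likelihood plus `graph_penalty`; the paper instead reformulates its regularizer via the group lasso to get sparsity as well as smoothness.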
February 4, 2016
High-dimensional predictive models, those with more measurements than observations, require regularization to be well defined, perform well empirically, and possess theoretical guarantees. The amount of regularization, often determined by tuning parameters, is integral to achieving good performance. One can choose the tuning parameter in a variety of ways, such as through resampling methods or generalized information criteria. However, the theory supporting many regularized p...
September 20, 2021
Variable selection has become a pivotal choice in data analyses that impacts subsequent inference and prediction. In linear models, variable selection using Second-Generation P-Values (SGPV) has been shown to be as good as any other algorithm available to researchers. Here we extend the idea of Penalized Regression with Second-Generation P-Values (ProSGPV) to the generalized linear model (GLM) and Cox regression settings. The proposed ProSGPV extension is largely free of tuni...
May 2, 2024
We develop a set of variable selection methods for the Cox model under interval censoring, in the ultra-high dimensional setting where the dimensionality can grow exponentially with the sample size. The methods select covariates via a penalized nonparametric maximum likelihood estimation with some popular penalty functions, including lasso, adaptive lasso, SCAD, and MCP. We prove that our penalized variable selection methods with folded concave penalties or adaptive lasso pen...
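The folded-concave penalties named here have standard closed forms. A sketch of SCAD (Fan and Li) and MCP (Zhang), with the conventional default tuning constants, applied elementwise to coefficient magnitudes:

```python
import numpy as np

def scad(t, lam, a=3.7):
    """SCAD penalty: linear (lasso-like) near zero, quadratic transition,
    then constant, so large coefficients are not over-shrunk. a=3.7 is the
    conventional default."""
    t = np.abs(t)
    return np.where(
        t <= lam, lam * t,
        np.where(t <= a * lam,
                 (2 * a * lam * t - t**2 - lam**2) / (2 * (a - 1)),
                 lam**2 * (a + 1) / 2))

def mcp(t, lam, gamma=3.0):
    """Minimax concave penalty: tapers from a lasso slope at zero down to
    zero slope at t = gamma * lam, then stays constant."""
    t = np.abs(t)
    return np.where(t <= gamma * lam,
                    lam * t - t**2 / (2 * gamma),
                    gamma * lam**2 / 2)
```

Both penalties agree with the lasso near the origin but flatten for large arguments, which is what yields the oracle-type selection properties the abstract refers to.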
October 17, 2011
We propose dimension reduction methods for sparse, high-dimensional multivariate response regression models. Both the number of responses and that of the predictors may exceed the sample size. Sometimes viewed as complementary, predictor selection and rank reduction are the most popular strategies for obtaining lower-dimensional approximations of the parameter matrix in such models. We show in this article that important gains in prediction accuracy can be obtained by conside...