Analysis of overfitting in the regularized Cox model

Replica analysis of overfitting in regression models for time-to-event data

May 4, 2017

94% Match

ACC Coolen, JE Barrett, ... , CJ Perez-Vicente

Applications

Disordered Systems and Neura...

Data Analysis, Statistics an...

Overfitting, which happens when the number of parameters in a model is too large compared to the number of data points available for determining these parameters, is a serious and growing problem in survival analysis. While modern medicine presents us with data of unprecedented dimensionality, these data cannot yet be used effectively for clinical outcome prediction. Standard error measures in maximum likelihood regression, such as p-values and z-scores, are blind to overfitt...

Replica analysis of overfitting in generalized linear models

April 14, 2020

94% Match

ACC Coolen, M Sheikh, A Mozeika, ... , F Antenucci

Disordered Systems and Neura...

Statistics Theory

Nearly all statistical inference methods were developed for the regime where the number $N$ of data samples is much larger than the data dimension $p$. Inference protocols such as maximum likelihood (ML) or maximum a posteriori probability (MAP) are unreliable if $p=O(N)$, due to overfitting. This limitation has for many disciplines with increasingly high-dimensional data become a serious bottleneck. We recently showed that in Cox regression for time-to-event data the overfit...

Replica analysis of overfitting in regression models for time to event data: the impact of censoring

December 5, 2023

93% Match

Emanuele Massa, Alexander Mozeika, Anthony Coolen

Methodology

Disordered Systems and Neura...

Statistics Theory

We use statistical mechanics techniques, viz. the replica method, to model the effect of censoring on overfitting in Cox's proportional hazards model, the dominant regression method for time-to-event data. In the overfitting regime, Maximum Likelihood parameter estimators are known to be biased already for small values of the ratio of the number of covariates over the number of samples. The inclusion of censoring was avoided in previous overfitting analyses for mathematical c...

The effect of regularization in high dimensional Cox regression

May 22, 2024

92% Match

Emanuele Massa

Statistics Theory

Disordered Systems and Neura...

We investigate analytically the behaviour of the penalized maximum partial likelihood estimator (PMPLE). Our results are derived for a generic separable regularization, but we focus on the elastic net. This penalization is routinely adopted for survival analysis in the high dimensional regime, where the Maximum Partial Likelihood estimator (no regularization) might not even exist. Previous theoretical results require that the number $s$ of non-zero association coefficients is...

A Modern Theory for High-dimensional Cox Regression Models

April 3, 2022

88% Match

Xianyang Zhang, Huijuan Zhou, Hanxuan Ye

Statistics Theory

The proportional hazards model has been extensively used in many fields such as biomedicine to estimate and perform statistical significance testing on the effects of covariates influencing the survival time of patients. The classical theory of maximum partial-likelihood estimation (MPLE) is used by most software packages to produce inference, e.g., the coxph function in R and the PHREG procedure in SAS. In this paper, we investigate the asymptotic behavior of the MPLE in the...

Correction of overfitting bias in regression models

April 12, 2022

88% Match

Emanuele Massa, Marianne Jonker, ... , Anthony Coolen

Methodology

Statistics Theory

Data Analysis, Statistics an...

Regression analysis based on many covariates is becoming increasingly common. However, when the number of covariates $p$ is of the same order as the number of observations $n$, maximum likelihood regression becomes unreliable due to overfitting. This typically leads to systematic estimation biases and increased estimator variances. It is crucial for inference and prediction to quantify these effects correctly. Several methods have been proposed in the literature to overcome overf...

High Dimensional Robust Inference for Cox Regression Models

November 1, 2018

87% Match

Shengchun Kong, Zhuqing Yu, ... , Guang Cheng

Statistics Theory

We consider high-dimensional inference for potentially misspecified Cox proportional hazard models based on low dimensional results by Lin and Wei [1989]. A de-sparsified Lasso estimator is proposed based on the log partial likelihood function and shown to converge to a pseudo-true parameter vector. Interestingly, the sparsity of the true parameter can be inferred from that of the above limiting parameter. Moreover, each component of the above (non-sparse) estimator is shown ...

Confidence intervals for high-dimensional Cox models

March 3, 2018

87% Match

Yi Yu, Jelena Bradic, Richard J. Samworth

Methodology

Statistics Theory

The purpose of this paper is to construct confidence intervals for the regression coefficients in high-dimensional Cox proportional hazards regression models where the number of covariates may be larger than the sample size. Our debiased estimator construction is similar to those in Zhang and Zhang (2014) and van de Geer et al. (2014), but the time-dependent covariates and censored risk sets introduce considerable additional challenges. Our theoretical results, which provide ...

Adaptive estimation of the baseline hazard function in the Cox model by model selection, with high-dimensional covariates

March 1, 2015

86% Match

Agathe Guilloux, Sarah Lemler, Marie-Luce Taupin

Statistics Theory

Applications

The purpose of this article is to provide an adaptive estimator of the baseline function in the Cox model with high-dimensional covariates. We consider a two-step procedure: first, we estimate the regression parameter of the Cox model via a Lasso procedure based on the partial log-likelihood; second, we plug this Lasso estimator into a least-squares type criterion and then perform a model selection procedure to obtain an adaptive penalized contrast estimator of the baselin...

Approximating Partial Likelihood Estimators via Optimal Subsampling

October 10, 2022

86% Match

Haixiang Zhang, Lulu Zuo, ... , Liuquan Sun

Methodology

Computation

With the growing availability of large-scale biomedical data, it is often time-consuming or infeasible to directly perform traditional statistical analysis with relatively limited computing resources at hand. We propose a fast subsampling method to effectively approximate the full data maximum partial likelihood estimator in Cox's model, which largely reduces the computational burden when analyzing massive survival data. We establish consistency and asymptotic normality of a ...
