May 22, 2024
Similar papers 2
September 13, 2024
We consider high-dimensional regression with a count response modeled by a Poisson or negative binomial generalized linear model (GLM). We propose a penalized maximum likelihood estimator with a properly chosen complexity penalty and establish its adaptive minimaxity across models of varying sparsity. To make the procedure computationally feasible for high-dimensional data, we consider its LASSO and SLOPE convex surrogates. Their performance is illustrated through simulated and ...
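As a concrete illustration of the LASSO surrogate mentioned above, here is a minimal proximal-gradient (ISTA) sketch of l1-penalized Poisson regression in NumPy. This is not the paper's estimator or penalty calibration; the step size, penalty level, and toy data are illustrative assumptions.

```python
import numpy as np

def soft_threshold(z, t):
    """Elementwise soft-thresholding, the proximal operator of t * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_poisson(X, y, lam=10.0, step=1e-3, n_iter=5000):
    """Proximal-gradient (ISTA) sketch of l1-penalized Poisson regression.

    Minimizes sum_i [exp(x_i @ beta) - y_i * (x_i @ beta)] + lam * ||beta||_1.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = np.exp(X @ beta)        # fitted Poisson mean per observation
        grad = X.T @ (mu - y)        # gradient of the negative log-likelihood
        beta = soft_threshold(beta - step * grad, step * lam)
    return beta

# toy data: only the first two of five covariates matter
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
beta_true = np.array([0.8, -0.5, 0.0, 0.0, 0.0])
y = rng.poisson(np.exp(X @ beta_true))
beta_hat = lasso_poisson(X, y)
```

The same loop handles the SLOPE surrogate by replacing the scalar threshold with the sorted-l1 proximal operator; the paper's point is precisely that these convex surrogates trade some statistical optimality for tractability.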
December 5, 2023
We use statistical mechanics techniques, viz. the replica method, to model the effect of censoring on overfitting in Cox's proportional hazards model, the dominant regression method for time-to-event data. In the overfitting regime, maximum likelihood parameter estimators are known to be biased even for small values of the ratio of the number of covariates to the number of samples. The inclusion of censoring was avoided in previous overfitting analyses for mathematical c...
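For reference, the object being analyzed is Cox's proportional hazards model; a standard statement of the model and its partial likelihood under right-censoring (the textbook formulation, not the paper's replica calculation) is:

```latex
% Hazard for a subject with covariates x_i; h_0 is the baseline hazard
h(t \mid x_i) = h_0(t)\, \exp(x_i^\top \beta)

% Partial likelihood: the product runs over uncensored subjects
% (\delta_i = 1); R(t_i) is the risk set at event time t_i
L(\beta) = \prod_{i \,:\, \delta_i = 1}
    \frac{\exp(x_i^\top \beta)}{\sum_{j \in R(t_i)} \exp(x_j^\top \beta)}
```

Censoring enters only through the event indicators \(\delta_i\) and the risk sets \(R(t_i)\), which is why its effect on the overfitting bias requires a separate analysis.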
October 31, 2017
The popularity of penalized regression in high-dimensional data analysis has led to a demand for new inferential tools for these models. False discovery rate control is widely used in high-dimensional hypothesis testing, but has only recently been considered in the context of penalized regression. Almost all of this work, however, has focused on lasso-penalized linear regression. In this paper, we derive a general method for controlling the marginal false discovery rate that ...
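For context, the classical step-up procedure of Benjamini and Hochberg is the standard false-discovery-rate control tool referenced above. The sketch below is that generic procedure, not the marginal FDR method derived in the paper; the p-values and level are illustrative.

```python
import numpy as np

def benjamini_hochberg(pvals, q=0.1):
    """Benjamini-Hochberg step-up: boolean mask of rejected hypotheses.

    Controls the false discovery rate at level q for independent p-values.
    """
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order]
    # find the largest k with p_(k) <= (k / m) * q
    below = ranked <= (np.arange(1, m + 1) / m) * q
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])   # last index under the BH line
        reject[order[: k + 1]] = True      # step-up: reject all smaller p
    return reject

pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.058, 0.3, 0.5, 0.7, 0.9]
mask = benjamini_hochberg(pvals, q=0.1)
```

Note the step-up character: 0.039 and 0.041 sit above the BH line themselves yet are still rejected, because a larger p-value further down falls below it.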
October 11, 2023
Wide heterogeneity exists in cancer patients' survival, ranging from a few months to several decades. To accurately predict clinical outcomes, it is vital to build a predictive model that relates patients' molecular profiles to their survival. Given the complex relationships between survival and high-dimensional molecular predictors, it is challenging to perform non-parametric modeling and remove irrelevant predictors simultaneously. In this paper, we build a ker...
October 15, 2024
Regression analysis with missing data is a long-standing and challenging problem, particularly when there are many missing variables with arbitrary missing patterns. Likelihood-based methods, although theoretically appealing, are often computationally inefficient or even infeasible when dealing with a large number of missing variables. In this paper, we consider the Cox regression model with incomplete covariates that are missing at random. We develop an expectation-maximizat...
August 13, 2009
We consider the problem of model selection and estimation in situations where the number of parameters diverges with the sample size. When the dimension is high, an ideal method should have the oracle property [J. Amer. Statist. Assoc. 96 (2001) 1348--1360] and [Ann. Statist. 32 (2004) 928--961] which ensures the optimal large sample performance. Furthermore, the high-dimensionality often induces the collinearity problem, which should be properly handled by the ideal method. ...
October 28, 2017
This paper deals with the proportional hazards model proposed by D. R. Cox in a high-dimensional and sparse setting for the regression parameter. The Dantzig selector is applied to estimate the regression parameter, and we prove its variable selection consistency for this model. This property enables us to reduce the dimension of the parameter and to construct asymptotically normal estimators for the regression parameter and the cumulative baseline haz...
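The Dantzig selector is a linear program: minimize \(\|\beta\|_1\) subject to \(\|X^\top(y - X\beta)\|_\infty \le \lambda\). The sketch below solves that LP for a plain linear model with scipy; the paper applies the selector to Cox's partial likelihood instead, and the data, \(\lambda\), and solver choice here are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import linprog

def dantzig_selector(X, y, lam):
    """Solve min ||beta||_1  s.t.  ||X.T @ (y - X @ beta)||_inf <= lam.

    Split beta = u - v with u, v >= 0, so both the l1 objective and the
    sup-norm constraint become linear in z = (u, v).
    """
    n, p = X.shape
    G, c = X.T @ X, X.T @ y
    # G(u - v) - c <= lam  and  -(G(u - v) - c) <= lam
    A_ub = np.block([[G, -G], [-G, G]])
    b_ub = np.concatenate([lam + c, lam - c])
    res = linprog(np.ones(2 * p), A_ub=A_ub, b_ub=b_ub,
                  bounds=[(0, None)] * (2 * p), method="highs")
    u, v = res.x[:p], res.x[p:]
    return u - v

# toy sparse linear model: only the first two coefficients are nonzero
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 8))
beta_true = np.array([1.5, -1.0, 0, 0, 0, 0, 0, 0])
y = X @ beta_true + 0.1 * rng.normal(size=100)
beta_hat = dantzig_selector(X, y, lam=3.0)
```

Because the l1 objective is minimized subject to a correlation constraint rather than a residual constraint, the solution is sparse for suitable \(\lambda\), which is what drives the dimension-reduction step in the paper.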
January 31, 2025
We study the flexible piecewise exponential model in a high dimensional setting where the number of covariates $p$ grows proportionally to the number of observations $n$ and under the hypothesis of random uncorrelated Gaussian designs. We prove rigorously that the optimal ridge penalized log-likelihood of the model converges in probability to the saddle point of a surrogate objective function. The technique of proof is the Convex Gaussian Min-Max theorem of Thrampoulidis, Oym...
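As a reminder of the model class, a minimal statement of the piecewise exponential model with a ridge penalty, assuming a fixed partition \(0 = s_0 < s_1 < \dots < s_K\) of the time axis (generic notation, not necessarily the paper's):

```latex
% Piecewise constant baseline hazard \lambda_k on each interval (s_{k-1}, s_k]
h(t \mid x) = \lambda_k \exp(x^\top \beta), \qquad t \in (s_{k-1}, s_k]

% Ridge-penalized log-likelihood, penalty strength \eta > 0
\ell_\eta(\beta, \lambda) = \ell(\beta, \lambda) - \frac{\eta}{2} \lVert \beta \rVert_2^2
```

The paper's claim concerns the value of this penalized objective at its optimum in the proportional regime \(p/n \to \zeta > 0\), characterized via the Convex Gaussian Min-Max theorem.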
June 29, 2023
This paper considers a joint survival and mixed-effects model to explain the survival time from longitudinal data and high-dimensional covariates. The longitudinal data are modeled using a nonlinear effects model, whose regression function serves as a link function incorporated into a Cox model as a covariate; in this way, the longitudinal data are related to the survival time at a given time. Additionally, the Cox model takes into account the inclusion of high-dimensional ...
July 18, 2012
To better understand the interplay of censoring and sparsity, we develop finite sample properties of the nonparametric Cox proportional hazards model. Motivated by the impact of sequencing data, which carry the genetic information of each individual, we work with an over-parametrized problem and propose a general class of group penalties suitable for sparse structured variable selection and estimation. Novel non-asymptotic sandwich bounds for the partial likelihood are developed. We establish how...