June 26, 2022
We discuss the prediction accuracy of assumed statistical models in terms of prediction errors for the generalized linear model and penalized maximum likelihood methods. We derive the forms of estimators for the prediction errors, such as the $C_p$ criterion, information criteria, and the leave-one-out cross validation (LOOCV) error, using the generalized approximate message passing (GAMP) algorithm and the replica method. These estimators coincide with each other when the number of model parameters is sufficiently small; however, they diverge, particularly in the parameter region where the number of model parameters is larger than the data dimension. In this paper, we review the prediction errors and the corresponding estimators, and discuss their differences. In the framework of GAMP, we show that the information criteria can be expressed using the variance of the estimates. Further, we demonstrate how to approach the LOOCV error from the information criteria by utilizing the expression provided by GAMP.
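As a rough illustration of the two kinds of estimators named in the abstract, the sketch below compares a $C_p$-type estimate with the LOOCV error for ordinary ridge regression. It is not the GAMP or replica derivation of the paper: it assumes a Gaussian linear model with known noise variance, a fixed ridge penalty `lam`, and synthetic data, and all function names are hypothetical. It relies on the classical identity that, for ridge regression with fixed penalty, the leave-one-out residual equals the ordinary residual divided by $1 - h_{ii}$, where $h_{ii}$ is the $i$-th diagonal entry of the hat matrix.

```python
import random

def solve(A, B):
    """Solve A Z = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + list(B[i]) for i in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        d = M[c][c]
        M[c] = [v / d for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M]

def gram(X, lam):
    """Regularized Gram matrix X^T X + lam * I."""
    p = len(X[0])
    return [[sum(x[j] * x[k] for x in X) + (lam if j == k else 0.0)
             for k in range(p)] for j in range(p)]

def ridge_fit(X, y, lam):
    """Ridge coefficients (X^T X + lam I)^{-1} X^T y."""
    p = len(X[0])
    rhs = [[sum(x[j] * t for x, t in zip(X, y))] for j in range(p)]
    return [row[0] for row in solve(gram(X, lam), rhs)]

def predict(x, w):
    return sum(a * b for a, b in zip(x, w))

def leverages(X, lam):
    """Diagonal of the hat matrix H = X (X^T X + lam I)^{-1} X^T."""
    p = len(X[0])
    Xt = [[x[j] for x in X] for j in range(p)]
    Z = solve(gram(X, lam), Xt)  # p x n matrix (X^T X + lam I)^{-1} X^T
    return [sum(X[i][j] * Z[j][i] for j in range(p)) for i in range(len(X))]

# Synthetic data (an assumption of this sketch, not data from the paper).
random.seed(0)
n, p, sigma, lam = 30, 5, 0.5, 1.0
w_true = [random.gauss(0, 1) for _ in range(p)]
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [predict(x, w_true) + random.gauss(0, sigma) for x in X]

w = ridge_fit(X, y, lam)
h = leverages(X, lam)
resid = [t - predict(x, w) for x, t in zip(X, y)]

train_err = sum(r * r for r in resid) / n
df = sum(h)                                   # effective degrees of freedom, tr(H)
cp = train_err + 2 * sigma ** 2 * df / n      # C_p-type estimate, noise variance assumed known
loo_fast = sum((r / (1 - hi)) ** 2 for r, hi in zip(resid, h)) / n

# Brute-force LOOCV: refit n times, each time leaving one sample out.
loo_slow = 0.0
for i in range(n):
    Xi, yi = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
    loo_slow += (y[i] - predict(X[i], ridge_fit(Xi, yi, lam))) ** 2
loo_slow /= n

print("training error:", train_err)
print("C_p-type estimate:", cp)
print("LOOCV (leverage shortcut):", loo_fast)
print("LOOCV (brute force):", loo_slow)
```

Both estimates correct the training error using the hat matrix, but they estimate different quantities ($C_p$ targets the in-sample prediction error, LOOCV the out-of-sample one), so they need not agree once `p` is comparable to or larger than `n`.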
Similar papers
February 20, 2018
We propose an estimator of prediction error using an approximate message passing (AMP) algorithm that can be applied to a broad range of sparse penalties. Following Stein's lemma, the estimator of the generalized degrees of freedom, which is a key quantity for the construction of the estimator of the prediction error, is calculated at the AMP fixed point. The resulting form of the AMP-based estimator does not depend on the penalty function, and its value can be further improv...
September 24, 2013
We study the problem of selecting the regularization parameter in penalized Gaussian graphical models. When the goal is to obtain a model with good predictive power, cross validation is the gold standard. We present a new estimator of the Kullback-Leibler loss in Gaussian graphical models which provides a computationally fast alternative to cross-validation. The estimator is obtained by approximating leave-one-out cross validation. Our approach is demonstrated on simulated data s...
February 27, 2025
The analytic characterization of the high-dimensional behavior of optimization for Generalized Linear Models (GLMs) with Gaussian data has been a central focus in statistics and probability in recent years. While convex cases, such as the LASSO, ridge regression, and logistic regression, have been extensively studied using a variety of techniques, the non-convex case remains far less understood despite its significance. A non-rigorous statistical physics framework has provide...
September 7, 2009
Recent literature provides many computational and modeling approaches for covariance matrix estimation in penalized Gaussian graphical models, but relatively little study has been carried out on the choice of the tuning parameter. This paper tries to fill this gap by focusing on the problem of shrinkage parameter selection when estimating sparse precision matrices using the penalized likelihood approach. Previous approaches typically used K-fold cross-validation in this re...
April 26, 2024
Many problems in statistics and machine learning can be formulated as model selection problems, where the goal is to choose an optimal parsimonious model among a set of candidate models. It is typical to conduct model selection by penalizing the objective function via information criteria (IC), as with the pioneering work by Akaike and Schwarz. Via recent work, we propose a generalized IC framework to consistently estimate general loss-based learning problems. In this work, w...
September 12, 2016
In this paper, we study the performance of extremum estimators from the perspective of generalization ability (GA): the ability of a model to predict outcomes in new samples from the same population. By adapting the classical concentration inequalities, we derive upper bounds on the empirical out-of-sample prediction errors as a function of the in-sample errors, in-sample data size, heaviness in the tails of the error distribution, and model complexity. We show that the error...
March 3, 2020
We study the problem of out-of-sample risk estimation in the high dimensional regime where both the sample size $n$ and number of features $p$ are large, and $n/p$ can be less than one. Extensive empirical evidence confirms the accuracy of leave-one-out cross validation (LO) for out-of-sample risk estimation. Yet, a unifying theoretical evaluation of the accuracy of LO in high-dimensional problems has remained an open problem. This paper aims to fill this gap for penalized re...
October 18, 2016
We study model evaluation and model selection from the perspective of generalization ability (GA): the ability of a model to predict outcomes in new samples from the same population. We believe that GA is one way to formally address concerns about the external validity of a model. The GA of a model estimated on a sample can be measured by its empirical out-of-sample errors, called the generalization errors (GE). We derive upper bounds for the GE, which depend on sample sizes,...
December 29, 2015
We investigate leave-one-out cross validation (CV) as a means of determining the weight of the penalty term in the least absolute shrinkage and selection operator (LASSO). First, on the basis of the message passing algorithm and a perturbative discussion assuming that the number of observations is sufficiently large, we provide simple formulas for approximately assessing two types of CV errors, which enable us to significantly reduce the necessary cost of computation. These formul...
October 21, 2024
Conformal prediction has emerged as a powerful tool for building prediction intervals that are valid in a distribution-free way. However, its evaluation may be computationally costly, especially in the high-dimensional setting where the dimensionality and sample sizes are both large and of comparable magnitudes. To address this challenge in the context of generalized linear regression, we propose a novel algorithm based on Approximate Message Passing (AMP) to accelerate the c...