April 14, 2020
Similar papers
July 5, 2024
To overcome challenges in fitting complex models with small samples, catalytic priors have recently been proposed to stabilize the inference by supplementing observed data with synthetic data generated from simpler models. Based on a catalytic prior, the Maximum A Posteriori (MAP) estimator is a regularized estimator that maximizes the weighted likelihood of the combined data. This estimator is straightforward to compute, and its numerical performance is superior or comparabl...
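The MAP estimator described above can be sketched for logistic regression as a weighted maximum-likelihood fit on observed plus synthetic data. This is a minimal illustration under stated assumptions, not the paper's exact procedure: the synthetic covariates are drawn by resampling each observed column independently, the "simpler model" is taken to be an intercept-only logistic regression, and the hypothetical default τ = p sets the total weight shared by the M synthetic points. The names `weighted_logistic_fit` and `catalytic_map` are illustrative only.

```python
import numpy as np


def weighted_logistic_fit(X, y, w, n_iter=100, tol=1e-10):
    """Maximize sum_i w_i * [y_i * x_i'b - log(1 + exp(x_i'b))] by damped Newton steps."""
    p = X.shape[1]
    beta = np.zeros(p)

    def objective(b):
        eta = X @ b
        return np.sum(w * (y * eta - np.logaddexp(0.0, eta)))

    for _ in range(n_iter):
        eta = X @ beta
        mu = 0.5 * (1.0 + np.tanh(0.5 * eta))  # numerically stable sigmoid
        grad = X.T @ (w * (y - mu))
        hess = X.T @ (X * (w * mu * (1.0 - mu))[:, None]) + 1e-10 * np.eye(p)
        step = np.linalg.solve(hess, grad)
        t = 1.0  # step halving keeps every update an ascent step
        while objective(beta + t * step) < objective(beta) and t > 1e-8:
            t *= 0.5
        beta = beta + t * step
        if np.max(np.abs(t * step)) < tol:
            break
    return beta


def catalytic_map(X, y, M=400, tau=None, seed=0):
    """Hypothetical catalytic-prior MAP; X is n x p with an all-ones first column."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    tau = float(p) if tau is None else tau  # assumption: total synthetic weight tau = p
    # Assumption: synthetic covariates resample each non-intercept column independently
    Xs = np.column_stack([np.ones(M)] + [rng.choice(X[:, j], size=M) for j in range(1, p)])
    # Assumption: the simpler model is intercept-only, whose fitted probability is mean(y)
    ys = rng.binomial(1, y.mean(), size=M).astype(float)
    X_all = np.vstack([X, Xs])
    y_all = np.concatenate([y, ys])
    w_all = np.concatenate([np.ones(n), np.full(M, tau / M)])  # weight tau/M per synthetic point
    return weighted_logistic_fit(X_all, y_all, w_all)


# Perfectly separated toy data: the ordinary MLE does not exist here
x = np.linspace(-1.0, 1.0, 20)
X = np.column_stack([np.ones_like(x), x])
y = (x > 0).astype(float)
beta_map = catalytic_map(X, y)
print(beta_map)
```

On perfectly separated data the unweighted likelihood keeps increasing as the slope grows, so the ordinary MLE diverges; the synthetic points, whose labels do not depend on the covariate, keep the combined weighted likelihood bounded, and the estimate stays finite.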
It is clear that conventional statistical inference protocols need to be revised to deal correctly with the high-dimensional data that are now common. Most recent studies aimed at achieving this revision rely on powerful approximation techniques that call for rigorous results against which they can be tested. In this context, the simplest case of high-dimensional linear regression has acquired significant new relevance and attention. In this paper we use the statistical phys...
October 10, 2018
Considering the increasing size of available data, the need for statistical methods that control the finite sample bias is growing. This is mainly due to the frequent settings in which the number of variables is large and allowed to increase with the sample size, causing standard inferential procedures to suffer a significant loss in performance. Moreover, the complexity of statistical models is also increasing thereby entailing important computational challenges in constr...
May 28, 2023
We developed a statistical inference method applicable to a broad range of generalized linear models (GLMs) in high-dimensional settings, where the number of unknown coefficients scales proportionally with the sample size. Although a pioneering inference method has been developed for logistic regression, which is a specific instance of GLMs, it is not feasible to apply this method directly to other GLMs because of unknown hyper-parameters. In this study, we addressed this lim...
March 19, 2018
Every student in statistics or data science learns early on that when the sample size largely exceeds the number of variables, fitting a logistic model produces estimates that are approximately unbiased. Every student also learns that there are formulas to predict the variability of these estimates which are used for the purpose of statistical inference; for instance, to produce p-values for testing the significance of regression coefficients. Although these formulas come fro...
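The contrast this abstract sets up can be checked with a small simulation. The sketch assumes a standard Gaussian design, a logistic model with the hypothetical signal strength Var(x'β) = 5, and measures inflation as the least-squares slope Σ β̂_j β_j / Σ β_j² of the fitted MLE against the true coefficients. It is not the paper's method, and `logistic_mle` and `inflation_factor` are illustrative names. With p/n = 0.01 the factor should sit near 1, consistent with approximate unbiasedness; with p/n = 0.2 the MLE is expected to overstate the magnitude of the coefficients.

```python
import numpy as np


def logistic_mle(X, y, n_iter=50):
    """Unpenalized logistic-regression MLE via Newton's method with step halving."""
    p = X.shape[1]
    beta = np.zeros(p)

    def loglik(b):
        eta = X @ b
        return np.sum(y * eta - np.logaddexp(0.0, eta))

    for _ in range(n_iter):
        eta = X @ beta
        mu = 0.5 * (1.0 + np.tanh(0.5 * eta))  # numerically stable sigmoid
        grad = X.T @ (y - mu)
        # Tiny damping only guards against a numerically singular Hessian
        hess = X.T @ (X * (mu * (1.0 - mu))[:, None]) + 1e-10 * np.eye(p)
        step = np.linalg.solve(hess, grad)
        t = 1.0
        while loglik(beta + t * step) < loglik(beta) and t > 1e-8:
            t *= 0.5
        beta = beta + t * step
        if np.max(np.abs(t * step)) < 1e-8:
            break
    return beta


def inflation_factor(n, p, k, gamma2=5.0, seed=0):
    """Simulate data, fit the MLE, and return sum(bhat * b) / sum(b * b)."""
    rng = np.random.default_rng(seed)
    beta = np.zeros(p)
    beta[:k] = np.sqrt(gamma2 / k)  # assumption: Var(x'beta) = gamma2 for x ~ N(0, I_p)
    X = rng.standard_normal((n, p))
    prob = 0.5 * (1.0 + np.tanh(0.5 * (X @ beta)))
    y = rng.binomial(1, prob).astype(float)
    bhat = logistic_mle(X, y)
    return float(bhat @ beta / (beta @ beta))


low = inflation_factor(n=2000, p=20, k=10)     # p/n = 0.01
high = inflation_factor(n=2000, p=400, k=100)  # p/n = 0.2
print(f"p/n = 0.01: {low:.2f}   p/n = 0.2: {high:.2f}")
```

Because the same estimator and the same signal strength are used in both runs, the only thing that changes is the ratio p/n, which isolates the dimensionality effect the abstract alludes to.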
June 17, 2009
The replica method is a non-rigorous but well-known technique from statistical physics used in the asymptotic analysis of large, random, nonlinear problems. This paper applies the replica method, under the assumption of replica symmetry, to study estimators that are maximum a posteriori (MAP) under a postulated prior distribution. It is shown that with random linear measurements and Gaussian noise, the replica-symmetric prediction of the asymptotic behavior of the postulated ...
January 17, 2013
We consider linear regression in the high-dimensional regime where the number of observations $n$ is smaller than the number of parameters $p$. A very successful approach in this setting uses $\ell_1$-penalized least squares (a.k.a. the Lasso) to search for a subset of $s_0 < n$ parameters that best explain the data, while setting the other parameters to zero. A considerable amount of work has been devoted to characterizing the estimation and model selection problems within this...
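The $\ell_1$-penalized least-squares problem can be solved by cyclic coordinate descent with soft-thresholding. The sketch assumes the objective $(1/2n)\|y - X\beta\|_2^2 + \lambda\|\beta\|_1$, a simulated design with $n = 100 < p = 200$ and $s_0 = 5$ nonzero coefficients, and a hypothetical penalty level $\lambda = 2\sigma\sqrt{2\log p / n}$. `lasso_cd` is an illustrative name; this is one standard solver, not the method of the paper.

```python
import numpy as np


def soft_threshold(z, t):
    """Proximal operator of t * |.|: shrink z toward zero by t."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso_cd(X, y, lam, max_sweeps=10_000, tol=1e-10):
    """Minimize (1/(2n)) * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent."""
    n, p = X.shape
    beta = np.zeros(p)
    r = y.astype(float).copy()            # residual y - X @ beta
    col_sq = (X ** 2).sum(axis=0) / n     # (1/n) * ||x_j||^2
    for _ in range(max_sweeps):
        max_change = 0.0
        for j in range(p):
            old = beta[j]
            # (1/n) x_j' (partial residual with coordinate j removed)
            rho = X[:, j] @ r / n + col_sq[j] * old
            new = soft_threshold(rho, lam) / col_sq[j]
            if new != old:
                r -= X[:, j] * (new - old)  # keep the residual in sync
                beta[j] = new
                max_change = max(max_change, abs(new - old))
        if max_change < tol:
            break
    return beta


rng = np.random.default_rng(0)
n, p, s0, sigma = 100, 200, 5, 0.5       # assumed simulation settings
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:s0] = 3.0
y = X @ beta_true + sigma * rng.standard_normal(n)
lam = 2 * sigma * np.sqrt(2 * np.log(p) / n)
beta_hat = lasso_cd(X, y, lam)
print("selected:", np.flatnonzero(beta_hat))
```

Since $n < p$, ordinary least squares has infinitely many solutions here, while the Lasso returns a sparse one. Correctness of the solver can be checked through the Karush–Kuhn–Tucker conditions: $x_j^\top(y - X\hat\beta)/n = \lambda\,\mathrm{sign}(\hat\beta_j)$ on the selected coordinates and $|x_j^\top(y - X\hat\beta)/n| \le \lambda$ elsewhere.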
August 18, 2022
Accurate statistical inference in logistic regression models remains a critical challenge when the ratio between the number of parameters and sample size is not negligible. This is because approximations based on either classical asymptotic theory or bootstrap calculations are grossly off the mark. This paper introduces a resized bootstrap method to infer model parameters in arbitrary dimensions. As in the parametric bootstrap, we resample observations from a distribution, wh...
April 6, 2024
The remarkable generalization performance of overparameterized models has challenged the conventional wisdom of statistical learning theory. While recent theoretical studies have shed light on this behavior in linear models or nonlinear classifiers, a comprehensive understanding of overparameterization in nonlinear regression remains lacking. This paper explores the predictive properties of overparameterized nonlinear regression within the Bayesian framework, extending the me...
July 2, 2024
Generalized linear models (GLMs) arguably represent the standard approach for statistical regression beyond the Gaussian likelihood scenario. When Bayesian formulations are employed, the general absence of a tractable posterior distribution has motivated the development of deterministic approximations, which are generally more scalable than sampling techniques. Among them, expectation propagation (EP) showed extreme accuracy, usually higher than many variational Bayes solutio...