Replica analysis of overfitting in generalized linear models

Using Synthetic Data to Regularize Maximum Likelihood Estimation

July 5, 2024

86% Match

Weihao Li, Dongming Huang

Statistics Theory

To overcome challenges in fitting complex models with small samples, catalytic priors have recently been proposed to stabilize the inference by supplementing observed data with synthetic data generated from simpler models. Based on a catalytic prior, the Maximum A Posteriori (MAP) estimator is a regularized estimator that maximizes the weighted likelihood of the combined data. This estimator is straightforward to compute, and its numerical performance is superior or comparabl...

Exact results on high-dimensional linear regression via statistical physics

September 28, 2020

86% Match

Alexander Mozeika, Mansoor Sheikh, Fabian Aguirre-Lopez, ..., Anthony C. C. Coolen

Statistics Theory

Disordered Systems and Neura...

It is clear that conventional statistical inference protocols need to be revised to deal correctly with the high-dimensional data that are now common. Most recent studies aimed at achieving this revision rely on powerful approximation techniques that call for rigorous results against which they can be tested. In this context, the simplest case of high-dimensional linear regression has acquired significant new relevance and attention. In this paper we use the statistical phys...

On the Properties of Simulation-based Estimators in High Dimensions

October 10, 2018

86% Match

Stéphane Guerrier, Mucyo Karemera, ..., Maria-Pia Victoria-Feser

Statistics Theory

Computation

Methodology

Considering the increasing size of available data, the need for statistical methods that control the finite sample bias is growing. This is mainly due to the frequent settings where the number of variables is large and allowed to increase with the sample size bringing standard inferential procedures to incur significant loss in terms of performance. Moreover, the complexity of statistical models is also increasing thereby entailing important computational challenges in constr...

Feasible Adjustments of Statistical Inference in High-Dimensional Generalized Linear Models

May 28, 2023

86% Match

Kazuma Sawaya, Yoshimasa Uematsu, Masaaki Imaizumi

Statistics Theory

We developed a statistical inference method applicable to a broad range of generalized linear models (GLMs) in high-dimensional settings, where the number of unknown coefficients scales proportionally with the sample size. Although a pioneering inference method has been developed for logistic regression, which is a specific instance of GLMs, it is not feasible to apply this method directly to other GLMs because of unknown hyper-parameters. In this study, we addressed this lim...

A modern maximum-likelihood theory for high-dimensional logistic regression

March 19, 2018

86% Match

Pragya Sur, Emmanuel J. Candes

Statistics Theory

Methodology

Every student in statistics or data science learns early on that when the sample size largely exceeds the number of variables, fitting a logistic model produces estimates that are approximately unbiased. Every student also learns that there are formulas to predict the variability of these estimates which are used for the purpose of statistical inference; for instance, to produce p-values for testing the significance of regression coefficients. Although these formulas come fro...

Asymptotic Analysis of MAP Estimation via the Replica Method and Applications to Compressed Sensing

June 17, 2009

85% Match

Sundeep Rangan, Alyson K. Fletcher, Vivek K Goyal

Information Theory

The replica method is a non-rigorous but well-known technique from statistical physics used in the asymptotic analysis of large, random, nonlinear problems. This paper applies the replica method, under the assumption of replica symmetry, to study estimators that are maximum a posteriori (MAP) under a postulated prior distribution. It is shown that with random linear measurements and Gaussian noise, the replica-symmetric prediction of the asymptotic behavior of the postulated ...

Hypothesis Testing in High-Dimensional Regression under the Gaussian Random Design Model: Asymptotic Theory

January 17, 2013

85% Match

Adel Javanmard, Andrea Montanari

Methodology

Information Theory

Statistics Theory

Machine Learning

We consider linear regression in the high-dimensional regime where the number of observations $n$ is smaller than the number of parameters $p$. A very successful approach in this setting uses $\ell_1$-penalized least squares (a.k.a. the Lasso) to search for a subset of $s_0< n$ parameters that best explain the data, while setting the other parameters to zero. A considerable amount of work has been devoted to characterizing the estimation and model selection problems within this...

An Adaptively Resized Parametric Bootstrap for Inference in High-dimensional Generalized Linear Models

August 18, 2022

85% Match

Qian Zhao, Emmanuel J. Candes

Methodology

Accurate statistical inference in logistic regression models remains a critical challenge when the ratio between the number of parameters and sample size is not negligible. This is because approximations based on either classical asymptotic theory or bootstrap calculations are grossly off the mark. This paper introduces a resized bootstrap method to infer model parameters in arbitrary dimensions. As in the parametric bootstrap, we resample observations from a distribution, wh...

Bayesian Inference for Consistent Predictions in Overparameterized Nonlinear Regression

April 6, 2024

85% Match

Tomoya Wakayama

Machine Learning

Methodology

The remarkable generalization performance of overparameterized models has challenged the conventional wisdom of statistical learning theory. While recent theoretical studies have shed light on this behavior in linear models or nonlinear classifiers, a comprehensive understanding of overparameterization in nonlinear regression remains lacking. This paper explores the predictive properties of overparameterized nonlinear regression within the Bayesian framework, extending the me...

Scalable expectation propagation for generalized linear models

July 2, 2024

85% Match

Niccolò Anceschi, Augusto Fasano, ..., Giovanni Rebaudo

Computation

Generalized linear models (GLMs) arguably represent the standard approach for statistical regression beyond the Gaussian likelihood scenario. When Bayesian formulations are employed, the general absence of a tractable posterior distribution has motivated the development of deterministic approximations, which are generally more scalable than sampling techniques. Among them, expectation propagation (EP) showed extreme accuracy, usually higher than many variational Bayes solutio...
