ID: 2003.08670

Semi-analytic approximate stability selection for correlated data in generalized linear models

March 19, 2020

Takashi Takahashi, Yoshiyuki Kabashima
Statistics
Condensed Matter
Computer Science
Machine Learning
Disordered Systems and Neural Networks
Statistical Mechanics
Machine Learning
Methodology

We consider the variable selection problem of generalized linear models (GLMs). Stability selection (SS) is a promising method proposed for solving this problem. Although SS provides practical variable selection criteria, it is computationally demanding because it needs to fit GLMs to many resampled datasets. We propose a novel approximate inference algorithm that can conduct SS without the repeated fitting. The algorithm is based on the replica method of statistical mechanics and vector approximate message passing of information theory. For datasets characterized by rotation-invariant matrix ensembles, we derive state evolution equations that macroscopically describe the dynamics of the proposed algorithm. We also show that their fixed points are consistent with the replica symmetric solution obtained by the replica method. Numerical experiments indicate that the algorithm exhibits fast convergence and high approximation accuracy for both synthetic and real-world data.
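For context on the repeated-fitting cost that the proposed algorithm avoids, here is a minimal sketch of naive stability selection on a hypothetical synthetic linear model (plain NumPy, with the Lasso solved by ISTA; the data sizes, penalty, and selection threshold are illustrative assumptions, not the paper's setup):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic data: N = 40 candidate variables, 3 truly active.
N, M, B = 40, 200, 100
beta_true = np.zeros(N)
beta_true[:3] = 2.0
X = rng.standard_normal((M, N))
y = X @ beta_true + 0.5 * rng.standard_normal(M)

def lasso_ista(X, y, lam, n_iter=500):
    """Plain ISTA for (1/2m)||y - Xb||^2 + lam * ||b||_1."""
    m = len(y)
    L = np.linalg.norm(X, 2) ** 2 / m          # Lipschitz constant of the gradient
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        z = b - X.T @ (X @ b - y) / (m * L)    # gradient step
        b = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return b

# Naive SS: refit on B random half-subsamples and count nonzero coefficients.
counts = np.zeros(N)
for _ in range(B):
    idx = rng.choice(M, size=M // 2, replace=False)
    counts += lasso_ista(X[idx], y[idx], lam=0.2) != 0

freq = counts / B                      # per-variable selection frequency
selected = np.where(freq >= 0.8)[0]    # stable set at threshold pi = 0.8
print(selected)
```

The inner loop is the bottleneck the paper targets: B full Lasso fits just to estimate the selection frequencies, which the semi-analytic approach replaces with a single message-passing computation.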

Similar papers 1

Replicated Vector Approximate Message Passing For Resampling Problem

May 23, 2019

91% Match
Takashi Takahashi, Yoshiyuki Kabashima
Machine Learning
Disordered Systems and Neural Networks
Statistical Mechanics
Machine Learning
Methodology

Resampling techniques are widely used in statistical inference and ensemble learning, in which estimators' statistical properties are essential. However, existing methods are computationally demanding, because they require repeating the estimation/learning, via numerical optimization or integration, for each resampled dataset. In this study, we introduce a computationally efficient method that resolves this problem: replicated vector approximate message passing. This is based on a combina...


Replica Analysis for Ensemble Techniques in Variable Selection

August 29, 2024

89% Match
Takashi Takahashi
Statistics Theory
Disordered Systems and Neural Networks
Statistical Mechanics
Information Theory
Information Theory
Statistics Theory

Variable selection is a statistical problem that aims to find the subset of the $N$-dimensional possible explanatory variables that are truly related to the generation process of the response variable. In high-dimensional setups, where the input dimension $N$ is comparable to the data size $M$, it is difficult to use classic methods based on $p$-values. Therefore, methods based on ensemble learning are often used. In this review article, we introduce how the performance...


Asymptotics of Non-Convex Generalized Linear Models in High-Dimensions: A proof of the replica formula

February 27, 2025

87% Match
Matteo Vilucchio, Yatin Dandi, ... , Florent Krzakala
Machine Learning
Machine Learning

The analytic characterization of the high-dimensional behavior of optimization for Generalized Linear Models (GLMs) with Gaussian data has been a central focus in statistics and probability in recent years. While convex cases, such as the LASSO, ridge regression, and logistic regression, have been extensively studied using a variety of techniques, the non-convex case remains far less understood despite its significance. A non-rigorous statistical physics framework has provide...


Macroscopic Analysis of Vector Approximate Message Passing in a Model Mismatch Setting

January 9, 2020

87% Match
Takashi Takahashi, Yoshiyuki Kabashima
Information Theory
Disordered Systems and Neural Networks
Information Theory

Vector approximate message passing (VAMP) is an efficient approximate inference algorithm used for generalized linear models. Although VAMP exhibits excellent performance, particularly when measurement matrices are sampled from rotationally invariant ensembles, existing convergence and performance analyses have been limited mostly to cases in which the correct posterior distribution is available. Here, we extend the analyses for cases in which the correct posterior distributi...


Semi-Analytic Resampling in Lasso

February 28, 2018

87% Match
Tomoyuki Obuchi, Yoshiyuki Kabashima
Machine Learning
Disordered Systems and Neural Networks
Methodology

An approximate method for conducting resampling in Lasso, the $\ell_1$-penalized linear regression, in a semi-analytic manner is developed, whereby the average over the resampled datasets is computed directly, without repeated numerical sampling. This enables an inference free of the statistical fluctuations due to the finiteness of sampling, as well as a significant reduction of computational time. The proposed method is based on a message-passing-type algorithm, and its fast conve...


Prediction Errors for Penalized Regressions based on Generalized Approximate Message Passing

June 26, 2022

86% Match
Ayaka Sakata
Machine Learning
Disordered Systems and Neural Networks
Machine Learning

We discuss the prediction accuracy of assumed statistical models in terms of prediction errors for the generalized linear model and penalized maximum likelihood methods. We derive the forms of estimators for the prediction errors, such as $C_p$ criterion, information criteria, and leave-one-out cross validation (LOOCV) error, using the generalized approximate message passing (GAMP) algorithm and replica method. These estimators coincide with each other when the number of mode...
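As a classical point of comparison for such repetition-free error estimators: for ridge regression, the LOOCV error admits an exact closed form via the hat matrix, so it can be computed from a single fit. A sketch under an assumed ridge model and synthetic data (not the paper's GAMP construction):

```python
import numpy as np

rng = np.random.default_rng(1)
M, N, lam = 60, 10, 1.0
X = rng.standard_normal((M, N))
y = X @ rng.standard_normal(N) + 0.3 * rng.standard_normal(M)

# Closed-form LOOCV: e_i / (1 - H_ii), with hat matrix H = X (X'X + lam I)^{-1} X'.
H = X @ np.linalg.solve(X.T @ X + lam * np.eye(N), X.T)
resid = y - H @ y
loo_shortcut = np.mean((resid / (1.0 - np.diag(H))) ** 2)

# Brute-force LOOCV for comparison: refit M times, leaving one sample out each time.
errs = []
for i in range(M):
    mask = np.arange(M) != i
    b = np.linalg.solve(X[mask].T @ X[mask] + lam * np.eye(N), X[mask].T @ y[mask])
    errs.append((y[i] - X[i] @ b) ** 2)
loo_brute = np.mean(errs)

print(loo_shortcut, loo_brute)  # the two agree up to numerical precision
```

The identity $y_i - \hat{y}_{-i,i} = (y_i - \hat{y}_i)/(1 - H_{ii})$ is exact for ridge (via Sherman-Morrison); the GAMP/replica estimators discussed above play an analogous role for penalized GLMs where no such closed form exists.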


Bayesian Stability Selection and Inference on Inclusion Probabilities

October 29, 2024

86% Match
Mahdi Nouraie, Connor Smith, Samuel Muller
Methodology
Computation

Stability selection is a versatile framework for structure estimation and variable selection in high-dimensional settings, primarily grounded in frequentist principles. In this paper, we propose an enhanced methodology that integrates Bayesian analysis to refine the inference of inclusion probabilities within the stability selection framework. Traditional approaches rely on selection frequencies for decision-making, often disregarding domain-specific knowledge and failing to a...


On the Selection Stability of Stability Selection and Its Applications

November 14, 2024

86% Match
Mahdi Nouraie, Samuel Muller
Methodology
Computation
Machine Learning

Stability selection is a widely adopted resampling-based framework for high-dimensional structure estimation and variable selection. However, the concept of 'stability' is often narrowly addressed, primarily through examining selection frequencies, or 'stability paths'. This paper seeks to broaden the use of an established stability estimator to evaluate the overall stability of the stability selection framework, moving beyond single-variable analysis. We suggest that the sta...

A. C. C. Coolen, M. Sheikh, A. Mozeika, ... , F. Antenucci
Disordered Systems and Neural Networks
Statistics Theory
Statistics Theory

Nearly all statistical inference methods were developed for the regime where the number $N$ of data samples is much larger than the data dimension $p$. Inference protocols such as maximum likelihood (ML) or maximum a posteriori probability (MAP) are unreliable if $p=O(N)$, due to overfitting. For many disciplines with increasingly high-dimensional data, this limitation has become a serious bottleneck. We recently showed that in Cox regression for time-to-event data the overfit...

Stability Selection

September 17, 2008

86% Match
Nicolai Meinshausen, Peter Buehlmann
Methodology

Estimation of structure, such as in variable selection, graphical modelling or cluster analysis is notoriously difficult, especially for high-dimensional data. We introduce stability selection. It is based on subsampling in combination with (high-dimensional) selection algorithms. As such, the method is extremely general and has a very wide range of applicability. Stability selection provides finite sample control for some error rates of false discoveries and hence a transpar...
