LR-GLM: High-Dimensional Bayesian Inference Using Low-Rank Data Approximations

May 17, 2019


Brian L. Trippe, Jonathan H. Huggins, Raj Agrawal, Tamara Broderick

Statistics

Computer Science

Computation

Machine Learning

Methodology


Due to the ease of modern data collection, applied statisticians often have access to a large set of covariates that they wish to relate to some observed outcome. Generalized linear models (GLMs) offer a particularly interpretable framework for such an analysis. In these high-dimensional problems, the number of covariates is often large relative to the number of observations, so we face non-trivial inferential uncertainty; a Bayesian approach allows coherent quantification of this uncertainty. Unfortunately, existing methods for Bayesian inference in GLMs require running times roughly cubic in parameter dimension, and so are limited to settings with at most tens of thousands of parameters. We propose to reduce time and memory costs with a low-rank approximation of the data in an approach we call LR-GLM. When used with the Laplace approximation or Markov chain Monte Carlo, LR-GLM provides a full Bayesian posterior approximation and admits running times reduced by a full factor of the parameter dimension. We rigorously establish the quality of our approximation and show how the choice of rank allows a tunable computational-statistical trade-off. Experiments support our theory and demonstrate the efficacy of LR-GLM on real large-scale datasets.
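As a rough illustration of the idea in the abstract, the sketch below applies a rank-M approximation of the data to conjugate Gaussian linear regression, the simplest GLM. It is not the authors' implementation. It assumes the covariate matrix X is replaced by its projection onto the top M right singular vectors, a zero-mean isotropic Gaussian prior, and a known noise precision. The function names and parameters (`lowrank_gaussian_posterior`, `dense_covariance`, `prior_var`, `noise_prec`) are hypothetical and chosen for this example only.

```python
import numpy as np

def lowrank_gaussian_posterior(X, y, rank, prior_var=1.0, noise_prec=1.0):
    """Posterior for Bayesian linear regression with X replaced by X U U^T.

    Assumes prior beta ~ N(0, prior_var * I) and Gaussian noise with
    precision noise_prec. Returns the posterior mean together with (U, w)
    such that the covariance is prior_var * I + U diag(w) U^T.
    """
    # Exact SVD for clarity; a randomized or truncated SVD would be the
    # scalable choice on large data (an assumption of this sketch).
    _, s, Vt = np.linalg.svd(X, full_matrices=False)
    U = Vt[:rank].T                  # D x M, orthonormal columns
    lam2 = s[:rank] ** 2             # eigenvalues of (X U)^T (X U)
    a = 1.0 / prior_var              # prior precision
    inv_diag = 1.0 / (a + noise_prec * lam2)
    # The mean needs only matrix-vector products, never a D x D inverse.
    mean = noise_prec * U @ (inv_diag * (U.T @ (X.T @ y)))
    # Low-rank correction to the prior covariance.
    w = inv_diag - prior_var
    return mean, U, w

def dense_covariance(U, w, prior_var=1.0):
    """Materialize the D x D covariance (only sensible for small D)."""
    D = U.shape[0]
    return prior_var * np.eye(D) + (U * w) @ U.T
```

In this sketch the D x D covariance is never inverted: after the SVD, the mean and the factored covariance cost a number of operations linear in D for fixed M. When X has exact rank M the result coincides with the exact posterior; otherwise the choice of `rank` trades accuracy for computation.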

Scalable Bayesian inference for the generalized linear mixed model

March 5, 2024

91% Match

Samuel I. Berchuck, Felipe A. Medeiros, ... , Andrea Agazzi

Computation

Methodology

Machine Learning

The generalized linear mixed model (GLMM) is a popular statistical approach for handling correlated data, and is used extensively in application areas where big data is common, including biomedical data settings. The focus of this paper is scalable statistical inference for the GLMM, where we define statistical inference as: (i) estimation of population parameters, and (ii) evaluation of scientific hypotheses in the presence of uncertainty. Artificial intelligence (AI) learn...


Accelerating Generalized Linear Models by Trading off Computation for Uncertainty

October 31, 2023

90% Match

Lukas Tatzel, Jonathan Wenger, ... , Philipp Hennig

Machine Learning

Bayesian Generalized Linear Models (GLMs) define a flexible probabilistic framework to model categorical, ordinal and continuous data, and are widely used in practice. However, exact inference in GLMs is prohibitively expensive for large datasets, thus requiring approximations in practice. The resulting approximation error adversely impacts the reliability of the model and is not accounted for in the uncertainty of the prediction. In this work, we introduce a family of iterat...


A subsampling approach for Bayesian model selection

January 31, 2022

90% Match

Jon Lachmann, Geir Storvik, ... , Aliaksandr Hubin

Methodology

Statistics Theory

Computation


It is common practice to use Laplace approximations to compute marginal likelihoods in Bayesian versions of generalised linear models (GLM). Marginal likelihoods combined with model priors are then used in different search algorithms to compute the posterior marginal probabilities of models and individual covariates. This allows performing Bayesian model selection and model averaging. For large sample sizes, even the Laplace approximation becomes computationally challenging b...


Empirical Bayes inference in sparse high-dimensional generalized linear models

March 14, 2023

90% Match

Yiqi Tang, Ryan Martin

Statistics Theory

Methodology


High-dimensional linear models have been extensively studied in the recent literature, but the developments in high-dimensional generalized linear models, or GLMs, have been much slower. In this paper, we propose the use of an empirical or data-driven prior specification leading to an empirical Bayes posterior distribution which can be used for estimation of and inference on the coefficient vector in a high-dimensional GLM, as well as for variable selection. For our proposed met...


Fast Marginal Likelihood Estimation of the Ridge Parameter(s) in Ridge Regression and Generalized Ridge Regression for Big Data

September 8, 2014

89% Match

George Karabatsos

Methodology

Unlike the ordinary least-squares (OLS) estimator for the linear model, a ridge regression linear model provides coefficient estimates via shrinkage, usually with improved mean-square and prediction error. This is true especially when the observed design matrix is ill-conditioned or singular, either as a result of highly-correlated covariates or the number of covariates exceeding the sample size. This paper introduces novel and fast marginal maximum likelihood (MML) algorithm...


Low-rank variational Bayes correction to the Laplace method

November 25, 2021

89% Match

Janet van Niekerk, Haavard Rue

Methodology

Machine Learning

Approximate inference methods like the Laplace method, Laplace approximations and variational methods, amongst others, are popular methods when exact inference is not feasible due to the complexity of the model or the abundance of data. In this paper we propose a hybrid approximate method called Low-Rank Variational Bayes correction (VBC), that uses the Laplace method and subsequently a Variational Bayes correction in a lower dimension, to the joint posterior mean. The cost i...


PASS-GLM: polynomial approximate sufficient statistics for scalable Bayesian GLM inference

September 26, 2017

89% Match

Jonathan H. Huggins, Ryan P. Adams, Tamara Broderick

Computation

Machine Learning

Generalized linear models (GLMs) -- such as logistic regression, Poisson regression, and robust regression -- provide interpretable models for diverse data types. Probabilistic approaches, particularly Bayesian ones, allow coherent estimates of uncertainty, incorporation of prior information, and sharing of power across experiments via hierarchical models. In practice, however, the approximate Bayesian methods necessary for inference have either failed to scale to large data ...


Bayesian Adaptive Lasso with Variational Bayes for Variable Selection in High-dimensional Generalized Linear Mixed Models

August 30, 2016

89% Match

Dao Thanh Tung, Minh-Ngoc Tran, Tran Manh Cuong

Methodology

This article describes a full Bayesian treatment for simultaneous fixed-effect selection and parameter estimation in high-dimensional generalized linear mixed models. The approach consists of using a Bayesian adaptive Lasso penalty for signal-level adaptive shrinkage and a fast Variational Bayes scheme for estimating the posterior mode of the coefficients. The proposed approach offers several advantages over the existing methods, for example, the adaptive shrinkage parameters...


Feasible Adjustments of Statistical Inference in High-Dimensional Generalized Linear Models

May 28, 2023

89% Match

Kazuma Sawaya, Yoshimasa Uematsu, Masaaki Imaizumi

Statistics Theory

We developed a statistical inference method applicable to a broad range of generalized linear models (GLMs) in high-dimensional settings, where the number of unknown coefficients scales proportionally with the sample size. Although a pioneering inference method has been developed for logistic regression, which is a specific instance of GLMs, it is not feasible to apply this method directly to other GLMs because of unknown hyper-parameters. In this study, we addressed this lim...


Adaptive Randomized Dimension Reduction on Massive Data

April 13, 2015

89% Match

Gregory Darnell, Stoyan Georgiev, ... , Barbara E. Engelhardt

Machine Learning

Quantitative Methods

The scalability of statistical estimators is of increasing importance in modern applications. One approach to implementing scalable algorithms is to compress data into a low dimensional latent space using dimension reduction methods. In this paper we develop an approach for dimension reduction that exploits the assumption of low rank structure in high dimensional data to gain both computational and statistical advantages. We adapt recent randomized low-rank approximation algo...
