September 28, 2017
We present a novel binary convex reformulation of the sparse regression problem that constitutes a new duality perspective. We devise a new cutting-plane method and provide evidence that it can solve the sparse regression problem to provable optimality, in seconds, for sample sizes $n$ and numbers of regressors $p$ in the 100,000s, two orders of magnitude beyond the current state of the art. The ability to solve the problem at very high dimensions allows us to observe new phase transition phenomena. Contrary to traditional complexity theory, which suggests that the difficulty of a problem increases with its size, the sparse regression problem exhibits the opposite behavior: as the number of samples $n$ increases, the problem becomes easier, the solution recovers 100% of the true signal, and our approach solves the problem extremely fast (in fact faster than Lasso); for a small number of samples $n$, our approach takes longer to solve the problem, but the optimal solution it returns is a statistically more relevant regressor. We argue that our exact sparse regression approach presents a superior alternative to the heuristic methods available at present.
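The abstract does not spell out the reformulation, but the scheme can be illustrated with outer approximation on a ridge-regularized variant of the problem: write the optimal cost as a convex function $c(s)$ of a binary support vector $s$ and approximate it from below with linear cuts. The closed form used for $c(s)$, the regularization weight `gamma`, and the brute-forced master problem below are illustrative assumptions, not the paper's implementation; a minimal sketch:

```python
import itertools
import numpy as np

def c_and_grad(s, X, Y, gamma):
    """Convex support cost c(s) = 0.5 * Y'(I + gamma X_s X_s')^{-1} Y
    (optimal ridge-regularized least-squares cost on support s) and a
    subgradient, so that c(s') >= c(s) + grad @ (s' - s) for all s'."""
    Xs = X[:, s.astype(bool)]
    alpha = np.linalg.solve(np.eye(len(Y)) + gamma * Xs @ Xs.T, Y)
    cost = 0.5 * Y @ alpha
    grad = -0.5 * gamma * (X.T @ alpha) ** 2
    return cost, grad

def cutting_plane(X, Y, k, gamma, tol=1e-6, max_iters=50):
    """Outer approximation: add a cut at the current support, re-solve the
    master problem min_s max_t cut_t(s). The master is brute-forced here
    for clarity; a MIO solver is what makes the approach scale."""
    n, p = X.shape
    supports = list(itertools.combinations(range(p), k))
    cuts, incumbent, inc_s = [], np.inf, None
    s = np.zeros(p)
    s[np.argsort(-np.abs(X.T @ Y))[:k]] = 1.0       # warm start: marginal screening
    for _ in range(max_iters):
        cost, grad = c_and_grad(s, X, Y, gamma)
        if cost < incumbent:                        # valid upper bound
            incumbent, inc_s = cost, s.copy()
        cuts.append((cost, grad, s.copy()))
        best_val, best_s = np.inf, s
        for supp in supports:                       # master problem
            cand = np.zeros(p)
            cand[list(supp)] = 1.0
            val = max(c0 + g @ (cand - s0) for c0, g, s0 in cuts)
            if val < best_val:
                best_val, best_s = val, cand
        if incumbent - best_val < tol:              # lower bound met incumbent
            break
        s = best_s
    return inc_s, incumbent

# tiny synthetic check
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 10))
beta = np.zeros(10)
beta[:3] = 1.0
Y = X @ beta + 0.1 * rng.standard_normal(50)
s_opt, cost = cutting_plane(X, Y, k=3, gamma=100.0)
print(np.flatnonzero(s_opt), cost)
```

The structural point is that each cut is cheap (one $n \times n$ solve), while all combinatorial difficulty is pushed into the master problem, which a modern integer optimization solver handles via lazy-constraint callbacks.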
Similar papers
October 3, 2017
We formulate the sparse classification problem of $n$ samples with $p$ features as a binary convex optimization problem and propose a cutting-plane algorithm to solve it exactly. For sparse logistic regression and sparse SVM, our algorithm finds optimal solutions for $n$ and $p$ in the $10,000$s within minutes. On synthetic data our algorithm achieves perfect support recovery in the large-sample regime. Namely, there exists an $n_0$ such that the algorithm takes a long time to...
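For concreteness, the sparse logistic regression problem referred to here is, in standard (assumed) notation, the cardinality-constrained problem

$$\min_{\beta \in \mathbb{R}^p} \; \sum_{i=1}^n \log\left(1 + \exp(-y_i x_i^\top \beta)\right) \quad \text{s.t.} \quad \|\beta\|_0 \le k,$$

with the hinge loss in place of the logistic loss for sparse SVM; the paper's exact formulation (for instance, an added ridge term) may differ.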
September 28, 2017
We present a novel method for exact hierarchical sparse polynomial regression. Our regressor is the degree-$r$ polynomial that depends on at most $k$ inputs and contains at most $\ell$ monomial terms, and that minimizes the sum of the squares of its prediction errors. This hierarchical sparse specification aligns well with modern big data settings where many inputs are not relevant for prediction purposes and the functional complexity of the regressor needs to be controll...
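In symbols (notation assumed), the hierarchical specification reads

$$\min_{f \,:\, \deg f \le r} \; \sum_{i=1}^n \left(y_i - f(x_i)\right)^2 \quad \text{s.t.} \quad f \text{ depends on at most } k \text{ inputs and contains at most } \ell \text{ monomials},$$

so the sparsity budget $\ell$ on monomials is nested inside the sparsity budget $k$ on inputs.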
April 13, 2020
We consider the least squares regression problem, penalized with a combination of the $\ell_{0}$ and squared $\ell_{2}$ penalty functions (a.k.a. $\ell_0 \ell_2$ regularization). Recent work shows that the resulting estimators are of key importance in many high-dimensional statistical settings. However, exact computation of these estimators remains a major challenge. Indeed, modern exact methods, based on mixed integer programming (MIP), face difficulties when the number of f...
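The estimator in question minimizes the standard $\ell_0 \ell_2$-penalized objective; with (assumed) regularization weights $\lambda_0, \lambda_2 \ge 0$,

$$\min_{\beta \in \mathbb{R}^p} \; \|Y - X\beta\|_2^2 + \lambda_0 \|\beta\|_0 + \lambda_2 \|\beta\|_2^2.$$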
January 16, 2017
We consider a sparse linear regression model $Y = X\beta^{*} + W$, where $X$ has Gaussian entries, $W$ is a noise vector with mean-zero Gaussian entries, and $\beta^{*}$ is a binary vector with support size (sparsity) $k$. Using a novel conditional second moment method, we obtain an approximation, tight up to a multiplicative constant, of the optimal squared error $\min_{\beta}\|Y - X\beta\|_{2}^{2}$, where the minimization is over all $k$-sparse binary vectors $\beta$. The approximation reveals intere...
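A toy instantiation of this model, together with a brute-force evaluation of the combinatorial quantity being approximated (the sizes and noise level are arbitrary illustration choices):

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
n, p, k, sigma = 40, 12, 3, 0.5

X = rng.standard_normal((n, p))                         # Gaussian design
beta_star = np.zeros(p)
beta_star[rng.choice(p, size=k, replace=False)] = 1.0   # binary k-sparse signal
W = sigma * rng.standard_normal(n)                      # mean-zero Gaussian noise
Y = X @ beta_star + W

# optimal squared error over all k-sparse binary vectors, by enumeration
best = min(
    np.sum((Y - X[:, list(supp)].sum(axis=1)) ** 2)
    for supp in itertools.combinations(range(p), k)
)
print("optimal squared error:", best)
```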
June 2, 2016
We propose a nonconvex estimator for joint multivariate regression and precision matrix estimation in the high dimensional regime, under sparsity constraints. A gradient descent algorithm with hard thresholding is developed to compute the nonconvex estimator, and it attains a linear rate of convergence to the true regression coefficients and precision matrix simultaneously, up to the statistical error. Compared with existing methods along this line of research, which have littl...
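Written generically, the update the abstract describes is a projected gradient step with the hard-thresholding operator $\mathcal{H}_s$ that keeps the $s$ largest-magnitude entries; the step size $\eta$ and the exact alternation order are assumed here:

$$B^{t+1} = \mathcal{H}_s\left(B^{t} - \eta\, \nabla_B L(B^{t}, \Omega^{t})\right),$$

followed by a corresponding update of the precision matrix $\Omega$; the precise alternation is in the paper.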
August 27, 2020
The question of fast convergence in the classical problem of high-dimensional linear regression has been extensively studied. Arguably, one of the fastest procedures in practice is Iterative Hard Thresholding (IHT). Still, IHT relies strongly on the knowledge of the true sparsity parameter $s$. In this paper, we present a novel fast procedure for estimation in high-dimensional linear regression. Taking advantage of the interplay between estimation, support recovery and op...
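For reference, plain IHT for least squares with known sparsity $s$; the step size below is a common convention, not necessarily the paper's:

```python
import numpy as np

def iht(X, y, s, step=None, iters=300):
    """Iterative Hard Thresholding: gradient step on 0.5 * ||y - X b||^2,
    then keep only the s largest-magnitude coordinates."""
    n, p = X.shape
    if step is None:
        step = 1.0 / np.linalg.norm(X, 2) ** 2   # 1 / (spectral norm)^2
    b = np.zeros(p)
    for _ in range(iters):
        b = b + step * X.T @ (y - X @ b)         # gradient step
        small = np.argsort(np.abs(b))[:p - s]    # all but the s largest
        b[small] = 0.0                           # hard threshold
    return b
```

The dependence on the true sparsity parameter $s$ enters only through the thresholding step, which is exactly the knowledge this paper's procedure aims to avoid requiring.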
November 14, 2017
We consider a sparse high dimensional regression model where the goal is to recover a $k$-sparse unknown vector $\beta^*$ from $n$ noisy linear observations of the form $Y=X\beta^*+W \in \mathbb{R}^n$ where $X \in \mathbb{R}^{n \times p}$ has iid $N(0,1)$ entries and $W \in \mathbb{R}^n$ has iid $N(0,\sigma^2)$ entries. Under certain assumptions on the parameters, an intriguing asymptotic gap appears between the minimum value of $n$, call it $n^*$, for which the recovery is ...
January 18, 2017
We develop a constructive approach to estimating sparse, high-dimensional linear regression models. The approach is a computational algorithm motivated from the KKT conditions for the $\ell_0$-penalized least squares solutions. It generates a sequence of solutions iteratively, based on support detection using primal and dual information and root finding. We refer to the algorithm as SDAR for brevity. Under a sparse Riesz condition on the design matrix and certain other condit...
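The support-detection step can be sketched as follows; this reconstruction from the KKT conditions of $\ell_0$-penalized least squares (primal $\beta$, dual $d = X^\top(y - X\beta)/n$, known sparsity level $k$) is an assumed reading of the algorithm's structure, not a transcription of the paper:

```python
import numpy as np

def sdar_like(X, y, k, iters=20):
    """SDAR-style iteration (reconstructed sketch): detect the active set
    from the combined primal and dual magnitudes |beta + d|, then solve
    least squares exactly on that set ('root finding' for the KKT system)."""
    n, p = X.shape
    beta = np.zeros(p)
    d = X.T @ (y - X @ beta) / n                     # dual information
    for _ in range(iters):
        active = np.argsort(-np.abs(beta + d))[:k]   # support detection
        beta = np.zeros(p)
        beta[active] = np.linalg.lstsq(X[:, active], y, rcond=None)[0]
        d = X.T @ (y - X @ beta) / n
        d[active] = 0.0                              # dual vanishes on the active set
    return beta
```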
February 18, 2019
In this paper, we review state-of-the-art methods for feature selection in statistics with an application-oriented eye. Indeed, sparsity is a valuable property, yet the profusion of research on the topic may have provided little clear guidance to practitioners. We demonstrate empirically how noise and correlation impact both the accuracy (the number of correct features selected) and the false detection (the number of incorrect features selected) for five methods: the cardinali...
November 23, 2020
In this paper we revisit one of the classical statistical problems, the so-called sparse maximum-likelihood (ML) linear regression. As a way of attacking this type of regression, we present a novel CLuP mechanism that relies in part on the Random Duality Theory (RDT) based algorithmic machinery we recently introduced in \cite{Stojnicclupint19,Stojnicclupcmpl19,Stojnicclupplt19,Stojniccluplargesc20,Stojniccluprephased20}. After the initial success that t...