The Variational Garrote

Bibliographic Details
Published in: Machine Learning, 2014-09, Vol. 96 (3), p. 269-294
Main Authors: Kappen, Hilbert J., Gómez, Vicenç
Format: Article
Language:English
Description
Summary: We analyze the variational method for sparse regression using ℓ0 regularization. The variational approximation results in a model that is similar to Breiman's Garrote model. We refer to this method as the Variational Garrote (VG). The VG has the effect of making the problem effectively of maximal rank even when the number of samples is small compared to the number of variables. We propose a naive mean field approximation combined with a maximum a posteriori (MAP) approach to estimate the model parameters, and use an annealing and reheating schedule of the sparsity hyper-parameter to avoid local minima. The hyper-parameter is set by cross-validation. We compare the VG with the lasso, ridge regression, and the recently introduced Bayesian paired mean field method (PMF) (Titsias and Lázaro-Gredilla in Advances in Neural Information Processing Systems, vol. 24, pp. 2339-2347, 2011). For a fair comparison, we implemented a similar annealing-reheating schedule for the PMF sparsity parameter. Numerical results show that the VG and PMF yield more accurate predictions and more accurately reconstruct the true model than the other methods. The VG finds correct solutions when the lasso solution is inconsistent due to large input correlations. In the experiments that we consider, we find that the VG, although based on a simpler approximation than the PMF, yields qualitatively similar or better results and is computationally more efficient. A naive implementation of the VG scales cubically with the number of features. By introducing Lagrange multipliers we obtain a dual formulation of the problem that scales cubically in the number of samples but close to linearly in the number of features.
ISSN:0885-6125
1573-0565
DOI:10.1007/s10994-013-5427-7
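As background for the ℓ0-regularized regression problem the abstract addresses, the sketch below shows what exact ℓ0-penalized least squares computes, via brute-force subset search. This is purely illustrative (the data, function name, and penalty value are assumptions, not from the paper); exhaustive search over all 2^p supports is only feasible for tiny feature counts, which is exactly the regime the VG's variational approximation and its dual formulation are designed to escape.

```python
import itertools
import numpy as np

def l0_exhaustive(X, y, lam):
    """Exact l0-penalized least squares by exhaustive subset search.

    Minimizes 0.5 * ||y - X[:, S] @ w||^2 + lam * |S| over every
    support S of the p features. Cost grows as 2^p, so this is only
    a tiny-scale reference point, not a practical method.
    """
    n, p = X.shape
    # Start from the empty support (w = 0, residual = y).
    best_cost, best_S, best_w = 0.5 * float(y @ y), (), np.zeros(0)
    for k in range(1, p + 1):
        for S in itertools.combinations(range(p), k):
            idx = list(S)
            # Least-squares fit restricted to the candidate support.
            w, *_ = np.linalg.lstsq(X[:, idx], y, rcond=None)
            resid = y - X[:, idx] @ w
            cost = 0.5 * float(resid @ resid) + lam * k
            if cost < best_cost:
                best_cost, best_S, best_w = cost, S, w
    return best_S, best_w

# Toy data (illustrative): 8 features, only features 1 and 4 active.
rng = np.random.default_rng(0)
n, p = 40, 8
X = rng.standard_normal((n, p))
w_true = np.zeros(p)
w_true[1], w_true[4] = 2.0, -1.5
y = X @ w_true + 0.1 * rng.standard_normal(n)

support, w_hat = l0_exhaustive(X, y, lam=1.0)
print(support)  # (1, 4) -- the true support is recovered
```

With well-separated signal and small noise, the ℓ0 penalty charges a flat cost per active feature, so spurious features that only marginally reduce the residual are excluded; this all-or-nothing gating is what the VG approximates variationally at scale.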