Nonsmooth Bilevel Programming for Hyperparameter Selection

Bibliographic Details
Main Authors: Moore, G.M., Bergeron, C., Bennett, K.P.
Format: Conference Proceeding
Language: English
Description
Summary: We propose a nonsmooth bilevel programming method for training linear learning models with hyperparameters optimized via T-fold cross-validation (CV). The algorithm scales well with sample size and handles loss functions with embedded maxima, such as those in support vector machines. Current practice constructs models over a predefined grid of hyperparameter combinations and selects the best one, an inefficient heuristic. Improving on previous bilevel CV approaches, this paper represents an advance toward the goal of self-tuning supervised data mining as well as a significant innovation in scalable bilevel programming algorithms. In the bilevel CV formulation, the lower-level problems are treated as unconstrained optimization problems and are replaced with their optimality conditions. The resulting nonlinear program is nonsmooth and nonconvex. We develop a novel bilevel programming algorithm to solve this class of problems and apply it to linear least-squares support vector regression with hyperparameters C (tradeoff) and ε (loss insensitivity). The new approach outperforms grid search and prior smooth bilevel CV methods in terms of modeling performance, and its increased speed opens the way to modeling with a larger number of hyperparameters.
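
To make the formulation concrete, the following is a minimal sketch of a bilevel CV program of the kind the summary describes; the notation (folds t = 1, ..., T with validation sets Ω_t and training sets Ω̄_t, an ε-insensitive outer validation error, and a squared ε-insensitive inner loss) is an illustrative assumption, not the paper's exact formulation:

\min_{C \ge 0,\ \varepsilon \ge 0}\ \frac{1}{T}\sum_{t=1}^{T}\frac{1}{|\Omega_t|}\sum_{i\in\Omega_t}\max\bigl(\lvert x_i^\top w_t - y_i\rvert - \varepsilon,\ 0\bigr)

\text{s.t.}\quad w_t = \arg\min_{w}\ \tfrac{1}{2}\lVert w\rVert_2^2 + C\sum_{j\in\bar{\Omega}_t}\max\bigl(\lvert x_j^\top w - y_j\rvert - \varepsilon,\ 0\bigr)^2,\qquad t = 1,\dots,T.

As the summary notes, each unconstrained lower-level problem is replaced by its first-order optimality condition (of the form 0 ∈ ∂_w of the inner objective), and the embedded max terms make the resulting single-level nonlinear program nonsmooth and nonconvex.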
ISSN: 2375-9232, 2375-9259
DOI: 10.1109/ICDMW.2009.74