
The Asymptotic Distribution of the Stochastic Mirror Descent Iterates in Linear Models

Bibliographic Details
Main Authors: Varma, K Nithin, Lale, Sahin, Hassibi, Babak
Format: Conference Proceeding
Language: English
Summary: The stochastic mirror descent (SMD) algorithm is a generalization of the celebrated stochastic gradient descent (SGD), the workhorse of modern machine learning. In over-parameterized models, such as in deep learning, it is well known that SGD finds the weight vector that interpolates the training data and is closest to the initial weight vector in Euclidean distance. This phenomenon is called "implicit regularization". SMD allows for different implicit regularizations and finds the interpolating solution that is closest to the initial weight vector in Bregman divergence (corresponding to the mirror's potential function). It has been empirically observed that different potentials lead to different generalization errors and different distributions of the weights. In this paper, we explicitly compute the asymptotic distribution of the optimal weight vector for SMD applied to over-parameterized linear models with Gaussian training vectors. The theory presented well matches empirical simulations and can provide a stepping stone toward the analysis of SMD on nonlinear models, such as in deep learning.
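
To make the setting concrete, the sketch below runs SMD on an over-parameterized linear model with Gaussian training vectors, using a q-norm mirror potential psi(w) = (1/q)||w||_q^q (q = 2 recovers plain SGD). It is a minimal illustrative sketch only: the function name smd_linear, the choice of potential, and all step sizes and problem sizes are assumptions for illustration, not the authors' implementation or the paper's exact experimental setup.

    import numpy as np

    def smd_linear(X, y, q=3.0, lr=1e-3, epochs=1000, w0=None, seed=0):
        """Stochastic mirror descent on the linear model y_i = x_i^T w with
        the q-norm potential psi(w) = (1/q) * ||w||_q^q (illustrative sketch)."""
        rng = np.random.default_rng(seed)
        n, d = X.shape
        w = np.zeros(d) if w0 is None else w0.copy()
        # Mirror (dual) variable: z = grad psi(w) = sign(w) * |w|^(q-1)
        z = np.sign(w) * np.abs(w) ** (q - 1)
        for _ in range(epochs):
            for i in rng.permutation(n):
                err = X[i] @ w - y[i]              # residual on one training sample
                z -= lr * err * X[i]               # SMD step in the dual (mirror) space
                # Map back to the primal: w = (grad psi)^{-1}(z)
                w = np.sign(z) * np.abs(z) ** (1.0 / (q - 1))
        return w

    # Over-parameterized Gaussian data (d > n), as in the paper's setting.
    rng = np.random.default_rng(1)
    n, d = 50, 200
    X = rng.standard_normal((n, d))
    y = X @ rng.standard_normal(d)
    w_smd = smd_linear(X, y, q=3.0)
    # Training error shrinks toward zero as SMD approaches an interpolating solution.
    print("train error:", np.linalg.norm(X @ w_smd - y))

Among the (infinitely many) interpolating solutions in this d > n regime, the one SMD converges to depends on the potential: it is the interpolator closest to the initial weight vector in the Bregman divergence of psi, which is the implicit regularization the abstract refers to.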
ISSN: 2157-8117
DOI: 10.1109/ISIT54713.2023.10206966