The Asymptotic Distribution of the Stochastic Mirror Descent Iterates in Linear Models
| Field | Value |
|---|---|
| Main Authors | |
| Format | Conference Proceeding |
| Language | English |
| Subjects | |
| Online Access | Request full text |
| Summary | The stochastic mirror descent (SMD) algorithm is a generalization of the celebrated stochastic gradient descent (SGD), the workhorse of modern machine learning. In over-parameterized models, such as in deep learning, it is well known that SGD finds the weight vector that interpolates the training data and is closest to the initial weight vector in Euclidean distance. This phenomenon is called "implicit regularization". SMD allows for different implicit regularizations and finds the interpolating solution that is closest to the initial weight vector in Bregman divergence (corresponding to the mirror's potential function). It has been empirically observed that different potentials lead to different generalization errors and different distributions of the weights. In this paper, we explicitly compute the asymptotic distribution of the optimal weight vector for SMD applied to over-parameterized linear models with Gaussian training vectors. The theory presented matches empirical simulations well and can provide a stepping stone toward the analysis of SMD on nonlinear models, such as in deep learning. |
| ISSN | 2157-8117 |
| DOI | 10.1109/ISIT54713.2023.10206966 |
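
To make the iteration described in the summary concrete, below is a minimal NumPy sketch of SMD on an over-parameterized linear model with Gaussian training vectors. The q-norm potential, step size, epoch count, and data dimensions are illustrative assumptions, not values taken from the paper.

```python
import numpy as np


def smd_linear(X, y, q=3.0, lr=1e-3, epochs=2000, seed=0):
    """Stochastic mirror descent on a linear model with squared loss,
    using the q-norm potential psi(w) = (1/q) * sum_j |w_j|^q.
    Illustrative sketch only: potential, step size, and epochs are assumptions."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    grad_psi = lambda w: np.sign(w) * np.abs(w) ** (q - 1.0)        # mirror map
    grad_psi_inv = lambda z: np.sign(z) * np.abs(z) ** (1.0 / (q - 1.0))
    w = np.zeros(d)                  # initialization at the origin
    z = grad_psi(w)                  # iterate kept in the dual (mirror) space
    for _ in range(epochs):
        for i in rng.permutation(n):
            err = X[i] @ w - y[i]    # residual on one training example
            z -= lr * err * X[i]     # dual update: grad_psi(w_new) = grad_psi(w) - lr * grad L_i(w)
            w = grad_psi_inv(z)      # map back to the primal space
    return w


# Over-parameterized Gaussian data (d > n), in the spirit of the paper's setting
rng = np.random.default_rng(1)
n, d = 50, 200
X = rng.standard_normal((n, d))
y = X @ rng.standard_normal(d)
w_hat = smd_linear(X, y)
print("training residual norm:", np.linalg.norm(X @ w_hat - y))  # shrinks toward 0 (interpolation)
```

With a different potential (for q = 2 the update reduces to plain SGD), the iterates still converge toward an interpolating solution, but the returned weights, and hence their distribution, differ with the choice of mirror; that dependence is what the paper quantifies.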