
An Improved Analysis of Gradient Tracking for Decentralized Machine Learning

Bibliographic Details
Published in: arXiv.org, 2022-02
Main Authors: Koloskova, Anastasia, Lin, Tao, Stich, Sebastian U
Format: Article
Language: English
Description
Summary: We consider decentralized machine learning over a network where the training data is distributed across \(n\) agents, each of which can compute stochastic model updates on its local data. The agents' common goal is to find a model that minimizes the average of all local loss functions. While gradient tracking (GT) algorithms can overcome a key challenge, namely accounting for differences between workers' local data distributions, the known convergence rates for GT algorithms are not optimal with respect to their dependence on the mixing parameter \(p\) (related to the spectral gap of the connectivity matrix). We provide a tighter analysis of the GT method in the stochastic strongly convex, convex, and non-convex settings. We improve the dependency on \(p\) from \(\mathcal{O}(p^{-2})\) to \(\mathcal{O}(p^{-1}c^{-1})\) in the noiseless case and from \(\mathcal{O}(p^{-3/2})\) to \(\mathcal{O}(p^{-1/2}c^{-1})\) in the general stochastic case, where \(c \geq p\) is related to the negative eigenvalues of the connectivity matrix (and is a constant in most practical applications). This improvement was possible due to a new proof technique which could be of independent interest.
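For illustration, the following is a minimal sketch of a gradient tracking iteration in Python/NumPy. It assumes a common "adapt-then-combine" style GT update with a doubly stochastic mixing matrix `W`; the function name `gradient_tracking`, the step size, and the toy quadratic example are hypothetical and not taken from the paper, whose analyzed variant and constants may differ.

```python
# Minimal sketch of decentralized gradient tracking (GT); a common formulation,
# not necessarily the exact variant or parameters analyzed in the paper.
import numpy as np

def gradient_tracking(grads, W, x0, eta=0.1, steps=100):
    """Run GT for `steps` iterations on n agents.

    grads : list of callables; grads[i](x) returns a (possibly stochastic)
            gradient of agent i's local loss at x.
    W     : (n, n) doubly stochastic mixing matrix encoding the network
            (its spectral gap relates to the mixing parameter p).
    x0    : (n, d) initial iterates, one row per agent.
    """
    n, d = x0.shape
    x = x0.copy()
    g = np.stack([grads[i](x[i]) for i in range(n)])
    y = g.copy()  # tracker of the average gradient, initialized locally
    for _ in range(steps):
        x_new = W @ (x - eta * y)          # local step, then gossip averaging
        g_new = np.stack([grads[i](x_new[i]) for i in range(n)])
        y = W @ y + g_new - g              # gossip the tracker + local gradient correction
        x, g = x_new, g_new
    return x.mean(axis=0)                  # (near-)consensus model

# Hypothetical toy run: quadratic losses f_i(x) = 0.5 * ||x - b_i||^2 on a ring.
n, d = 5, 3
b = np.random.default_rng(0).normal(size=(n, d))
grads = [lambda x, bi=bi: x - bi for bi in b]
W = 0.5 * np.eye(n)                        # ring topology, symmetric weights
for i in range(n):
    W[i, (i - 1) % n] += 0.25
    W[i, (i + 1) % n] += 0.25
x_out = gradient_tracking(grads, W, np.zeros((n, d)), eta=0.2, steps=200)
print(np.linalg.norm(x_out - b.mean(axis=0)))  # distance to the exact average minimizer
```

The tracker `y` lets each agent estimate the network-average gradient, which is what allows GT to handle differences between the agents' local data distributions; the abstract's improved \(p\) and \(c\) dependences quantify how fast this happens for a given connectivity matrix.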
ISSN: 2331-8422