On the Benefits of Multiple Gossip Steps in Communication-Constrained Decentralized Federated Learning

Bibliographic Details
Published in: IEEE Transactions on Parallel and Distributed Systems, 2022-11, Vol. 33 (11), p. 2727-2739
Main Authors: Hashemi, Abolfazl; Acharya, Anish; Das, Rudrajit; Vikalo, Haris; Sanghavi, Sujay; Dhillon, Inderjit
Format: Article
Language: English
Summary: Federated learning (FL) is an emerging collaborative machine learning (ML) framework that enables training of predictive models in a distributed fashion, in which communication among the participating nodes is facilitated by a central server. To deal with the communication bottleneck at the server, decentralized FL (DFL) methods instead rely on local communication of nodes with their neighbors according to a specific communication network. In DFL, it is common algorithmic practice to have nodes interleave (local) gradient descent iterations with gossip (i.e., averaging over the network) steps. As the size of the ML models grows, the limited communication bandwidth among the nodes does not permit communication of full-precision messages; hence, it is becoming increasingly common to require that messages be lossy, compressed versions of the local parameters. The requirement of communicating compressed messages gives rise to an important question: given a fixed communication budget, what should our communication strategy be to minimize the (training) loss as much as possible? In this article, we explore this direction and show that in such compressed DFL settings, there are benefits to having multiple gossip steps between subsequent gradient iterations, even when the cost of doing so is appropriately accounted for, e.g., by means of reducing the precision of the compressed information. In particular, we show that having {\mathcal O}(\log \frac{1}{\epsilon}) gradient iterations with constant step size - and {\mathcal O}(\log \frac{1}{\epsilon}) gossip steps between every pair of these iterations - enables convergence to within \epsilon ...
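The algorithmic pattern the summary describes, a local gradient step at each node followed by several compressed gossip (network-averaging) rounds, can be illustrated with a minimal sketch. Everything concrete below is an illustrative assumption rather than the paper's exact method: the quadratic local losses, the simple uniform quantizer, the ring mixing matrix W, and the hyperparameter names and values (num_gossip_steps, num_bits, step_size).

```python
# Minimal sketch of decentralized FL with multiple compressed gossip steps.
# Assumptions (not from the paper): quadratic local losses, a simple uniform
# quantizer, a doubly-stochastic ring mixing matrix, and the values below.
import numpy as np

rng = np.random.default_rng(0)
n_nodes, dim = 8, 20

# Node i holds the local loss f_i(x) = 0.5 * ||x - b_i||^2,
# so the global optimum of the average loss is the mean of the b_i.
b = rng.normal(size=(n_nodes, dim))

def local_gradient(i, x):
    return x - b[i]

def quantize(v, num_bits):
    """Uniform scalar quantizer: fewer bits means coarser, cheaper messages."""
    scale = np.max(np.abs(v)) + 1e-12
    levels = 2 ** (num_bits - 1)
    return np.round(v / scale * levels) / levels * scale

# Doubly-stochastic mixing matrix for a ring topology.
W = np.zeros((n_nodes, n_nodes))
for i in range(n_nodes):
    W[i, i] = 0.5
    W[i, (i - 1) % n_nodes] = 0.25
    W[i, (i + 1) % n_nodes] = 0.25

def train(num_outer_iters, num_gossip_steps, num_bits, step_size=0.5):
    x = np.zeros((n_nodes, dim))          # one parameter vector per node
    for _ in range(num_outer_iters):
        # (1) Local gradient step at every node.
        for i in range(n_nodes):
            x[i] -= step_size * local_gradient(i, x[i])
        # (2) Several gossip rounds on quantized parameters.
        for _ in range(num_gossip_steps):
            q = np.stack([quantize(x[i], num_bits) for i in range(n_nodes)])
            x = W @ q                      # average with neighbors
    return x

# Same per-iteration communication budget, split two different ways:
# one high-precision gossip step vs. several low-precision ones.
x_few = train(num_outer_iters=50, num_gossip_steps=1, num_bits=8)
x_many = train(num_outer_iters=50, num_gossip_steps=4, num_bits=2)
optimum = b.mean(axis=0)
for name, x in [("1 gossip step, 8 bits", x_few), ("4 gossip steps, 2 bits", x_many)]:
    err = np.mean(np.linalg.norm(x - optimum, axis=1))
    print(f"{name}: mean distance to optimum = {err:.4f}")
```

In this toy setup, each node spends roughly num_gossip_steps * num_bits * dim bits per outer iteration, so the two calls above consume a comparable budget; the question the paper studies is which split of that fixed budget drives the training loss down faster.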
ISSN: 1045-9219, 1558-2183
DOI: 10.1109/TPDS.2021.3138977