Communication-Efficient Network Topology in Decentralized Learning: A Joint Design of Consensus Matrix and Resource Allocation

Bibliographic Details
Published in: IEEE/ACM Transactions on Networking, 2024-12, p. 1-16
Main Authors: Wang, Jingrong, Liang, Ben, Zhu, Zhongwen, Fapi, Emmanuel Thepie, Dalal, Hardik
Format: Article
Language:English
Description
In decentralized machine learning over a network of workers, each worker updates its local model as a weighted average of its local model and all models received from its neighbors. Efficient consensus weight matrix design and communication resource allocation can increase the training convergence rate and reduce the wall-clock training time. In this paper, we jointly consider these two factors and propose a novel algorithm termed Communication-Efficient Network Topology (CENT), which reduces the latency in each training iteration by removing unnecessary communication links. CENT enforces communication graph sparsity by iteratively updating, with a fixed step size, a trade-off factor between the convergence factor and a weighted graph sparsity. We further extend CENT to one with an adaptive step size (CENT-A), which adjusts the trade-off factor based on the feedback of the objective function value, without introducing additional computational complexity. We show that both CENT and CENT-A preserve the training convergence rate while avoiding the selection of poor communication links. Numerical studies with real-world machine learning data in both homogeneous and heterogeneous scenarios demonstrate the efficacy of CENT and CENT-A and their performance advantage over state-of-the-art algorithms.
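The consensus update described above can be illustrated with a minimal sketch. This is not the paper's CENT algorithm; it is a generic gossip-averaging step, assuming a hypothetical ring of four workers with scalar models and a symmetric, doubly stochastic weight matrix `W` (all names here are illustrative). It shows why the weight matrix governs the convergence factor that CENT trades off against graph sparsity.

```python
import numpy as np

def consensus_step(models, W):
    """One decentralized averaging round: each worker i replaces its
    model with the weighted average x_i <- sum_j W[i, j] * x_j of its
    own model and its neighbors' models."""
    return W @ models

# Hypothetical 4-worker ring: each worker averages equally with its
# two neighbors (weights 0.25) and itself (weight 0.5).
# W is symmetric and doubly stochastic (rows and columns sum to 1).
W = np.array([
    [0.50, 0.25, 0.00, 0.25],
    [0.25, 0.50, 0.25, 0.00],
    [0.00, 0.25, 0.50, 0.25],
    [0.25, 0.00, 0.25, 0.50],
])

# One scalar "model" per worker, stacked as a column vector.
models = np.array([[1.0], [2.0], [3.0], [4.0]])
for _ in range(50):
    models = consensus_step(models, W)

# Repeated averaging drives all workers toward the global mean (2.5
# here). The speed of this contraction is set by the second-largest
# eigenvalue magnitude of W -- the convergence factor that topology
# design (e.g., removing slow links) seeks to keep small.
print(models.ravel())
```

Sparsifying the communication graph, as CENT does, corresponds to zeroing entries of `W`: each removed link cuts per-iteration communication latency, but an over-sparsified `W` has a larger second eigenvalue and thus slower consensus, which is exactly the trade-off the abstract describes.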
ISSN: 1063-6692
DOI: 10.1109/TNET.2024.3511333