Communication-Efficient Network Topology in Decentralized Learning: A Joint Design of Consensus Matrix and Resource Allocation
Published in: IEEE/ACM Transactions on Networking, 2024-12, pp. 1-16
Main Authors:
Format: Article
Language: English
Summary: In decentralized machine learning over a network of workers, each worker updates its local model as a weighted average of its local model and all models received from its neighbors. Efficient consensus weight matrix design and communication resource allocation can increase the training convergence rate and reduce the wall-clock training time. In this paper, we jointly consider these two factors and propose a novel algorithm termed Communication-Efficient Network Topology (CENT), which reduces the latency in each training iteration by removing unnecessary communication links. CENT enforces communication graph sparsity by iteratively updating, with a fixed step size, a trade-off factor between the convergence factor and a weighted graph sparsity. We further extend CENT to one with an adaptive step size (CENT-A), which adjusts the trade-off factor based on the feedback of the objective function value, without introducing additional computation complexity. We show that both CENT and CENT-A preserve the training convergence rate while avoiding the selection of poor communication links. Numerical studies with real-world machine learning data in both homogeneous and heterogeneous scenarios demonstrate the efficacy of CENT and CENT-A and their performance advantage over state-of-the-art algorithms.
ISSN: 1063-6692
DOI: 10.1109/TNET.2024.3511333
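
The summary describes the basic decentralized averaging step that CENT operates on: each worker mixes its local model with its neighbors' models according to a consensus (mixing) matrix, and removing a communication link amounts to zeroing the corresponding matrix entry. The following is a minimal sketch of that consensus step, not the paper's CENT algorithm; the function names, the ring topology, and the threshold-based link-dropping heuristic are all illustrative assumptions.

```python
import numpy as np

def consensus_step(models, W):
    """One decentralized averaging step: x_i <- sum_j W[i, j] * x_j.

    models: array of shape (n_workers, dim), one local model per worker.
    W:      (n_workers, n_workers) mixing matrix; W[i, j] > 0 only for
            links the communication topology keeps.
    """
    return W @ models

def drop_weak_links(W, threshold):
    """Hypothetical heuristic (not CENT): zero out off-diagonal weights
    below `threshold` and renormalize rows so each row still sums to 1.
    CENT instead designs the sparsified matrix jointly with communication
    resource allocation while preserving the convergence factor."""
    W_sparse = np.where(W >= threshold, W, 0.0)
    np.fill_diagonal(W_sparse, np.diag(W))  # always keep self-weights
    return W_sparse / W_sparse.sum(axis=1, keepdims=True)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_workers, dim = 5, 3
    # Toy mixing matrix: uniform weights on a ring topology.
    W = np.zeros((n_workers, n_workers))
    for i in range(n_workers):
        W[i, i] = W[i, (i - 1) % n_workers] = W[i, (i + 1) % n_workers] = 1 / 3
    models = rng.normal(size=(n_workers, dim))
    models = consensus_step(models, drop_weak_links(W, threshold=0.1))
    print(models)
```

In this sketch, sparsification trades communication (fewer nonzero links per iteration) against mixing quality; the paper's contribution is choosing that trade-off so that per-iteration latency drops without degrading the training convergence rate.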