
OCTOPUS: Overcoming Performance and Privatization Bottlenecks in Distributed Learning

Bibliographic Details
Published in: IEEE Transactions on Parallel and Distributed Systems, 2022-12, Vol. 33 (12), p. 1-1
Main Authors: Wang, Shuo, Nepal, Surya, Moore, Kristen, Grobler, Marthie, Rudolph, Carsten, Abuadbba, Alsharif
Format: Article
Language:English
Description
Summary: The diversity and quantity of data warehouses, which gather data from distributed sources such as mobile devices, can enhance the success and robustness of machine learning algorithms. Federated learning enables distributed participants to collaboratively learn a commonly shared model while keeping data local. However, it also faces expensive communication costs and limitations arising from the heterogeneity of distributed data sources and the lack of access to global data. In this paper, we investigate a practical distributed learning scenario in which multiple downstream tasks (e.g., classifiers) can be efficiently learned from dynamically updated, non-IID distributed data sources while privatizing local data. We introduce a new distributed learning scheme that addresses communication overhead via latent compression, leveraging global data while privatizing local data without the additional cost of encryption or perturbation. The scheme divides learning into (1) informative feature encoding, which extracts and transmits compressed latent-space representations of local data at each node to address communication overhead, and (2) downstream tasks centralized at the server, which use the encoded representations gathered from each node to address computing and storage overhead. In addition, a disentanglement strategy is applied to privatize sensitive components of local data. Extensive experiments are conducted on image and speech datasets. The results demonstrate that downstream tasks on the compact latent representations can achieve accuracy comparable to centralized learning while privatizing local data.
ISSN: 1045-9219
1558-2183
DOI: 10.1109/TPDS.2022.3157258
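
The two-stage split described in the summary (node-side latent encoding, server-side downstream training on the gathered codes) can be sketched as follows. This is a minimal illustrative sketch, not the authors' OCTOPUS implementation: the encoder architecture, latent dimensions, the shared/sensitive split, and names such as LocalEncoder and ServerClassifier are assumptions made for the example.

```python
# Illustrative sketch of the encode-then-centralize scheme described in the abstract.
# The architecture, dimensions, and disentanglement handling are assumptions, not the
# paper's actual OCTOPUS implementation.
import torch
import torch.nn as nn

LATENT_DIM = 32          # assumed size of the transmitted latent code
SENSITIVE_DIM = 8        # assumed size of the withheld (sensitive) latent block


class LocalEncoder(nn.Module):
    """Runs on each node: compresses raw local data into a latent code and splits it
    into a shareable part and a sensitive part that never leaves the node."""
    def __init__(self, input_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, 128), nn.ReLU(),
            nn.Linear(128, LATENT_DIM + SENSITIVE_DIM),
        )

    def forward(self, x: torch.Tensor):
        z = self.net(x)
        shared, sensitive = z[:, :LATENT_DIM], z[:, LATENT_DIM:]
        return shared, sensitive  # only `shared` is transmitted to the server


class ServerClassifier(nn.Module):
    """Runs centrally: a downstream task trained on the pooled latent codes."""
    def __init__(self, num_classes: int):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(LATENT_DIM, 64), nn.ReLU(),
            nn.Linear(64, num_classes),
        )

    def forward(self, z: torch.Tensor):
        return self.head(z)


if __name__ == "__main__":
    # Two hypothetical nodes; random tensors stand in for real non-IID local data.
    node_data = [torch.randn(100, 784), torch.randn(100, 784)]
    node_labels = [torch.randint(0, 10, (100,)), torch.randint(0, 10, (100,))]
    encoder = LocalEncoder(input_dim=784)  # in practice each node holds its own encoder

    # Step 1 (on each node): encode local data; transmit only the shared latent codes.
    with torch.no_grad():
        transmitted = [encoder(x)[0] for x in node_data]

    # Step 2 (on the server): pool the latent codes and train the downstream classifier.
    z_all = torch.cat(transmitted)
    y_all = torch.cat(node_labels)
    clf = ServerClassifier(num_classes=10)
    opt = torch.optim.Adam(clf.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(5):  # a few illustrative epochs
        opt.zero_grad()
        loss = loss_fn(clf(z_all), y_all)
        loss.backward()
        opt.step()
    print(f"final training loss: {loss.item():.4f}")
```

In this reading, communication cost is bounded by the size of the shared latent code rather than the raw data or model gradients, and the sensitive latent block stays on the node, which is one plausible way to interpret the disentanglement-based privatization mentioned in the abstract.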