Optimizing Resource Allocation in Pipeline Parallelism for Distributed DNN Training
Format: Conference Proceeding
Language: English
Summary: Deep Neural Network (DNN) models have been widely deployed in a variety of applications. Driven by privacy concerns and substantial improvements in the computational power of mobile devices, training machine learning models directly on mobile devices has become increasingly important. Directly applying parallel training frameworks designed for data-center networks to train DNN models on mobile devices may not achieve ideal performance, since mobile devices usually carry multiple types of computation resources, such as ASICs, neural engines, and FPGAs. Moreover, communication time is not negligible when training on mobile devices. With the objective of minimizing DNN training time, we propose to extend pipeline parallelism, which hides communication time behind computation during DNN training, by integrating resource allocation. Fine-tuning the ratio of resources allocated to forward and backward propagation can improve resource utilization. We focus on homogeneous workers and theoretically analyze the ideal case in which resources are linearly separable; we also discuss model partitioning and resource allocation in a more realistic setting. Additionally, we investigate the case of heterogeneous workers. Trace-based simulation results show that our scheme effectively reduces the time cost of a training iteration.
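To make the resource-ratio idea concrete, the following is a minimal sketch of the linearly separable ideal case the summary mentions: a worker gives a fraction r of its compute to forward passes and 1 - r to backward passes, and the pipeline advances at the pace of the slower side. The cost model, function names, and numbers here are illustrative assumptions for this record, not the paper's actual scheme.

```python
# Toy model (illustrative, not the paper's algorithm): with linearly
# separable resources, work assigned a fraction s of the device runs
# s times slower than at full capacity. Overlapping forward and backward
# work of different microbatches, the steady-state per-microbatch period
# is set by whichever side is the bottleneck.

def steady_state_period(t_fwd: float, t_bwd: float, r: float) -> float:
    """Per-microbatch period when forward work t_fwd gets a fraction r of
    the resources and backward work t_bwd gets the remaining 1 - r."""
    return max(t_fwd / r, t_bwd / (1.0 - r))

def optimal_split(t_fwd: float, t_bwd: float) -> float:
    """Balancing t_fwd / r = t_bwd / (1 - r) yields
    r* = t_fwd / (t_fwd + t_bwd)."""
    return t_fwd / (t_fwd + t_bwd)

if __name__ == "__main__":
    t_fwd, t_bwd = 2.0, 4.0  # hypothetical per-microbatch compute costs
    even = steady_state_period(t_fwd, t_bwd, 0.5)       # naive even split
    r_star = optimal_split(t_fwd, t_bwd)                # tuned split
    tuned = steady_state_period(t_fwd, t_bwd, r_star)
    print(f"even split: {even:.2f}, tuned r*={r_star:.2f}: {tuned:.2f}")
```

With these toy numbers the even split yields a period of 8.0 while the balanced split (r* = 1/3) yields 6.0; at the balanced point neither the forward nor the backward side idles, which is the utilization gain the summary attributes to fine-tuning the allocation ratio.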
ISSN: 2690-5965
DOI: 10.1109/ICPADS56603.2022.00029