
Model Parameter Prediction Method for Accelerating Distributed DNN Training

As the size of deep neural network (DNN) models and datasets increases, distributed training has become a popular way to reduce training time. However, a severe communication bottleneck limits the scalability of distributed training. Many methods address this bottleneck by reducing communication traffic, for example through gradient sparsification and quantization, but they either sacrifice model accuracy or introduce substantial computing overhead. We have observed that the data distributions between layers of a neural network model are similar. We therefore propose a model parameter prediction method (MP2) to accelerate distributed DNN training under the parameter server (PS) framework: workers push only a subset of the model parameters to the PS, and the remaining parameters are predicted locally on the PS by an already-trained deep neural network. We address several key challenges in this approach. First, we build a hierarchical parameter dataset by randomly sampling subsets of model parameters from normal distributed training runs. Second, we design a neural network with a "convolution + channel attention + max pooling" structure for predicting model parameters, using a prediction-result-based evaluation method. For VGGNet, ResNet, and AlexNet on the CIFAR-10 and CIFAR-100 datasets, compared with the baseline, Top-k, deep gradient compression (DGC), and the weight nowcaster network (WNN), MP2 reduces communication traffic by up to 88.98% and accelerates training by up to 47.32% without losing model accuracy. MP2 also shows good generalization.
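
The article itself provides no code; the sketch below is only a minimal PyTorch illustration of what a "convolution + channel attention + max pooling" parameter predictor running on the PS might look like. The class names, chunk lengths, channel counts, and the squeeze-and-excitation form of the channel attention are assumptions made for illustration, not the authors' MP2 implementation.

```python
# Illustrative sketch (assumed shapes and layer sizes) of a PS-side predictor that
# maps a chunk of parameters a worker actually pushed to a chunk it did not push.
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel attention (assumed form)."""

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, length)
        weights = self.fc(x.mean(dim=-1))        # global average pool -> (batch, channels)
        return x * weights.unsqueeze(-1)         # re-weight each channel


class ParameterPredictor(nn.Module):
    """Predicts a chunk of unsent parameters from a chunk of received ones."""

    def __init__(self, chunk_in: int = 256, chunk_out: int = 256, channels: int = 32):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, channels, kernel_size=3, padding=1),   # convolution stage
            nn.ReLU(inplace=True),
            ChannelAttention(channels),                         # channel attention stage
            nn.MaxPool1d(kernel_size=2),                        # max pooling stage
        )
        self.head = nn.Linear(channels * (chunk_in // 2), chunk_out)

    def forward(self, received: torch.Tensor) -> torch.Tensor:
        # received: (batch, chunk_in) flattened parameters pushed by a worker
        h = self.features(received.unsqueeze(1))     # (batch, channels, chunk_in // 2)
        return self.head(h.flatten(start_dim=1))     # (batch, chunk_out) predicted parameters


if __name__ == "__main__":
    predictor = ParameterPredictor()
    pushed = torch.randn(8, 256)       # parameters a worker actually sent to the PS
    predicted = predictor(pushed)      # stand-ins for parameters that were not sent
    print(predicted.shape)             # torch.Size([8, 256])
```

In an MP2-style workflow, only the pushed chunks would cross the network; how the real system partitions parameters, builds the hierarchical training dataset, and trains the predictor is described in the article.
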
Bibliographic Details
Published in: Computer networks (Amsterdam, Netherlands : 1999), 2024-12, Vol. 255, p. 110883, Article 110883
Main Authors: Liu, Wai-xi; Chen, Dao-xiao; Tan, Miao-quan; Chen, Kong-yang; Yin, Yue; Shang, Wen-Li; Li, Jin; Cai, Jun
Format: Article
Language: English
Subjects: Communication optimization; Distributed training; Parameter prediction
DOI: 10.1016/j.comnet.2024.110883
ISSN: 1389-1286
Publisher: Elsevier B.V.