
Accelerated Distributed Stochastic Non-Convex Optimization over Time-Varying Directed Networks

Bibliographic Details
Published in: IEEE Transactions on Automatic Control, 2024-10, p.1-15
Main Authors: Chen, Yiyue, Hashemi, Abolfazl, Vikalo, Haris
Format: Article
Language:English
container_end_page 15
container_start_page 1
container_title IEEE transactions on automatic control
creator Chen, Yiyue
Hashemi, Abolfazl
Vikalo, Haris
description Distributed stochastic non-convex optimization problems have recently received attention due to the growing interest of the signal processing, computer vision, and natural language processing communities in applications deployed over distributed learning systems (e.g., federated learning). We study the setting where the data is distributed across the nodes of a time-varying directed network, a topology suitable for modeling dynamic networks experiencing communication delays and straggler effects. The network nodes, which can access only their local objectives and query a stochastic first-order oracle to obtain gradient estimates, collaborate to minimize a global objective function by exchanging messages with their neighbors. We propose an algorithm, novel to this setting, that leverages stochastic gradient descent with momentum and gradient tracking to solve distributed non-convex optimization problems over time-varying networks. To analyze the algorithm, we tackle the challenges that arise when analyzing dynamic network systems that communicate gradient acceleration components. We prove that the algorithm's oracle complexity is \mathcal{O}(1/\epsilon^{1.5}), and that under the Polyak-Łojasiewicz (PL) condition the algorithm converges linearly to a steady error state. The proposed scheme is tested on several learning tasks: a non-convex logistic regression experiment on the MNIST dataset, an image classification task on the CIFAR-10 dataset, and an NLP classification test on the IMDB dataset. We further present numerical simulations with an objective that satisfies the PL condition. The results demonstrate the superior performance of the proposed framework compared to existing related methods.
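The abstract's key mechanism, combining momentum with gradient tracking over a communication graph, can be sketched in a deliberately simplified form. The sketch below uses deterministic gradients, a static undirected ring, and a doubly stochastic mixing matrix `W`; the paper's actual setting (stochastic oracles, time-varying directed graphs, which typically require push-sum-style column-stochastic weights) is substantially more involved, and the objectives, step sizes, and variable names here are illustrative assumptions, not the authors' algorithm.

```python
import numpy as np

# Toy local objectives f_i(x) = 0.5 * ||A_i x - b_i||^2 on 4 nodes.
rng = np.random.default_rng(0)
n_nodes, dim = 4, 3
A = [rng.standard_normal((5, dim)) for _ in range(n_nodes)]
b = [rng.standard_normal(5) for _ in range(n_nodes)]

def grad(i, x):
    return A[i].T @ (A[i] @ x - b[i])

# Doubly stochastic mixing matrix for a static ring (a simplification:
# time-varying directed graphs would need push-sum-style weights instead).
W = np.zeros((n_nodes, n_nodes))
for i in range(n_nodes):
    W[i, i] = 0.5
    W[i, (i + 1) % n_nodes] = 0.25
    W[i, (i - 1) % n_nodes] = 0.25

alpha, beta = 0.01, 0.5            # illustrative step size and momentum weight
x = np.zeros((n_nodes, dim))       # row i = node i's local iterate
g = np.array([grad(i, x[i]) for i in range(n_nodes)])
y = g.copy()                       # trackers, initialized to the local gradients
v = np.zeros_like(x)               # momentum buffers

for _ in range(2000):
    v = beta * v + y               # momentum applied to the tracked gradient
    x = W @ x - alpha * v          # mix with neighbors, then descend
    g_new = np.array([grad(i, x[i]) for i in range(n_nodes)])
    y = W @ y + g_new - g          # gradient tracking recursion
    g = g_new

# Every node should approach the minimizer of the summed objective.
x_star = np.linalg.solve(sum(Ai.T @ Ai for Ai in A),
                         sum(Ai.T @ bi for Ai, bi in zip(A, b)))
print(np.max(np.linalg.norm(x - x_star, axis=1)))
```

Because each row and column of `W` sums to one, the tracking update preserves the network-wide sum of the trackers, so each `y_i` asymptotically follows the network-average gradient; the momentum buffer `v_i` then plays the role the stochastic momentum term plays in the paper's setting.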
doi_str_mv 10.1109/TAC.2024.3479888
format article
fulltext fulltext
identifier ISSN: 0018-9286
ispartof IEEE transactions on automatic control, 2024-10, p.1-15
issn 0018-9286
1558-2523
language eng
recordid cdi_crossref_primary_10_1109_TAC_2024_3479888
source IEEE Electronic Library (IEL) Journals
subjects Complexity theory
Computational modeling
Convergence
decentralized non-convex optimization
Directed graphs
Distance learning
Heuristic algorithms
Linear programming
Optimization
Signal processing algorithms
stochastic non-convex optimization
Stochastic processes
time-varying directed network
title Accelerated Distributed Stochastic Non-Convex Optimization over Time-Varying Directed Networks
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-11T15%3A20%3A00IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-crossref_ieee_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Accelerated%20Distributed%20Stochastic%20Non-Convex%20Optimization%20over%20Time-Varying%20Directed%20Networks&rft.jtitle=IEEE%20transactions%20on%20automatic%20control&rft.au=Chen,%20Yiyue&rft.date=2024-10-11&rft.spage=1&rft.epage=15&rft.pages=1-15&rft.issn=0018-9286&rft.eissn=1558-2523&rft.coden=IETAA9&rft_id=info:doi/10.1109/TAC.2024.3479888&rft_dat=%3Ccrossref_ieee_%3E10_1109_TAC_2024_3479888%3C/crossref_ieee_%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c623-5419f8e6c070b93d1d317a299c3bc7779e155378c3b95b696942dc67ba1b6de3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=10715643&rfr_iscdi=true