
Orchid: enhancing HPC interconnection networks through infrequent topology reconfiguration

Bibliographic Details
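Published in: Journal of Optical Communications and Networking, 2024-06, Vol. 16 (6), pp. 644-658
Main Authors: Qin, Liang; Gu, Huaxi; Yu, Xiaoshan; Cai, Zheyi; Liu, Junchen
Format: Article
Language: English
Subjects: Bandwidths; Computation; Configuration management; Correlation; Delays; Multiprocessor interconnection; Network topologies; Network topology; Optical switches; Reconfiguration; Routing; Topology; Traffic congestion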
description Interconnection networks are key components of high-performance computing (HPC) systems. As HPC evolves toward the exascale era, providing sufficient bisection bandwidth between computing node pairs through oversubscription in traditional networks becomes prohibitively expensive and impractical. Over the past decade, several architectures that use optical circuit switches (OCSs) to allocate link bandwidth dynamically have gained traction. These architectures require frequent topology reconfiguration to adapt to changing traffic demands. However, their practical deployment remains hampered by the long reconfiguration delays inherent in OCS technology. We propose Orchid, an architecture that uses OCSs but reconfigures the topology only infrequently, thereby addressing the problem of long reconfiguration delays. A key innovation of Orchid is its ability to extract stable traffic matrices from historical data. These stable matrices guide topology reconfiguration, so the topology need not be adjusted for every new traffic matrix and the OCS reconfiguration overhead is amortized over an extended period. Furthermore, Orchid mitigates congestion caused by unexpected traffic by jointly designing the OCS configuration and routing, which distributes traffic evenly across global links. Extensive experiments with real HPC application traces and synthetic traffic show that Orchid significantly outperforms existing HPC interconnection networks: it reduces packet delay by at least 3× and improves throughput by up to 60%.
doi 10.1364/JOCN.516031
publisher Optica Publishing Group; The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
place Piscataway
coden JOCNBB
rights Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2024
tpages 15
toplevel peer_reviewed
linktohtml https://ieeexplore.ieee.org/document/10536144
issn 1943-0620
eissn 1943-0639
source IEEE Electronic Library (IEL) Journals; Jisc-Optica Publishing Group Read & Publish Agreement 2022-2024 – E Combination 1
subjects Bandwidths
Computation
Configuration management
Correlation
Delays
Multiprocessor interconnection
Network topologies
Network topology
Optical switches
Reconfiguration
Routing
Topology
Traffic congestion
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-12T21%3A25%3A08IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Orchid:%20enhancing%20HPC%20interconnection%20networks%20through%20infrequent%20topology%20reconfiguration&rft.jtitle=Journal%20of%20optical%20communications%20and%20networking&rft.au=Qin,%20Liang&rft.date=2024-06-01&rft.volume=16&rft.issue=6&rft.spage=644&rft.epage=658&rft.pages=644-658&rft.issn=1943-0620&rft.eissn=1943-0639&rft.coden=JOCNBB&rft_id=info:doi/10.1364/JOCN.516031&rft_dat=%3Cproquest_cross%3E3058294082%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c170t-9f6696ae441642887d7a891fcf794a39be0a74e8e97067a19a2aeb568ef4aa273%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=3058294082&rft_id=info:pmid/&rft_ieee_id=10536144&rfr_iscdi=true
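The description field states that Orchid extracts stable traffic matrices from historical data so that the OCS topology is reconfigured only infrequently, but this record does not say how that extraction works. The Python sketch below illustrates the general idea under stated assumptions and is not the authors' algorithm: it assumes a window of inter-group traffic matrices, uses an element-wise median as a stand-in for the "stable" matrix, and triggers reconfiguration only when that matrix drifts from the one the current topology was built for by more than a hypothetical threshold (`threshold=0.2`). All function names are illustrative.

```python
from statistics import median

# Assumed representation: entry [i][j] is the traffic volume from group i to group j.
Matrix = list[list[float]]


def stable_matrix(history: list[Matrix]) -> Matrix:
    """Element-wise median over a window of past traffic matrices.

    Hypothetical stand-in for Orchid's extraction step: the median ignores
    short-lived bursts that appear in only a minority of snapshots.
    """
    n = len(history[0])
    return [[median(tm[i][j] for tm in history) for j in range(n)] for i in range(n)]


def relative_change(old: Matrix, new: Matrix) -> float:
    """Total absolute difference between two matrices, normalized by total old demand."""
    diff = sum(abs(a - b) for row_o, row_n in zip(old, new) for a, b in zip(row_o, row_n))
    total = sum(sum(row) for row in old)
    if total == 0:
        return 0.0 if diff == 0 else float("inf")
    return diff / total


def should_reconfigure(current: Matrix, history: list[Matrix], threshold: float = 0.2) -> bool:
    """Reconfigure the OCSs only if the stable matrix has drifted past `threshold` (assumed rule)."""
    return relative_change(current, stable_matrix(history)) > threshold


if __name__ == "__main__":
    current = [[0, 10], [10, 0]]  # matrix the topology is currently tuned for
    burst = [[[0, 10], [10, 0]], [[0, 12], [9, 0]], [[0, 90], [11, 0]]]  # one transient spike
    shift = [[[0, 40], [10, 0]]] * 3  # persistent change in demand
    print(should_reconfigure(current, burst))  # False: the spike is filtered out
    print(should_reconfigure(current, shift))  # True: demand has genuinely shifted
```

Using the median rather than the mean is a design choice of this sketch: the spike of 90 in `burst` would raise the mean of that entry to about 37 and trigger a costly OCS reconfiguration, whereas the median (12) keeps the drift at 0.1, below the assumed threshold.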