Orchid: enhancing HPC interconnection networks through infrequent topology reconfiguration
Interconnection networks are key components of high-performance computing (HPC) systems. As HPC evolves towards the exascale era, providing sufficient bisection bandwidth between computing node pairs through oversubscription in traditional networks becomes prohibitively expensive and impractical. Ov...
Published in: | Journal of optical communications and networking 2024-06, Vol.16 (6), p.644-658 |
---|---|
Main Authors: | Qin, Liang ; Gu, Huaxi ; Yu, Xiaoshan ; Cai, Zheyi ; Liu, Junchen |
Format: | Article |
Language: | English |
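Subjects: | Bandwidths ; Computation ; Configuration management ; Correlation ; Delays ; Multiprocessor interconnection ; Network topologies ; Network topology ; Optical switches ; Reconfiguration ; Routing ; Topology ; Traffic congestion |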
cited_by | |
---|---|
cites | cdi_FETCH-LOGICAL-c170t-9f6696ae441642887d7a891fcf794a39be0a74e8e97067a19a2aeb568ef4aa273 |
container_end_page | 658 |
container_issue | 6 |
container_start_page | 644 |
container_title | Journal of optical communications and networking |
container_volume | 16 |
creator | Qin, Liang ; Gu, Huaxi ; Yu, Xiaoshan ; Cai, Zheyi ; Liu, Junchen |
description | Interconnection networks are key components of high-performance computing (HPC) systems. As HPC evolves towards the exascale era, providing sufficient bisection bandwidth between computing node pairs through oversubscription in traditional networks becomes prohibitively expensive and impractical. Over the past decade, several architectures leveraging optical circuit switches (OCSs) for dynamic link bandwidth allocation have gained traction. These architectures require frequent network topology reconfiguration to adapt to changing traffic demands. However, practical implementation remains hampered by the long reconfiguration delays inherent in OCS technology. We propose Orchid, an architecture that leverages OCSs to achieve infrequent topology reconfigurations, effectively addressing the problem of long reconfiguration delays. A key innovation of Orchid is its ability to extract stable traffic matrices from historical data. This functionality guides the reconfiguration of the topology without the need for adjustments with each traffic matrix, thereby enabling the sharing of OCS overhead over an extended timeframe. Furthermore, Orchid addresses potential congestion arising from unexpected traffic through the joint design of OCS configuration and routing, ensuring an even distribution of traffic across global links. Extensive experiments using real HPC application traces and synthetic traffic demonstrate that Orchid achieves significant performance improvements compared to existing HPC interconnection networks. Specifically, Orchid reduces packet delay by at least 3× and enhances throughput by up to 60%. |
doi_str_mv | 10.1364/JOCN.516031 |
format | article |
fullrecord | <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_crossref_primary_10_1364_JOCN_516031</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>10536144</ieee_id><sourcerecordid>3058294082</sourcerecordid><originalsourceid>FETCH-LOGICAL-c170t-9f6696ae441642887d7a891fcf794a39be0a74e8e97067a19a2aeb568ef4aa273</originalsourceid><addsrcrecordid>eNpNkDFPwzAQRi0EEqUwsTJYYkQpduzYMRuKgIIqygALS-Sm58Sl2MFxhPrvSRWEmO6G9919egidUzKjTPDrp2XxPMuoIIweoAlVnCVEMHX4t6fkGJ103YYQISnNJuh9GarGrm8wuEa7yroaz18KbF2EUHnnoIrWO-wgfvvw0eHYBN_XzQCYAF89uIijb_3W1zscYEgYW_dB70On6MjobQdnv3OK3u7vXot5slg-PBa3i6SiksREGSGU0MA5FTzNc7mWOlfUVEYqrplaAdGSQw5KDqU1VTrVsMpEDoZrnUo2RZfj3Tb4oVEXy43vgxteloxkeao4ydOBuhqpKviuC2DKNthPHXYlJeVeXrmXV47yBvpipC0A_CMzJijn7AdrO2v9</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>3058294082</pqid></control><display><type>article</type><title>Orchid: enhancing HPC interconnection networks through infrequent topology reconfiguration</title><source>IEEE Electronic Library (IEL) Journals</source><source>Jisc-Optica Publishing Group Read & Publish Agreement 2022-2024 – E Combination 1</source><creator>Qin, Liang ; Gu, Huaxi ; Yu, Xiaoshan ; Cai, Zheyi ; Liu, Junchen</creator><creatorcontrib>Qin, Liang ; Gu, Huaxi ; Yu, Xiaoshan ; Cai, Zheyi ; Liu, Junchen</creatorcontrib><description>Interconnection networks are key components of high-performance computing (HPC) systems. As HPC evolves towards the exascale era, providing sufficient bisection bandwidth between computing node pairs through oversubscription in traditional networks becomes prohibitively expensive and impractical. Over the past decade, several architectures leveraging optical circuit switches (OCSs) for dynamic link bandwidth allocation have gained traction. These architectures require frequent network topology reconfiguration to adapt to changing traffic demands. 
However, practical implementation remains hampered by the long reconfiguration delays inherent in OCS technology. We propose Orchid, an architecture that leverages OCSs to achieve infrequent topology reconfigurations, effectively addressing the problem of long reconfiguration delays. A key innovation of Orchid is its ability to extract stable traffic matrices from historical data. This functionality guides the reconfiguration of the topology without the need for adjustments with each traffic matrix, thereby enabling the sharing of OCS overhead over an extended timeframe. Furthermore, Orchid addresses potential congestion arising from unexpected traffic through the joint design of OCS configuration and routing, ensuring an even distribution of traffic across global links. Extensive experiments using real HPC application traces and synthetic traffic demonstrate that Orchid achieves significant performance improvements compared to existing HPC interconnection networks. Specifically, Orchid reduces packet delay by at least 3× and enhances throughput by up to 60%.</description><identifier>ISSN: 1943-0620</identifier><identifier>EISSN: 1943-0639</identifier><identifier>DOI: 10.1364/JOCN.516031</identifier><identifier>CODEN: JOCNBB</identifier><language>eng</language><publisher>Piscataway: Optica Publishing Group</publisher><subject>Bandwidths ; Computation ; Configuration management ; Correlation ; Delays ; Multiprocessor interconnection ; Network topologies ; Network topology ; Optical switches ; Reconfiguration ; Routing ; Topology ; Traffic congestion</subject><ispartof>Journal of optical communications and networking, 2024-06, Vol.16 (6), p.644-658</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2024</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c170t-9f6696ae441642887d7a891fcf794a39be0a74e8e97067a19a2aeb568ef4aa273</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/10536144$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,776,780,27901,27902,54771</link.rule.ids></links><search><creatorcontrib>Qin, Liang</creatorcontrib><creatorcontrib>Gu, Huaxi</creatorcontrib><creatorcontrib>Yu, Xiaoshan</creatorcontrib><creatorcontrib>Cai, Zheyi</creatorcontrib><creatorcontrib>Liu, Junchen</creatorcontrib><title>Orchid: enhancing HPC interconnection networks through infrequent topology reconfiguration</title><title>Journal of optical communications and networking</title><addtitle>jocn</addtitle><description>Interconnection networks are key components of high-performance computing (HPC) systems. As HPC evolves towards the exascale era, providing sufficient bisection bandwidth between computing node pairs through oversubscription in traditional networks becomes prohibitively expensive and impractical. Over the past decade, several architectures leveraging optical circuit switches (OCSs) for dynamic link bandwidth allocation have gained traction. These architectures require frequent network topology reconfiguration to adapt to changing traffic demands. However, practical implementation remains hampered by the long reconfiguration delays inherent in OCS technology. We propose Orchid, an architecture that leverages OCSs to achieve infrequent topology reconfigurations, effectively addressing the problem of long reconfiguration delays. A key innovation of Orchid is its ability to extract stable traffic matrices from historical data. 
This functionality guides the reconfiguration of the topology without the need for adjustments with each traffic matrix, thereby enabling the sharing of OCS overhead over an extended timeframe. Furthermore, Orchid addresses potential congestion arising from unexpected traffic through the joint design of OCS configuration and routing, ensuring an even distribution of traffic across global links. Extensive experiments using real HPC application traces and synthetic traffic demonstrate that Orchid achieves significant performance improvements compared to existing HPC interconnection networks. Specifically, Orchid reduces packet delay by at least 3× and enhances throughput by up to 60%.</description><subject>Bandwidths</subject><subject>Computation</subject><subject>Configuration management</subject><subject>Correlation</subject><subject>Delays</subject><subject>Multiprocessor interconnection</subject><subject>Network topologies</subject><subject>Network topology</subject><subject>Optical switches</subject><subject>Reconfiguration</subject><subject>Routing</subject><subject>Topology</subject><subject>Traffic congestion</subject><issn>1943-0620</issn><issn>1943-0639</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><recordid>eNpNkDFPwzAQRi0EEqUwsTJYYkQpduzYMRuKgIIqygALS-Sm58Sl2MFxhPrvSRWEmO6G9919egidUzKjTPDrp2XxPMuoIIweoAlVnCVEMHX4t6fkGJ103YYQISnNJuh9GarGrm8wuEa7yroaz18KbF2EUHnnoIrWO-wgfvvw0eHYBN_XzQCYAF89uIijb_3W1zscYEgYW_dB70On6MjobQdnv3OK3u7vXot5slg-PBa3i6SiksREGSGU0MA5FTzNc7mWOlfUVEYqrplaAdGSQw5KDqU1VTrVsMpEDoZrnUo2RZfj3Tb4oVEXy43vgxteloxkeao4ydOBuhqpKviuC2DKNthPHXYlJeVeXrmXV47yBvpipC0A_CMzJijn7AdrO2v9</recordid><startdate>20240601</startdate><enddate>20240601</enddate><creator>Qin, Liang</creator><creator>Gu, Huaxi</creator><creator>Yu, Xiaoshan</creator><creator>Cai, Zheyi</creator><creator>Liu, Junchen</creator><general>Optica Publishing Group</general><general>The Institute of 
Electrical and Electronics Engineers, Inc. (IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope></search><sort><creationdate>20240601</creationdate><title>Orchid: enhancing HPC interconnection networks through infrequent topology reconfiguration</title><author>Qin, Liang ; Gu, Huaxi ; Yu, Xiaoshan ; Cai, Zheyi ; Liu, Junchen</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c170t-9f6696ae441642887d7a891fcf794a39be0a74e8e97067a19a2aeb568ef4aa273</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Bandwidths</topic><topic>Computation</topic><topic>Configuration management</topic><topic>Correlation</topic><topic>Delays</topic><topic>Multiprocessor interconnection</topic><topic>Network topologies</topic><topic>Network topology</topic><topic>Optical switches</topic><topic>Reconfiguration</topic><topic>Routing</topic><topic>Topology</topic><topic>Traffic congestion</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Qin, Liang</creatorcontrib><creatorcontrib>Gu, Huaxi</creatorcontrib><creatorcontrib>Yu, Xiaoshan</creatorcontrib><creatorcontrib>Cai, Zheyi</creatorcontrib><creatorcontrib>Liu, Junchen</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998–Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics & Communications Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies 
Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>Journal of optical communications and networking</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Qin, Liang</au><au>Gu, Huaxi</au><au>Yu, Xiaoshan</au><au>Cai, Zheyi</au><au>Liu, Junchen</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Orchid: enhancing HPC interconnection networks through infrequent topology reconfiguration</atitle><jtitle>Journal of optical communications and networking</jtitle><stitle>jocn</stitle><date>2024-06-01</date><risdate>2024</risdate><volume>16</volume><issue>6</issue><spage>644</spage><epage>658</epage><pages>644-658</pages><issn>1943-0620</issn><eissn>1943-0639</eissn><coden>JOCNBB</coden><abstract>Interconnection networks are key components of high-performance computing (HPC) systems. As HPC evolves towards the exascale era, providing sufficient bisection bandwidth between computing node pairs through oversubscription in traditional networks becomes prohibitively expensive and impractical. Over the past decade, several architectures leveraging optical circuit switches (OCSs) for dynamic link bandwidth allocation have gained traction. These architectures require frequent network topology reconfiguration to adapt to changing traffic demands. However, practical implementation remains hampered by the long reconfiguration delays inherent in OCS technology. We propose Orchid, an architecture that leverages OCSs to achieve infrequent topology reconfigurations, effectively addressing the problem of long reconfiguration delays. A key innovation of Orchid is its ability to extract stable traffic matrices from historical data. 
This functionality guides the reconfiguration of the topology without the need for adjustments with each traffic matrix, thereby enabling the sharing of OCS overhead over an extended timeframe. Furthermore, Orchid addresses potential congestion arising from unexpected traffic through the joint design of OCS configuration and routing, ensuring an even distribution of traffic across global links. Extensive experiments using real HPC application traces and synthetic traffic demonstrate that Orchid achieves significant performance improvements compared to existing HPC interconnection networks. Specifically, Orchid reduces packet delay by at least 3× and enhances throughput by up to 60%.</abstract><cop>Piscataway</cop><pub>Optica Publishing Group</pub><doi>10.1364/JOCN.516031</doi><tpages>15</tpages></addata></record> |
fulltext | fulltext |
identifier | ISSN: 1943-0620 |
ispartof | Journal of optical communications and networking, 2024-06, Vol.16 (6), p.644-658 |
issn | 1943-0620 1943-0639 |
language | eng |
recordid | cdi_crossref_primary_10_1364_JOCN_516031 |
source | IEEE Electronic Library (IEL) Journals; Jisc-Optica Publishing Group Read & Publish Agreement 2022-2024 – E Combination 1 |
subjects | Bandwidths ; Computation ; Configuration management ; Correlation ; Delays ; Multiprocessor interconnection ; Network topologies ; Network topology ; Optical switches ; Reconfiguration ; Routing ; Topology ; Traffic congestion |
title | Orchid: enhancing HPC interconnection networks through infrequent topology reconfiguration |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-12T21%3A25%3A08IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Orchid:%20enhancing%20HPC%20interconnection%20networks%20through%20infrequent%20topology%20reconfiguration&rft.jtitle=Journal%20of%20optical%20communications%20and%20networking&rft.au=Qin,%20Liang&rft.date=2024-06-01&rft.volume=16&rft.issue=6&rft.spage=644&rft.epage=658&rft.pages=644-658&rft.issn=1943-0620&rft.eissn=1943-0639&rft.coden=JOCNBB&rft_id=info:doi/10.1364/JOCN.516031&rft_dat=%3Cproquest_cross%3E3058294082%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c170t-9f6696ae441642887d7a891fcf794a39be0a74e8e97067a19a2aeb568ef4aa273%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=3058294082&rft_id=info:pmid/&rft_ieee_id=10536144&rfr_iscdi=true |