A 4.5Tb/s 3.4Tb/s/W 64×64 switch fabric with self-updating least-recently-granted priority and quality-of-service arbitration in 45nm CMOS
High-speed and low-power routers form the basic building blocks of on-die interconnect fabrics that are critical to overall throughput and energy efficiency of high performance systems. Conventional routers use distinct logic blocks for routing data and handling arbitration. At higher radices, conne...
Main Authors: | Satpathy, S.; Sewell, K.; Manville, T.; Yen-Po Chen; Dreslinski, R.; Sylvester, D.; Mudge, T.; Blaauw, D. |
---|---|
Format: | Conference Proceeding |
Language: | English |
Subjects: | Delay; Fabrics; Quality of service; Repeaters; Routing; Switches; Very large scale integration |
Online Access: | Request full text |
cited_by | |
---|---|
cites | |
container_end_page | 480 |
container_issue | |
container_start_page | 478 |
container_title | |
container_volume | |
creator | Satpathy, S.; Sewell, K.; Manville, T.; Yen-Po Chen; Dreslinski, R.; Sylvester, D.; Mudge, T.; Blaauw, D. |
description | High-speed and low-power routers form the basic building blocks of on-die interconnect fabrics that are critical to the overall throughput and energy efficiency of high-performance systems. Conventional routers use distinct logic blocks for routing data and handling arbitration. At higher radices, connections between these blocks become a bottleneck, limiting router scalability and degrading performance. Recently, two switch topologies merged the data routing fabric with arbitration control, avoiding this bottleneck. However, one relies on centralized control for channel allocation, limiting performance, while the other is restricted to a small set of fixed priorities, rendering input ports prone to starvation. In addition, ever larger CMPs will require continued increases in bandwidth over previous designs. To address these issues, we present a 64×64 single-stage swizzle-switch network (SSN) with 128b data buses (8192 total input/output wires). The SSN can connect any input to any output, including multicast. It has a peak measured throughput of 4.5Tb/s at 1.1V in 45nm SOI CMOS at 25°C. The SSN's key features are: 1) a single-cycle least-recently-granted (LRG) priority arbitration technique that reuses the already present input and output data buses and their drivers and sense amps; 2) an additional 4-level message-based priority arbitration for quality of service (QoS) with 2% logic and 3% wiring overhead; 3) a bidirectional bitline repeater that allows the router to scale to >8000 wires. These features result in a compact fabric (4.06mm²) with a throughput gain of 2.1× over prior work at 3.4Tb/s/W efficiency, which improves to 7.4Tb/s/W at 600mV. |
doi_str_mv | 10.1109/ISSCC.2012.6177098 |
format | conference_proceeding |
date | 2012-02 |
eisbn | 9781467303774; 1467303771; 9781467303743; 1467303747; 9781467303750; 1467303755 |
isbn | 1467303763; 9781467303767 |
publisher | IEEE |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 0193-6530 |
ispartof | 2012 IEEE International Solid-State Circuits Conference, 2012, p.478-480 |
issn | 0193-6530; 2376-8606 |
language | eng |
recordid | cdi_ieee_primary_6177098 |
source | IEEE Electronic Library (IEL) Conference Proceedings |
subjects | Delay; Fabrics; Quality of service; Repeaters; Routing; Switches; Very large scale integration |
title | A 4.5Tb/s 3.4Tb/s/W 64×64 switch fabric with self-updating least-recently-granted priority and quality-of-service arbitration in 45nm CMOS |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-01T01%3A12%3A38IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=A%204.5Tb/s%203.4Tb/s/W%2064%C3%9764%20switch%20fabric%20with%20self-updating%20least-recently-granted%20priority%20and%20quality-of-service%20arbitration%20in%2045nm%20CMOS&rft.btitle=2012%20IEEE%20International%20Solid-State%20Circuits%20Conference&rft.au=Satpathy,%20S.&rft.date=2012-02&rft.spage=478&rft.epage=480&rft.pages=478-480&rft.issn=0193-6530&rft.eissn=2376-8606&rft.isbn=1467303763&rft.isbn_list=9781467303767&rft_id=info:doi/10.1109/ISSCC.2012.6177098&rft.eisbn=9781467303774&rft.eisbn_list=1467303771&rft.eisbn_list=9781467303743&rft.eisbn_list=1467303747&rft.eisbn_list=9781467303750&rft.eisbn_list=1467303755&rft_dat=%3Cieee_6IE%3E6177098%3C/ieee_6IE%3E%3Cgrp_id%3Ecdi_FETCH-ieee_primary_61770983%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=6177098&rfr_iscdi=true |
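
The description above names two arbitration features: least-recently-granted (LRG) priority and a 4-level message-based QoS priority. As a rough illustration of what those policies mean, the following is a minimal behavioral sketch in Python. It is not the paper's circuit, which performs arbitration on the data buses themselves. The class name `LrgArbiter`, the `requests` mapping, and the rule that the QoS level is compared first with LRG order breaking ties are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class LrgArbiter:
    """Hypothetical behavioral model of one output port's arbiter."""

    num_inputs: int
    # order[0] is the least recently granted input (highest LRG priority).
    order: List[int] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not self.order:
            self.order = list(range(self.num_inputs))

    def arbitrate(self, requests: Dict[int, int]) -> Optional[int]:
        """Pick a winner among requesting inputs.

        `requests` maps input index -> QoS level (0..3, higher wins).
        Assumption: QoS level is compared first, LRG order breaks ties.
        """
        if not requests:
            return None
        top_level = max(requests.values())
        for inp in self.order:
            if requests.get(inp) == top_level:
                # Self-update: the winner becomes the most recently granted.
                self.order.remove(inp)
                self.order.append(inp)
                return inp
        return None  # only reached if a request names an unknown input


arb = LrgArbiter(num_inputs=4)
grants = [arb.arbitrate({0: 0, 2: 0}) for _ in range(3)]
qos_grant = arb.arbitrate({0: 0, 2: 3})
```

In this sketch, inputs 0 and 2 alternate when they request at the same QoS level, and a higher QoS level overrides the LRG order for a single grant.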