Loading…

Scale-out Systolic Arrays

Multi-pod systolic arrays are emerging as the architecture of choice in DNN inference accelerators. Despite their potential, designing multi-pod systolic arrays to maximize effective throughput/Watt—i.e., throughput/Watt adjusted when accounting for array utilization—poses a unique set of challenges...

Full description

Saved in:

Bibliographic Details
Published in:	ACM transactions on architecture and code optimization 2023-03, Vol.20 (2), p.1-25, Article 27
Main Authors:	Yüzügüler, Ahmet Caner, Sönmez, Canberk, Drumond, Mario, Oh, Yunho, Falsafi, Babak, Frossard, Pascal
Format:	Article
Language:	English
Subjects:	Computer systems organization Systolic arrays
Citations:	Items that this one cites Items that cite this one
Online Access:	Get full text
Tags:	Add Tag No Tags, Be the first to tag this record!

Description
Summary:	Multi-pod systolic arrays are emerging as the architecture of choice in DNN inference accelerators. Despite their potential, designing multi-pod systolic arrays to maximize effective throughput/Watt—i.e., throughput/Watt adjusted when accounting for array utilization—poses a unique set of challenges. In this work, we study three key pillars in multi-pod systolic array designs, namely array granularity, interconnect, and tiling. We identify optimal array granularity across workloads and show that state-of-the-art commercial accelerators use suboptimal array sizes for single-tenancy workloads. We, then evaluate the bandwidth/latency trade-offs in interconnects and show that Butterfly networks offer a scalable topology for accelerators with a large number of pods. Finally, we introduce a novel data tiling scheme with custom partition size to maximize utilization in optimally sized pods. We propose Scale-out Systolic Arrays, a multi-pod inference accelerator for both single- and multi-tenancy based on these three pillars. We show that SOSA exhibits scaling of up to 600 TeraOps/s in effective throughput for state-of-the-art DNN inference workloads, and outperforms state-of-the-art multi-pod accelerators by a factor of 1.5 ×.1
ISSN:	1544-3566 1544-3973
DOI:	10.1145/3572917