Loading…

Systolic Tensor Array: An Efficient Structured-Sparse GEMM Accelerator for Mobile CNN Inference

Convolutional neural network (CNN) inference on mobile devices demands efficient hardware acceleration of low-precision (INT8) general matrix multiplication (GEMM). The systolic array (SA) is a pipelined 2D array of processing elements (PEs), with very efficient local data movement, well suited to a...

Full description

Saved in:
Bibliographic Details
Published in:IEEE computer architecture letters 2020-01, Vol.19 (1), p.34-37
Main Authors: Liu, Zhi-Gang, Whatmough, Paul N., Mattina, Matthew
Format: Article
Language:English
Subjects:
Citations: Items that this one cites
Items that cite this one
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Convolutional neural network (CNN) inference on mobile devices demands efficient hardware acceleration of low-precision (INT8) general matrix multiplication (GEMM). The systolic array (SA) is a pipelined 2D array of processing elements (PEs), with very efficient local data movement, well suited to accelerating GEMM, and widely deployed in industry. In this letter, we describe two significant improvements to the traditional SA architecture, to specifically optimize for CNN inference. First, we generalize the traditional scalar PE, into a Tensor-PE , which gives rise to a family of new Systolic Tensor Array (STA) microarchitectures. The STA family increases intra-PE operand reuse and datapath efficiency, resulting in circuit area and power dissipation reduction of as much as 2.08× and 1.36× respectively, compared to the conventional SA at iso-throughput with INT8 operands. Second, we extend this design to support a novel block-sparse data format called density-bound block (DBB). This variant (STA-DBB) achieves a 3.14× and 1.97× improvement over the SA baseline at iso-throughput in area and power respectively, when processing specially-trained DBB-sparse models, while remaining fully backwards compatible with dense models.
ISSN:1556-6056
1556-6064
DOI:10.1109/LCA.2020.2979965