SoftAct: A High-Precision Softmax Architecture for Transformers Supporting Nonlinear Functions

Bibliographic Details
Published in: IEEE Transactions on Circuits and Systems for Video Technology, 2024-09, Vol. 34 (9), pp. 8912-8923
Main Authors: Fu, Yuzhe, Zhou, Changchun, Huang, Tianling, Han, Eryi, He, Yifan, Jiao, Hailong
Format: Article
Language: English
Summary: Transformer-based deep learning networks are revolutionizing our society. Convolution and attention co-designed (CAC) Transformers have demonstrated superior performance compared with conventional Transformer-based networks. However, CAC Transformer networks contain various nonlinear functions, such as softmax and complex activation functions, which require high-precision hardware that typically incurs significant area and power costs. To address these challenges, SoftAct, a compact and high-precision algorithm-hardware co-designed architecture, is proposed to implement both softmax and nonlinear activation functions in CAC Transformer accelerators. An improved softmax algorithm with penalties is proposed to maintain precision in hardware. A stage-wise full-zero detection method is developed to skip redundant computation in softmax. A compact and reconfigurable architecture with a symmetrically designed linear fitting module is proposed to realize the nonlinear functions. The SoftAct architecture is designed in an industrial 28-nm CMOS technology, with the MobileViT-xxs network classifying the ImageNet-1k dataset as the benchmark. Compared with the state of the art, SoftAct improves network accuracy by up to 5.87% under 8-bit quantization, area efficiency by 153.2×, and overall efficiency by 1435×.
ISSN: 1051-8215, 1558-2205
DOI: 10.1109/TCSVT.2024.3386779
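
Note: the abstract names an improved softmax with penalties, a stage-wise full-zero detection method, and a symmetric linear-fitting module, but gives no algorithmic detail. As a point of reference only, the Python sketch below shows the standard max-subtracted softmax that hardware softmax units approximate, together with a hypothetical piecewise-linear exponential of the kind such a linear-fitting module might implement; every function name and breakpoint here is illustrative, not taken from the paper.

import numpy as np

def stable_softmax(x):
    # Standard numerically stable softmax: subtracting the row maximum keeps
    # every exponent <= 0, which is what lets fixed-point hardware stay in range.
    z = x - np.max(x, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=-1, keepdims=True)

def pwl_exp(z, segments=8):
    # Hypothetical piecewise-linear fit of exp(z) for z <= 0, sketching the kind
    # of linear-fitting module the abstract mentions (breakpoints illustrative).
    z = np.clip(z, -8.0, 0.0)                 # saturate: exp(-8) underflows to 0 at 8 bits
    knots = np.linspace(-8.0, 0.0, segments + 1)
    step = knots[1] - knots[0]
    idx = np.minimum(((z - knots[0]) / step).astype(int), segments - 1)
    x0, y0 = knots[idx], np.exp(knots[idx])           # segment endpoints; the exp values
    x1, y1 = knots[idx + 1], np.exp(knots[idx + 1])   # would be a precomputed table in silicon
    return y0 + (y1 - y0) * (z - x0) / (x1 - x0)      # linear interpolation within a segment

Swapping np.exp for pwl_exp inside stable_softmax gives a drop-in fixed-function approximation. Inputs that saturate at the clip bound quantize to zero at 8 bits, which is the kind of redundant work a stage-wise full-zero detection pass could skip, and the segment count trades lookup-table area against fitting precision.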