SoftAct: A High-Precision Softmax Architecture for Transformers Supporting Nonlinear Functions
Published in: IEEE Transactions on Circuits and Systems for Video Technology, 2024-09, Vol. 34 (9), pp. 8912-8923
Format: Article
Language: English
Summary: Transformer-based deep learning networks are revolutionizing our society. Convolution and attention co-designed (CAC) Transformers have demonstrated superior performance compared with conventional Transformer-based networks. However, CAC Transformer networks contain various nonlinear functions, such as softmax and complex activation functions, which require high-precision hardware designs that typically incur significant area and power costs. To address these challenges, SoftAct, a compact and high-precision algorithm-hardware co-designed architecture, is proposed to implement both softmax and nonlinear activation functions in CAC Transformer accelerators. An improved softmax algorithm with penalties is proposed to maintain precision in hardware. A stage-wise full-zero detection method is developed to skip redundant computation in softmax. A compact and reconfigurable architecture with a symmetrically designed linear fitting module is proposed to realize the nonlinear activation functions. The SoftAct architecture is implemented in an industrial 28-nm CMOS technology, with the MobileViT-xxs network classifying the ImageNet-1k dataset as the benchmark. Compared with the state of the art, SoftAct improves network accuracy by up to 5.87% under 8-bit quantization, area efficiency by up to 153.2×, and overall efficiency by up to 1435×.
ISSN: 1051-8215, 1558-2205
DOI: 10.1109/TCSVT.2024.3386779
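
The abstract names the two classes of nonlinearity SoftAct targets, softmax and complex activation functions, but does not detail the algorithms. As a rough, non-authoritative illustration of the generic building blocks involved, the sketch below shows a standard numerically stable softmax (max subtraction keeps the exponent arguments non-positive, which is what makes a bounded fixed-point exponential table feasible) and a generic piecewise-linear approximation of the GELU activation. All function names, breakpoint counts, and ranges here are illustrative assumptions; this is not the paper's penalty-based softmax, stage-wise full-zero detection, or symmetric linear-fitting hardware.

```python
# Minimal sketch (assumptions, not the paper's implementation): a numerically
# stable softmax and a piecewise-linear activation approximation, the two kinds
# of nonlinearity the SoftAct abstract says the architecture handles.
import numpy as np

def stable_softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Softmax with max subtraction so exp arguments stay non-positive,
    keeping a fixed-point exp lookup bounded."""
    shifted = x - np.max(x, axis=axis, keepdims=True)
    exps = np.exp(shifted)
    return exps / np.sum(exps, axis=axis, keepdims=True)

def gelu(x: np.ndarray) -> np.ndarray:
    """tanh-based GELU approximation, a common Transformer activation."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def piecewise_linear(x: np.ndarray, fn, lo=-6.0, hi=6.0, segments=16) -> np.ndarray:
    """Approximate fn on [lo, hi] with a small table of linear segments,
    the style of approximation a hardware linear-fitting unit would use."""
    xp = np.linspace(lo, hi, segments + 1)   # breakpoints
    fp = fn(xp)                              # table of exact values at breakpoints
    return np.interp(np.clip(x, lo, hi), xp, fp)

# Example: one row of attention scores and a few activation inputs.
scores = np.array([2.1, -0.3, 0.0, 5.4])
print(stable_softmax(scores))                       # probabilities summing to 1
x = np.linspace(-4, 4, 5)
print(np.abs(piecewise_linear(x, gelu) - gelu(x)))  # small approximation error
```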