
SoftAct: A High-Precision Softmax Architecture for Transformers Supporting Nonlinear Functions

Transformer-based deep learning networks are revolutionizing our society. Convolution and attention co-designed (CAC) Transformers have demonstrated superior performance compared to conventional Transformer-based networks. However, CAC Transformer networks contain various nonlinear functions, such as softmax and complex activation functions, which require high-precision hardware design, typically at significant cost in area and power consumption. To address these challenges, SoftAct, a compact and high-precision algorithm-hardware co-designed architecture, is proposed to implement both softmax and nonlinear activation functions in CAC Transformer accelerators. An improved softmax algorithm with penalties is proposed to maintain precision in hardware. A stage-wise full-zero detection method is developed to skip redundant computation in softmax. A compact and reconfigurable architecture with a symmetrically designed linear fitting module is proposed to realize the nonlinear functions. The SoftAct architecture is designed in an industrial 28-nm CMOS technology, with the MobileViT-xxs network classifying the ImageNet-1k dataset as the benchmark. Compared with the state of the art, SoftAct improves network accuracy by up to 5.87% under 8-bit quantization, area efficiency by 153.2×, and overall efficiency by 1435×.
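The abstract names two softmax-level techniques without detailing them: a penalty-based algorithm for maintaining precision in hardware, and stage-wise full-zero detection for skipping redundant computation. As an illustration only, the Python sketch below shows the general shape of such a scheme; softmax_with_zero_skip is a hypothetical name, and the standard max-subtraction trick stands in for the paper's (undescribed) penalty method.

```python
import numpy as np

def softmax_with_zero_skip(x: np.ndarray) -> np.ndarray:
    # Hypothetical illustration: SoftAct's actual penalty-based softmax is
    # not described in this record. Max-subtraction is the standard way to
    # keep exp() inputs in a hardware-friendly range; the all-zero check
    # mimics stage-wise full-zero detection by skipping exp() entirely.
    out = np.empty_like(x, dtype=np.float64)
    for i, row in enumerate(x):
        if not row.any():                # full-zero row detected:
            out[i] = 1.0 / row.size      # softmax of zeros is uniform, no exp() needed
            continue
        shifted = row - row.max()        # stabilize: all exp() inputs are <= 0
        e = np.exp(shifted)
        out[i] = e / e.sum()
    return out

# A masked (all-zero) attention row is resolved without any exponentials.
scores = np.array([[2.0, 1.0, 0.5], [0.0, 0.0, 0.0]])
print(softmax_with_zero_skip(scores))
```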

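Likewise, the "symmetrically designed linear fitting module" suggests a piecewise-linear approximation that exploits a function's symmetry so that coefficients for only one half of the input range need to be stored. The sketch below is an assumption of how that might look for an odd function such as tanh; the segment count, breakpoints, and choice of tanh are illustrative, not taken from the paper.

```python
import numpy as np

# Hypothetical piecewise-linear fit of tanh on [0, 4), exploiting
# tanh(-x) = -tanh(x) so only non-negative breakpoints are stored.
BREAKS = np.linspace(0.0, 4.0, 9)                    # 8 segments of width 0.5
SLOPES = np.diff(np.tanh(BREAKS)) / np.diff(BREAKS)  # per-segment slope
INTERCEPTS = np.tanh(BREAKS[:-1])                    # value at each left edge

def pwl_tanh(x: np.ndarray) -> np.ndarray:
    # Approximate tanh from a small table, mirroring the result by sign.
    sign = np.sign(x)
    a = np.minimum(np.abs(x), BREAKS[-1] - 1e-9)     # clamp into table range
    seg = np.minimum((a / 0.5).astype(int), 7)       # segment index
    y = INTERCEPTS[seg] + SLOPES[seg] * (a - BREAKS[seg])
    return sign * y

x = np.linspace(-5, 5, 11)
print(np.max(np.abs(pwl_tanh(x) - np.tanh(x))))      # max approximation error
```
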
Bibliographic Details
Published in: IEEE Transactions on Circuits and Systems for Video Technology, 2024-09, Vol. 34 (9), pp. 8912-8923
Main Authors: Fu, Yuzhe, Zhou, Changchun, Huang, Tianling, Han, Eryi, He, Yifan, Jiao, Hailong
Format: Article
Language: English
Subjects: Computer architecture; Convolutional neural networks; Costs; Deep learning; Detection algorithms; Inference algorithms; nonlinear functions; Nonlinear systems; overall efficiency; Quantization (signal); softmax; Software architecture; sparsity detection; Transformer-based networks; Transformers
ISSN: 1051-8215
EISSN: 1558-2205
DOI: 10.1109/TCSVT.2024.3386779