3.53-TOPS/W EEAIP: An Energy-Efficient Artificial Intelligence Hardware Architecture for Edge AI Applications

Bibliographic Details
Published in: IEEE Transactions on Consumer Electronics, 2024-02, Vol. 70 (1), p. 4333-4344
Main Authors: Chen, Wan-Yu; Chen, Liang-Gee
Format: Article
Language: English
Description
Summary: Artificial intelligence in the Internet of Things (AIoT) is a promising technology for consumer electronics. Battery life and package size are essential constraints for AI applications on edge devices. Thus, an efficient hardware architecture is important to support deep neural network (DNN) AI algorithms. The critical concerns are the high memory bandwidth and multichannel computation requirements of DNN processing. Conventional AI processors exploit complex memory pads, dedicated processing element (PE) buffers, and mass shift registers to support data reuse for memory bandwidth reduction. However, such architectures incur significant area overhead and power consumption. This paper proposes a novel channel-interleaved memory (CIM) footprint and dual-level memory pad (DLMP) control to enhance memory bandwidth utilization and simplify the memory pad circuit. Interleaved channel data are read from the memory bus with a single access and stored in a ping-pong buffer for reuse. Dynamic power is reduced by replacing the shift register PE mechanism with simplified mux selection. A joint stationary data reuse (JSDR) approach is adopted to process interleaved channel data efficiently. Finally, a hybrid memory buffer (HMB) reduces on-chip memory use through dynamic memory allocation. Experimental results demonstrate that the proposed architecture achieves a state-of-the-art area efficiency of 207.4 GOPS/mm² while maintaining a high power efficiency of 3.53 TOPS/W.
ISSN: 0098-3063, 1558-4127
DOI: 10.1109/TCE.2023.3323644