Efficient Processing of MLPerf Mobile Workloads Using Digital Compute-In-Memory Macros
Published in: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, April 2024, Vol. 43, No. 4, p. 1-1
Format: Article
Language: English
Summary: Compute-In-Memory (CIM) has recently emerged as a promising design paradigm to accelerate Deep Neural Network (DNN) processing. Continuously improving energy and area efficiency at the macro level has been reported through many test chips over the past few years. However, those macro-oriented studies have not investigated accelerator-level considerations, such as memory accesses and the processing of entire DNN workloads, in depth. In this paper, we aim to fill this gap, starting with the characteristics of our latest CIM macro fabricated in a cutting-edge 4 nm FinFET CMOS technology. Using an accelerator simulator developed in-house, we then study three key items that determine the efficiency of our CIM macro in the accelerator context while running the MLPerf Mobile suite: 1) dataflow optimization, 2) optimal selection of CIM macro dimensions to further improve macro utilization, and 3) optimal combination of multiple CIM macros. Although there is typically a stark contrast between macro-level peak and accelerator-level average throughput and energy efficiency, the aforementioned optimizations improve macro utilization by 3.04× and reduce the Energy-Delay Product (EDP) to 0.34× of the original macro on MLPerf Mobile inference workloads. While we exploit a digital CIM macro in this study, the findings and proposed methods remain valid for other types of CIM (such as analog CIM and analog-digital-hybrid CIM) as well.
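The abstract quantifies its gains via macro utilization and the Energy-Delay Product (EDP = energy × delay, lower is better). As a minimal sketch of what those metrics mean, the Python snippet below computes utilization for a naive weight-stationary tiling of a layer's weight matrix onto a rows × cols macro array. The layer and macro dimensions are hypothetical and the tiling model is an assumption, not the paper's simulator; only the reported 3.04× utilization and 0.34× EDP factors come from the source.

```python
import math

def macro_utilization(k_in, n_out, rows, cols):
    """Fraction of CIM macro cells doing useful work when a k_in x n_out
    weight matrix is tiled onto a (rows x cols) macro array.
    Assumes a simple weight-stationary tiling; partial tiles leave
    unused rows/columns, which drags utilization below 1.0."""
    tiles = math.ceil(k_in / rows) * math.ceil(n_out / cols)
    return (k_in * n_out) / (tiles * rows * cols)

def energy_delay_product(energy_j, delay_s):
    """EDP = energy x delay; lower is better."""
    return energy_j * delay_s

# Hypothetical layer: 96 input channels, 200 output channels.
# A square 64x64 macro wastes capacity on the ragged tile edges ...
print(macro_utilization(96, 200, 64, 64))   # ~0.59
# ... while a rectangular 32x100 macro fits this layer exactly,
# illustrating why macro-dimension selection matters.
print(macro_utilization(96, 200, 32, 100))  # 1.0

# The paper reports its combined optimizations improving macro
# utilization by 3.04x and reducing EDP to 0.34x of the baseline.
base_edp = energy_delay_product(1.0, 1.0)   # normalized baseline
print(0.34 * base_edp)  # optimized EDP relative to the original macro
```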
ISSN: 0278-0070, 1937-4151
DOI: 10.1109/TCAD.2023.3333290