
PipeCIM: A High-Throughput Computing-In-Memory Microprocessor With Nested Pipeline and RISC-V Extended Instructions

Bibliographic Details
Published in: IEEE Transactions on Circuits and Systems I: Regular Papers, 2024-07, Vol. 71 (7), p. 3214-3227
Main Authors: Chen, Tingran, Wang, Wenjia, Chen, Jiaqi, Fu, Haotian, Yi, Wente, Cheng, Bojun, Zhang, He, Pan, Biao
Format: Article
Language:English
Description
Summary: The large number of multiply-accumulate (MAC) operations in convolutional neural networks (CNNs) leads to substantial data migration and computation. Although computing-in-memory (CIM) is a promising paradigm for MAC operations, high-throughput CNN accelerators still confront two bottlenecks: low MAC utilization and unnecessary off-chip memory access. In this paper, we propose PipeCIM, a high-throughput CIM-based CNN accelerator with three hierarchies of pipelines: Intra-Macro, Near-Memory, and Tile-Level. The Intra-Macro Pipeline executes data transfer and in-memory-computing (IMC) operations in parallel. The Near-Memory Pipeline alleviates memory access for pooling and data reshaping. The Tile-Level Pipeline establishes a layer-wise pipeline that further improves throughput while reducing control complexity. PipeCIM introduces the nested scheme and a Unidirectional Divergent Connection Protocol (UDTCP) to simplify data-flow control with the help of customized RISC-V instructions. To validate the design, PipeCIM was prototyped in a 55 nm process node, achieving an energy efficiency of 133.8 TOPS/W and a peak throughput of 819 GOPS with a 16 KB CIM array, accelerating VGG-16 by 128.56× and Inception by 19.754× compared to the baseline.
ISSN: 1549-8328, 1558-0806
DOI: 10.1109/TCSI.2024.3384271