QUEST: Multi-Purpose Log-Quantized DNN Inference Engine Stacked on 96-MB 3-D SRAM Using Inductive Coupling Technology in 40-nm CMOS

Bibliographic Details
Published in:	IEEE Journal of Solid-State Circuits, 2019-01, Vol. 54 (1), pp. 186-196
Main Authors:	Ueyoshi, Kodai; Ando, Kota; Hirose, Kazutoshi; Takamaeda-Yamazaki, Shinya; Hamada, Mototsugu; Kuroda, Tadahiro; Motomura, Masato
Format:	Article
Language:	English
Subjects:	Accelerator; Artificial neural networks; Bandwidth; CMOS; Couplings; deep learning; deep neural networks (DNNs); Hardware; Inductive coupling; logarithmic-quantized neural networks; Neural networks; processor architecture; Quantization (signal); Random access memory; Stacking; State of the art; Static random access memory; Technology utilization

Description
Summary:	QUEST is a programmable multiple-instruction, multiple-data (MIMD) parallel accelerator for general-purpose, state-of-the-art deep neural networks (DNNs). It is stacked die-to-die on eight SRAMs totaling 96 MB, accessed with three-cycle latency at 28.8 GB/s through an inductive coupling technology called the ThruChip interface (TCI). Stacking SRAMs instead of DRAMs is expected to give lower memory access latency and simpler hardware. This helps balance memory capacity, latency, and bandwidth, all of which cutting-edge DNNs demand at a high level. QUEST also introduces log-quantized, programmable bit-precision processing to enable faster DNN computation, or a larger DNN size, in a 3-D module. Log quantization can sustain higher recognition accuracy than linear quantization in the low-bitwidth region. The prototype QUEST chip is fabricated in 40-nm CMOS technology and achieves a peak performance of 7.49 tera operations per second (TOPS) in binary precision and 1.96 TOPS in 4-bit precision at a 300-MHz clock.
ISSN:	0018-9200, 1558-173X
DOI:	10.1109/JSSC.2018.2871623
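
The summary contrasts log quantization with linear quantization at low bitwidths. The following is a minimal illustrative sketch of that general idea, not the scheme implemented in QUEST: the function names, the separate sign handling, the reserved zero code, and the `max_exp` and `max_val` parameters are all assumptions made for this example. It rounds a magnitude to the nearest power of two (in the log domain) and compares the result with a uniform quantizer that uses the same number of magnitude bits.

```python
import math


def log_quantize(x, bits, max_exp=0):
    """Round |x| to a power of two; hypothetical helper, not QUEST's actual scheme.

    Assumes `bits` encode the magnitude only (sign kept separately) and that
    one code is reserved for zero, leaving 2**bits - 1 exponent levels.
    """
    if x == 0:
        return 0.0
    levels = 2 ** bits - 1
    min_exp = max_exp - (levels - 1)      # smallest representable exponent
    exp = round(math.log2(abs(x)))        # nearest exponent in the log domain
    if exp < min_exp:
        return 0.0                        # too small to represent: flush to zero
    exp = min(exp, max_exp)               # clamp large magnitudes
    return math.copysign(2.0 ** exp, x)


def linear_quantize(x, bits, max_val=1.0):
    """Uniform quantizer with the same magnitude bit budget, for comparison."""
    levels = 2 ** bits - 1
    q = min(round(abs(x) / max_val * levels), levels)
    return math.copysign(q * max_val / levels, x)


if __name__ == "__main__":
    small = 0.003
    print(log_quantize(small, 4))     # 0.00390625 (2**-8 survives)
    print(linear_quantize(small, 4))  # 0.0 (below half a step of 1/15)
```

Under these assumptions, 4-bit log levels span 2^0 down to 2^-14, so small-magnitude values keep a nonzero representation where a 4-bit uniform grid rounds them to zero. This is one intuition for why log quantization can hold accuracy at low bitwidths when values cluster near zero.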