QUEST: A 7.49TOPS multi-purpose log-quantized DNN inference engine stacked on 96MB 3D SRAM using inductive-coupling technology in 40nm CMOS

Bibliographic Details
Main Authors:	Ueyoshi, Kodai; Ando, Kota; Hirose, Kazutoshi; Takamaeda-Yamazaki, Shinya; Kadomoto, Junichiro; Miyata, Tomoki; Hamada, Mototsugu; Kuroda, Tadahiro; Motomura, Masato
Format:	Conference Proceeding
Language:	English
Subjects:	Engines; Memory management; Random access memory; Stacking; System-on-chip; Three-dimensional displays

Description
Summary:	A key consideration for deep neural network (DNN) inference accelerators is the need for large, high-bandwidth external memories. Although an architectural concept for stacking a DNN accelerator with DRAMs has been proposed previously, long DRAM latency remains problematic and limits performance [1]. Recent algorithm-level optimizations, such as network pruning and compression, have succeeded in reducing DNN memory size [2]; however, because the resulting networks become irregular and sparse, they create an additional need for agile random access to the memory system.
ISSN:	2376-8606
DOI:	10.1109/ISSCC.2018.8310261