A 28nm 16.9-300TOPS/W Computing-in-Memory Processor Supporting Floating-Point NN Inference/Training with Intensive-CIM Sparse-Digital Architecture
Main Authors:
Format: Conference Proceeding
Language: English
Subjects:
Summary: Computing-in-memory (CIM) has shown high energy efficiency on low-precision integer multiply-accumulate (MAC) [1-3]. However, implementing floating-point (FP) operations using CIM has not been thoroughly explored. Previous FP CIM chips [4-5] either require complex in-memory FP logic or incur lengthy alignment-cycle latencies arising from converting FP data with different exponents into integer data. The challenges for an energy-efficient and accurate FP CIM processor are shown in Fig. 16.3.1. Firstly, aligning an FP vector onto a CIM module requires a long bit-serial sequence due to infrequent but long tail values, incurring many CIM cycles. In this work, we observe that most exponents of FP data are clustered in a small range, which motivates dividing FP operations into a high-efficiency intensive-CIM part and a flexible sparse-digital part. Secondly, implementing the intensive-CIM + sparse-digital FP workflow requires a sparse digital core for flexible intensive/sparse processing. Thirdly, FP alignment introduces more random sparsity. Although analog CIM can exploit random sparsity with a low-resolution ADC, the corresponding sparse strategy for digital CIM has not been explored. (An illustrative sketch of the exponent-window split described here follows the record below.)
ISSN: 2376-8606
DOI: 10.1109/ISSCC42615.2023.10067779
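
The summary describes splitting an FP MAC into an intensive integer part, covering values whose exponents cluster within a narrow range (mapped to CIM), and a sparse FP part for the rare tail values (handled digitally). Below is a minimal Python/NumPy sketch of that idea, intended only to illustrate the numerics: the `split_fp_mac` function, the `window_bits` threshold, and the max-exponent reference are assumptions made for this sketch, not the paper's actual alignment scheme or dataflow.

```python
import numpy as np

def split_fp_mac(x, w, window_bits=8):
    """Illustrative split of an FP dot product into an aligned-integer
    ("intensive") part and an FP outlier ("sparse") part.
    window_bits and the windowing rule are assumptions for this sketch."""
    prod = x.astype(np.float64) * w           # per-element products
    exps = np.frexp(prod)[1]                  # binary exponent of each product
    ref = exps.max()                          # align to the largest exponent

    # "Intensive" elements: exponents fall within the window below ref,
    # so their mantissas fit a short fixed-point (integer-MAC) format.
    in_window = exps >= ref - window_bits

    # Intensive part: align to one common scale, accumulate as integers.
    scale = 2.0 ** float(ref - window_bits)
    aligned = np.round(prod[in_window] / scale).astype(np.int64)
    intensive_sum = aligned.sum() * scale

    # Sparse part: the few out-of-window (tail) products stay in FP.
    sparse_sum = prod[~in_window].sum()

    return intensive_sum + sparse_sum

# Usage: most products share nearby exponents; a few are small outliers.
rng = np.random.default_rng(0)
x = rng.normal(size=1024).astype(np.float32)
w = rng.normal(size=1024).astype(np.float32)
print(split_fp_mac(x, w))        # approximate, up to alignment rounding
print(float(np.dot(x, w)))       # reference FP result
```

With values drawn from a bell-shaped distribution, almost all product exponents land inside the window, so the bulk of the accumulation happens in the integer path and only a handful of tail terms need the FP path, which is the behavior the summary's exponent-clustering observation relies on.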