34.2 A 16nm 96Kb Integer/Floating-Point Dual-Mode-Gain-Cell-Computing-in-Memory Macro Achieving 73.3-163.3TOPS/W and 33.2-91.2TFLOPS/W for AI-Edge Devices
Main Authors: | , , , , , , , , , , , , , , |
---|---|
Format: | Conference Proceeding |
Language: | English |
Subjects: | |
Online Access: | Request full text |
Summary: | Advanced AI-edge chips require computational flexibility and high-energy efficiency (EEF) with sufficient inference accuracy for a variety of applications. Floating-point (FP) numerical representation can be used for complex neural networks (NN) requiring a high inference accuracy; however, such an approach requires higher energy and more parameter storage than does a fixed-point integer (INT) numerical representation. Many compute-in-memory (CIM) designs have a good EEF for INT multiply-and-accumulate (MAC) operations; however, few support FP-MAC operations [1-3]. Implementing INT/FP dual-mode (DM) MAC operations presents challenges (Fig. 34.2.1), including (1) low-area efficiency, since FP-MAC functions become idle during INT-MAC operations; (2) a high system-level latency, due to NN data update interruptions on small-capacity SRAM-CIM without concurrent write-and-compute functionality; and (3) high-energy consumption, due to repeated system-to-CIM data transfers during computation. This work presents an INT/FP DM macro featuring (1) a DM zone-based input (IN) processing scheme (ZB-IPS) to eliminate subtraction in exponent (EXP) computation, while reusing the alignment circuit in INT-mode to improve EEF and area efficiency (AEF); (2) a DM local-computing-cell (DM-LCC), which reuses the EXP addition as an adder tree stage for INT-MAC to improve AEF in INT mode; and (3) a stationary-based two-port gain-cell (GC) array (SB-TP-GCA) to support concurrent data updates and computation, while reducing system-to-CIM and internal data accesses to improve EEF and latency (T_MAC). A 16nm 96-Kb INT-FP DM GC-CIM macro with 4T GCs is fabricated to support FP-MAC with 64 accumulations (N_ACCU) for BF16-IN, BF16-W, and FP32-OUT as well as an INT-MAC with N_ACCU = 128 for 8b-IN, 8b-W, and 23b-OUT. This CIM macro achieves a 163.3TOPS/W INT-MAC and a 91.2TFLOPS/W FP-MAC EEF. |
---|---|
ISSN: | 2376-8606 |
DOI: | 10.1109/ISSCC49657.2024.10454447 |
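
The 23b-OUT figure quoted in the summary for the INT-MAC mode can be checked with simple bit-width arithmetic. The sketch below is only an illustration, not the paper's circuit: it assumes signed two's-complement 8b inputs and weights (the summary does not state the encoding), and the function names `signed_bits_needed` and `int_mac_output_bits` are hypothetical.

```python
def signed_bits_needed(lo: int, hi: int) -> int:
    """Smallest two's-complement width that can hold every value in [lo, hi]."""
    bits = 1
    while not (-(1 << (bits - 1)) <= lo and hi <= (1 << (bits - 1)) - 1):
        bits += 1
    return bits


def int_mac_output_bits(in_bits: int = 8, w_bits: int = 8, n_accu: int = 128) -> int:
    """Output width for n_accu accumulated products of signed inputs and weights.

    Assumption: both operands are signed two's-complement integers.
    """
    in_lo, in_hi = -(1 << (in_bits - 1)), (1 << (in_bits - 1)) - 1
    w_lo, w_hi = -(1 << (w_bits - 1)), (1 << (w_bits - 1)) - 1
    # Extreme products occur at the corners of the operand ranges.
    products = [a * b for a in (in_lo, in_hi) for b in (w_lo, w_hi)]
    # Worst case: every one of the n_accu products hits the same extreme.
    return signed_bits_needed(n_accu * min(products), n_accu * max(products))


if __name__ == "__main__":
    print(int_mac_output_bits())  # width for 8b-IN, 8b-W, N_ACCU = 128
```

Under these assumptions the worst-case sum is 128 × (−128 × −128) = 2^21, which does not fit in 22 signed bits, so the calculation agrees with the 23b output width given in the summary.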