A Floating-Point 6T SRAM In-Memory-Compute Macro Using Hybrid-Domain Structure for Advanced AI Edge Chips

Bibliographic Details
Published in: IEEE Journal of Solid-State Circuits, 2024-01, Vol. 59 (1), pp. 196-207
Main Authors: Wu, Ping-Chun, Su, Jian-Wei, Hong, Li-Yang, Ren, Jin-Sheng, Chien, Chih-Han, Chen, Ho-Yu, Ke, Chao-En, Hsiao, Hsu-Ming, Li, Sih-Han, Sheu, Shyh-Shyuan, Lo, Wei-Chung, Chang, Shih-Chieh, Lo, Chung-Chuan, Liu, Ren-Shuo, Hsieh, Chih-Cheng, Tang, Kea-Tiong, Chang, Meng-Fan
Format: Article
Language:English
Description
Summary: Advanced artificial intelligence edge devices are expected to support floating-point (FP) multiply and accumulation operations while ensuring high energy efficiency and high inference accuracy. This work presents an FP compute-in-memory (CIM) macro that exploits the advantages of computing in the time, digital, and analog-voltage domains for high energy efficiency and accuracy. This work employs: 1) a hybrid-domain macrostructure to enable the computation of both the exponent and mantissa within the same CIM macro; 2) a time-domain computing scheme for energy-efficient exponent computation; 3) a product-exponent-based input-mantissa alignment scheme to enable the accumulation of the product mantissa in the same column; and 4) a place-value-dependent digital-analog-hybrid computing scheme to enable energy-efficient mantissa computations of sufficient accuracy. A 22-nm 832-kB FP-CIM macro fabricated using foundry-provided compact 6T-static random access memory (SRAM) cells achieved a high energy efficiency of 72.14 tera-floating-point operations per second (TFLOPS)/W while performing FP-multiply-and-accumulate (MAC) operations involving BF16-input, BF16-weight, FP32-output, and 128 accumulations.
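The numerical format described in the summary can be emulated in software. The sketch below (a minimal NumPy illustration, not the macro's circuit behavior; `to_bf16` and all variable names are hypothetical) quantizes operands to BF16 precision, performs a 128-term MAC with an FP32 accumulator, and mimics the product-exponent-based alignment idea of scheme 3 by shifting each product mantissa to a common exponent before summation:

```python
import numpy as np

def to_bf16(x):
    """Quantize float32 values to bfloat16 precision (round-to-nearest-even
    on the upper 16 bits); the result is float32 holding bf16-representable
    values. BF16 keeps float32's 8-bit exponent but only 7 mantissa bits."""
    bits = np.asarray(x, dtype=np.float32).view(np.uint32)
    rounded = (bits + np.uint32(0x7FFF)
               + ((bits >> np.uint32(16)) & np.uint32(1))) & np.uint32(0xFFFF0000)
    return rounded.view(np.float32)

rng = np.random.default_rng(0)
a = to_bf16(rng.standard_normal(128))   # BF16 inputs
w = to_bf16(rng.standard_normal(128))   # BF16 weights

# Reference FP-MAC: 128 accumulations into an FP32 accumulator.
acc = np.float32(0.0)
for ai, wi in zip(a, w):
    acc = np.float32(acc + np.float32(ai) * np.float32(wi))

# Product-exponent alignment idea: express each product as m * 2**e,
# shift every mantissa by (e - e_max) so all products share one exponent,
# then sum the aligned mantissas and restore the common exponent.
p = (a * w).astype(np.float32)
m, e = np.frexp(p)                       # p = m * 2**e, m in [0.5, 1)
e_max = int(e.max())
aligned = np.ldexp(m, e - e_max)         # mantissas aligned to e_max
acc_aligned = np.float32(np.ldexp(aligned.sum(dtype=np.float32), e_max))
```

Up to FP32 rounding, `acc_aligned` matches the straightforward accumulation `acc`, which is the property that lets the macro accumulate aligned product mantissas within a single column.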
ISSN: 0018-9200, 1558-173X
DOI: 10.1109/JSSC.2023.3309966