
CASH-RAM: Enabling In-Memory Computations for Edge Inference Using Charge Accumulation and Sharing in Standard 8T-SRAM Arrays

Bibliographic Details
Published in: IEEE Journal on Emerging and Selected Topics in Circuits and Systems, 2020-09, Vol. 10 (3), p. 295-305
Main Authors: Agrawal, Amogh, Kosta, Adarsh, Kodge, Sangamesh, Kim, Dong Eun, Roy, Kaushik
Format: Article
Language:English
Description
Summary: Machine Learning (ML) workloads, being memory- and compute-intensive, consume large amounts of power when run on conventional computing systems, restricting their implementations to large-scale data centers. Transferring large amounts of data from edge devices to the data centers is not only energy-expensive, but sometimes undesirable in security-critical applications. Thus, there is a need for building domain-specific hardware primitives for energy-efficient ML processing at the edge. One such approach, in-memory computing, eliminates frequent and unnecessary data transfers between the memory and the compute units by directly computing on the data where it is stored. However, the analog nature of the computations introduces non-idealities, which degrade the overall accuracy of neural networks. In this paper, we propose an in-memory computing primitive for accelerating dot-products within standard 8T-SRAM caches, using charge sharing. The inherent parasitic capacitance of the bitlines and sourcelines is used for accumulating analog voltages, which can be sensed for an approximate dot product. The charge-sharing approach involves a self-compensation technique which reduces the effects of non-idealities, thereby reducing the errors. Our results for ternary weight neural networks show that, using the proposed compensation approaches, the accuracy degradation is within 1% and 5% of the baseline accuracy for the MNIST and CIFAR-10 datasets, respectively, with an energy-delay product improvement of 38× over a standard von Neumann computing system. We believe that this work can be used in conjunction with existing mitigation techniques, such as re-training approaches, to further enhance system performance.
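The core idea in the abstract, an approximate analog dot product with ternary weights, degraded by charge-sharing attenuation and partly recovered by self-compensation, can be illustrated with a small behavioral model. The sketch below is not the authors' circuit or compensation scheme: the capacitance ratio, gain-error spread, and the all-ones reference measurement are illustrative assumptions chosen only to show how analog accumulation distorts the result and how a measured scale factor can correct the systematic part of that distortion.

```python
import numpy as np

def ideal_ternary_dot(x, w):
    """Exact dot product with ternary weights w in {-1, 0, +1}."""
    return float(np.dot(x, w))

def charge_sharing_dot(x, w, c_unit=1.0, c_parasitic=0.15, gain_sigma=0.02, rng=None):
    """Behavioral model of an analog charge-sharing dot product.

    Each product term deposits charge on a shared line capacitance; the
    parasitic capacitance attenuates the accumulated voltage, and a random
    gain error models device mismatch. The capacitance ratio and gain
    spread are illustrative assumptions, not values from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    ideal = np.dot(x, w)
    atten = c_unit / (c_unit + c_parasitic)          # charge-sharing attenuation
    return float(ideal * atten * (1.0 + rng.normal(0.0, gain_sigma)))

def compensated_dot(x, w, **kw):
    """Sketch of a self-compensation step: measure the attenuation with a
    known all-ones reference operation, then rescale the raw analog result."""
    ref = np.ones_like(x)
    raw_ref = charge_sharing_dot(ref, np.ones_like(w), **kw)
    scale = ref.sum() / raw_ref                      # estimated 1/attenuation
    return charge_sharing_dot(x, w, **kw) * scale

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.random(64)                               # activations
    w = rng.integers(-1, 2, size=64)                 # ternary weights {-1, 0, +1}
    print("ideal        :", ideal_ternary_dot(x, w))
    print("uncompensated:", charge_sharing_dot(x, w, rng=rng))
    print("compensated  :", compensated_dot(x, w, rng=rng))
```

In this toy model the reference measurement cancels the systematic attenuation but not the random gain error, which is only meant to illustrate why some residual error, and hence a small accuracy loss, remains after compensation.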
ISSN: 2156-3357; 2156-3365
DOI: 10.1109/JETCAS.2020.3014250