
Long Live TIME: Improving Lifetime and Security for NVM-Based Training-in-Memory Systems


Bibliographic Details
Published in: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2020-12, Vol. 39 (12), p. 4707-4720
Main Authors: Cai, Yi; Lin, Yujun; Xia, Lixue; Chen, Xiaoming; Han, Song; Wang, Yu; Yang, Huazhong
Format: Article
Language: English
Description
Summary: Nonvolatile memory (NVM)-based training-in-memory (TIME) systems have emerged that can process neural network (NN) training in an energy-efficient manner. However, the endurance of NVM cells is poor, raising concerns about the lifetime of TIME systems, because the weights of NN models must be updated thousands to millions of times during training. Gradient sparsification (GS) can alleviate this problem by preserving only a small portion of the gradients to update the weights. However, conventional GS introduces nonuniform writes on different cells across the NVM crossbars, which significantly reduces the expected available lifetime. Moreover, an adversary can easily launch malicious training tasks to wear out exactly the targeted cells and quickly break down the system. In this article, we propose an efficient and effective framework, referred to as SGS-ARS, to improve the lifetime and security of TIME systems. The framework mainly contains a structured GS (SGS) scheme for reducing the write frequency, and an aging-aware row swapping (ARS) scheme to make the writes uniform. Meanwhile, we show that the back-propagation mechanism allows the attacker to localize and update fixed memory locations and wear them out. Therefore, we introduce Random-ARS and Refresh techniques to thwart adversarial training attacks, preventing the system from being broken in an extremely short time. Our experiments show that when TIME is programmed to train ResNet-50 on the ImageNet dataset, a 356× lifetime extension can be achieved without sacrificing much accuracy or incurring much hardware overhead. In an adversarial environment, the available lifetime of TIME systems can still be improved by 84×.
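The wear problem described in the summary can be illustrated with a toy simulation. The sketch below is not the authors' SGS-ARS implementation: it assumes a row-granular sparsifier (keep only the rows with the largest gradient norm) and a simple wear-leveling rule (periodically move the most recently written logical row onto the least-worn physical row). All names, sizes, and the "hot row" gradient model are hypothetical and chosen only to show why nonuniform writes shorten lifetime and how row remapping can spread them out.

```python
import random

def row_sparsify(grad, keep_rows):
    """Return indices of the `keep_rows` rows with the largest L1 norm."""
    scores = [sum(abs(g) for g in row) for row in grad]
    order = sorted(range(len(grad)), key=lambda r: scores[r], reverse=True)
    return order[:keep_rows]

def simulate(steps=800, rows=8, cols=4, keep_rows=1, swap_every=None, seed=0):
    """Return per-physical-row wear (writes per cell) after `steps` updates."""
    rng = random.Random(seed)
    scale = [8.0] + [1.0] * (rows - 1)   # toy assumption: logical row 0 is "hot"
    phys_of = list(range(rows))          # logical row -> physical row
    wear = [0] * rows                    # writes per cell, per physical row
    recent = [0] * rows                  # updates per logical row since last swap
    for step in range(1, steps + 1):
        grad = [[scale[r] * rng.uniform(0.5, 1.5) for _ in range(cols)]
                for r in range(rows)]
        for r in row_sparsify(grad, keep_rows):
            wear[phys_of[r]] += 1        # every cell in the row is rewritten once
            recent[r] += 1
        if swap_every and step % swap_every == 0:
            hot = max(range(rows), key=lambda r: recent[r])
            coldest_phys = min(range(rows), key=lambda p: wear[p])
            other = phys_of.index(coldest_phys)   # logical row stored there now
            if other != hot:
                # Moving both rows costs one extra write per cell in each.
                wear[phys_of[hot]] += 1
                wear[coldest_phys] += 1
                phys_of[hot], phys_of[other] = phys_of[other], phys_of[hot]
            recent = [0] * rows
    return wear

baseline = simulate()
leveled = simulate(swap_every=50)
print("max wear without swapping:", max(baseline))
print("max wear with swapping:   ", max(leveled))
```

Because lifetime is bounded by the most-written cell, the maximum wear is the quantity to compare between the two runs. The swap itself costs extra writes, which this sketch charges to both rows involved.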
ISSN: 0278-0070, 1937-4151
DOI: 10.1109/TCAD.2020.2977079