Chameleon: Dual Memory Replay for Online Continual Learning on Edge Devices

Bibliographic Details
Published in: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2024-06, Vol. 43 (6), p. 1663-1676
Main Authors: Aggarwal, Shivam; Binici, Kuluhan; Mitra, Tulika
Format: Article
Language: English
Description
Summary: Once deployed on edge devices, a deep neural network model should dynamically adapt to newly discovered environments and personalize its utility for each user. The system must be capable of continual learning (CL), i.e., learning new information from a temporal stream of data in situ without forgetting previously acquired knowledge. However, creating a personalized CL framework poses significant challenges due to limited compute and storage resources on edge devices. Existing methods rely on large memory storage to preserve past data while learning from incoming streams, making them impractical for such devices. In this article, we propose Chameleon as a hardware-friendly solution for user-centric CL with dual replay buffers. The strategy takes advantage of the hierarchical memory structure commonly found in edge devices, utilizing a short-term replay store in on-chip memory and a long-term replay store in off-chip memory. We also present an FPGA-based analytical model to estimate the compute and communication costs of the dual replay strategy on the hardware, enabling effective design choices across various latent layer options. We conduct extensive experiments on four different models, demonstrating our method's consistent performance across diverse model architectures. Our method achieves up to 7× speedup and improved energy efficiency on popular edge devices, including ZCU102 FPGA, NVIDIA Jetson Nano, and Google's EdgeTPU. Our code is available at https://github.com/ecolab-nus/Chameleon.
ISSN: 0278-0070, 1937-4151
DOI: 10.1109/TCAD.2023.3347640
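
Illustrative sketch: the summary describes a short-term replay store held in on-chip memory and a long-term replay store held in off-chip memory. The Python below is only a toy model of that two-tier idea. The class name `DualReplayBuffer`, the FIFO policy for the short-term store, and the reservoir-sampling policy for the long-term store are all assumptions made for illustration; the record does not say how the authors update or sample either store, and the linked repository should be consulted for the actual method.

```python
import random
from collections import deque


class DualReplayBuffer:
    """Toy two-tier replay store. All policies here are assumptions,
    not the method described in the article."""

    def __init__(self, short_capacity, long_capacity, seed=0):
        # Small, fast tier: stands in for on-chip memory (assumed FIFO).
        self.short_term = deque(maxlen=short_capacity)
        # Larger, slower tier: stands in for off-chip memory.
        self.long_term = []
        self.long_capacity = long_capacity
        self.seen = 0  # number of stream samples observed so far
        self.rng = random.Random(seed)

    def add(self, sample):
        """Observe one sample from the incoming stream."""
        self.short_term.append(sample)  # oldest entry is evicted when full
        self.seen += 1
        if len(self.long_term) < self.long_capacity:
            self.long_term.append(sample)
        else:
            # Reservoir sampling (assumed): each observed sample ends up
            # in the long-term store with equal probability.
            j = self.rng.randrange(self.seen)
            if j < self.long_capacity:
                self.long_term[j] = sample

    def sample(self, n_short, n_long):
        """Draw a replay batch that mixes both tiers."""
        short = self.rng.sample(list(self.short_term),
                                min(n_short, len(self.short_term)))
        long_ = self.rng.sample(self.long_term,
                                min(n_long, len(self.long_term)))
        return short + long_


buf = DualReplayBuffer(short_capacity=4, long_capacity=16)
for i in range(100):
    buf.add(i)
batch = buf.sample(n_short=2, n_long=6)
```

In this sketch the short-term store always holds the most recent samples, while the long-term store keeps a bounded, roughly uniform subset of everything seen, so a replay batch mixes recent and older data without the total storage growing with the stream.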