A 28-nm Computing-in-Memory-Based Super-Resolution Accelerator Incorporating Macro-Level Pipeline and Texture/Algebraic Sparsity
Published in: | IEEE transactions on circuits and systems. I, Regular papers, 2024-02, Vol.71 (2), p.1-14 |
---|---|
Main Authors: | , , , , , , , , , |
Format: | Article |
Language: | English |
Subjects: | |
Summary: | Super-resolution (SR) using convolutional neural networks is crucial for improving image and video quality. The introduction of the residual block (RB) increases the depth of the algorithm and improves reconstruction quality. However, processing RBs lowers hardware utilization and causes frequent off-chip communication, which makes such algorithms hard to deploy on edge devices with limited performance. Computing-in-memory (CiM) is a promising way to reduce the high power consumption caused by massive data movement in multiply-accumulate computation. Algebraic sparsity (AS) is a structured sparsity (SS) optimization for imaging computing. However, simultaneously exploiting the texture sparsity (TS) of the image and the SS of the algorithm in a CiM scheme while maintaining high hardware utilization remains an unsolved problem. We therefore propose a CiM-based SR accelerator with three key contributions. First, a texture-aware workflow and a dynamic grouping CiM engine concurrently support TS coupled with AS. Second, a macro-level pipeline scheme, together with two custom-sized CiM macros and a high-reuse-rate Hadamard transformation circuit, reaches 91% hardware utilization. Third, a novel weight update strategy reduces the performance loss induced by weight updating. The accelerator prototype is fabricated in 28-nm CMOS. It achieves a peak energy efficiency of 22.8-44.3 TOPS/W at a supply voltage of 0.54-1.1 V and an operating frequency of 50-200 MHz, which is 1.8-6.8x higher than state-of-the-art CiM processors. |
---|---|
ISSN: | 1549-8328 1558-0806 |
DOI: | 10.1109/TCSI.2023.3325850 |