
An Uninterrupted Processing Technique-Based High-Throughput and Energy-Efficient Hardware Accelerator for Convolutional Neural Networks


Bibliographic Details
Published in: IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2022-12, Vol. 30 (12), p. 1-11
Main Authors: Islam, Md Najrul; Shrestha, Rahul; Chowdhury, Shubhajit Roy
Format: Article
Language: English
Description
Summary: This article proposes an uninterrupted processing technique for convolutional neural network (CNN) accelerators. The technique lets the accelerator perform processing element (PE) operations and data fetching simultaneously, which reduces latency and raises the achievable throughput. Building on this technique, the work presents a low-latency VLSI architecture for the CNN accelerator that uses a new random access line-buffer (RALB)-based PE array design. The architecture is further optimized by reusing local data in the PE array, which improves energy conservation. The accelerator has been implemented on a Zynq UltraScale+ MPSoC ZCU102 FPGA board, where it operates at a maximum clock frequency of 340 MHz and consumes 4.11 W of total power. With 864 PEs, it delivers a peak throughput of 587.52 GOPs and an energy efficiency of 142.95 GOPs/W. Compared with the state-of-the-art work in the literature, these implementation results represent 33.42% higher throughput and 6.24× better energy efficiency. Finally, the field-programmable gate array (FPGA) prototype has been functionally validated in a real-world test setup that detects objects in input images using the GoogLeNet neural network.
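The headline figures in the summary can be cross-checked with simple arithmetic. The sketch below assumes that each PE completes one multiply-accumulate per clock cycle and that a multiply-accumulate counts as two operations. The summary does not state this convention; it is an assumption chosen because it is common for CNN accelerators and agrees with the reported numbers. The function and constant names are illustrative only.

```python
# Cross-check of the reported peak throughput and energy efficiency.
# ASSUMPTION (not stated in the summary): every PE finishes one
# multiply-accumulate (MAC) per clock cycle, counted as 2 operations.

NUM_PES = 864                # from the summary
CLOCK_HZ = 340e6             # 340 MHz, from the summary
POWER_W = 4.11               # total power, from the summary
OPS_PER_PE_PER_CYCLE = 2     # assumed: 1 multiply + 1 add per cycle


def peak_throughput_gops(num_pes, clock_hz, ops_per_pe_per_cycle):
    """Peak throughput in giga-operations per second."""
    return num_pes * clock_hz * ops_per_pe_per_cycle / 1e9


def energy_efficiency_gops_per_w(throughput_gops, power_w):
    """Energy efficiency in giga-operations per second per watt."""
    return throughput_gops / power_w


peak = peak_throughput_gops(NUM_PES, CLOCK_HZ, OPS_PER_PE_PER_CYCLE)
efficiency = energy_efficiency_gops_per_w(peak, POWER_W)

print(f"peak throughput:   {peak:.2f} GOPs")
print(f"energy efficiency: {efficiency:.2f} GOPs/W")
```

Under that assumption the computed values round to the reported 587.52 GOPs and 142.95 GOPs/W, so the summary's throughput, clock frequency, PE count, and power figures are mutually consistent.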
ISSN: 1063-8210, 1557-9999
DOI: 10.1109/TVLSI.2022.3210963