A High-Performance and Ultra-Low-Power Accelerator Design for Advanced Deep Learning Algorithms on an FPGA
Published in: | Electronics (Basel) 2024-07, Vol.13 (13), p.2676 |
---|---|
Main Authors: | , , , |
Format: | Article |
Language: | English |
Subjects: | |
Summary: | This article addresses the growing need for energy-efficient convolutional neural network (CNN) accelerators on mobile field-programmable gate array (FPGA) systems in resource-constrained edge computing scenarios. In particular, we concentrate on register transfer level (RTL) design flow optimization to improve programming speed and power efficiency. We present a reconfigurable accelerator design optimized for CNN-based object-detection applications and especially suited to mobile FPGA platforms such as the Xilinx PYNQ-Z2. In addition to optimizing the multiply-accumulate (MAC) module with enhanced clock gating (ECG), the accelerator applies low-power techniques such as local explicit clock gating (LECG) and local explicit clock enable (LECE) in the memory modules to minimize data access and memory utilization. An evaluation using ResNet-20 trained on the CIFAR-10 dataset demonstrated significant improvements in power efficiency (up to 22%) and performance. The findings highlight the importance of applying different optimization techniques across multiple hardware modules to achieve better results in real-world applications. |
---|---|
ISSN: | 2079-9292 |
DOI: | 10.3390/electronics13132676 |
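The summary describes clock gating in the MAC module only at a high level. As a rough illustration of the general idea, the following Python sketch models a MAC unit whose accumulator register is clocked only when the incoming product can change the sum. It is not the article's ECG, LECG, or LECE design: the class `GatedMac`, the zero-operand gating condition, and the sample operands are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class GatedMac:
    """Behavioral model of a MAC unit with a gated accumulator clock.

    Hypothetical illustration only; not the design from the article.
    """
    acc: int = 0
    clocked_cycles: int = 0  # cycles in which the accumulator was clocked
    total_cycles: int = 0    # all cycles, gated or not

    def step(self, activation: int, weight: int) -> None:
        self.total_cycles += 1
        # Assumed gating condition: a zero operand gives a zero product,
        # so the accumulator would not change and its clock can be held off.
        enable = activation != 0 and weight != 0
        if enable:
            self.acc += activation * weight
            self.clocked_cycles += 1


def dot_product(activations, weights) -> GatedMac:
    """Feed operand pairs through the gated MAC, one pair per cycle."""
    mac = GatedMac()
    for a, w in zip(activations, weights):
        mac.step(a, w)
    return mac


# Made-up operands with several zeros, as is common after ReLU layers.
activations = [0, 3, 0, 5, 2, 0]
weights = [4, 1, 7, 0, 6, 9]

mac = dot_product(activations, weights)
gated_fraction = 1 - mac.clocked_cycles / mac.total_cycles
print(f"acc={mac.acc}, clocked {mac.clocked_cycles} of {mac.total_cycles} cycles")
```

The accumulated result is the same as that of an ungated MAC. Only the number of register updates changes, and fewer clocked register updates is the effect clock gating relies on to reduce dynamic power. Real savings depend on operand sparsity and on how the gating is implemented in the RTL, neither of which this model captures.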