The High-Performance Convolution Design and Implementation Using Parallel Memory Processing and Shift Register Pipeline

Bibliographic Details
Main Authors:	Baek, YoungSeok, Koo, BonTae
Format:	Conference Proceeding
Language:	English
Subjects:	CNN; Convolution; Convolution Filter; Convolution Memory; Deep learning; FPGA; Information filters; Memory management; Radar; Radar detection; RTL; Shift registers; Simulation

Description
Summary:	This paper addresses the hardware implementation of a CNN deep learning system, focusing on how to implement convolution filters, which are a known time bottleneck because of their data movement and computation. Applying a 3 × 3 or 5 × 5 filter to obtain the value of a single output pixel requires reading 9 or 25 data values from memory, and MAC (multiply-accumulate) processing of those values takes multiple clock cycles. Because a memory can read only 1 or 2 values at a time, each output pixel costs anywhere from several to tens of memory reads. The paper presents a solution that processes convolution filters efficiently and cost-effectively by combining parallel memory processing with a shift-register pipeline.
ISSN:	2767-7699
DOI:	10.1109/ICEIC61013.2024.10457129
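
The read-count argument in the summary can be illustrated with a small software model. The sketch below is not the authors' RTL design, which this record does not describe; it assumes a generic arrangement of K-1 line buffers feeding a K × K shift-register window, and all function names (`read_cycles_per_pixel`, `conv_direct`, `conv_streaming`) are hypothetical. It models data movement only: in hardware the K × K multiplies would typically run in parallel, whereas here they are an ordinary Python sum.

```python
from math import ceil


def read_cycles_per_pixel(k, ports):
    """Memory read cycles needed to fetch a k x k window through `ports` read ports."""
    return ceil(k * k / ports)


def conv_direct(image, kernel):
    """Baseline: re-read all k*k input values from 'memory' for every output pixel."""
    k = len(kernel)
    h, w = len(image), len(image[0])
    reads = 0
    out = []
    for r in range(h - k + 1):
        row = []
        for c in range(w - k + 1):
            acc = 0
            for i in range(k):
                for j in range(k):
                    acc += image[r + i][c + j] * kernel[i][j]
                    reads += 1  # one memory read per window element
            row.append(acc)
        out.append(row)
    return out, reads


def conv_streaming(image, kernel):
    """Assumed scheme: read each pixel once, keep k-1 line buffers and a k x k window."""
    k = len(kernel)
    h, w = len(image), len(image[0])
    line_buffers = [[0] * w for _ in range(k - 1)]  # previous k-1 rows, oldest first
    window = [[0] * k for _ in range(k)]            # each row is a k-stage shift register
    reads = 0
    out = []
    for r in range(h):
        row_out = []
        for c in range(w):
            pixel = image[r][c]
            reads += 1  # the only access to the image memory for this pixel
            # The line buffers deliver the k-1 pixels above the new one in parallel.
            column = [line_buffers[i][c] for i in range(k - 1)] + [pixel]
            # Move the stored column up by one row and store the new pixel.
            for i in range(k - 2):
                line_buffers[i][c] = line_buffers[i + 1][c]
            if k > 1:
                line_buffers[k - 2][c] = pixel
            # Shift every window row left by one and append the new column.
            for i in range(k):
                window[i] = window[i][1:] + [column[i]]
            # The window holds valid data once k rows and k columns have streamed in.
            if r >= k - 1 and c >= k - 1:
                acc = sum(window[i][j] * kernel[i][j]
                          for i in range(k) for j in range(k))
                row_out.append(acc)
        if r >= k - 1:
            out.append(row_out)
    return out, reads


if __name__ == "__main__":
    image = [[r * 6 + c for c in range(6)] for r in range(6)]
    kernel = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]
    ref, direct_reads = conv_direct(image, kernel)
    got, stream_reads = conv_streaming(image, kernel)
    print(ref == got, direct_reads, stream_reads)
```

Under these assumptions, a 6 × 6 input with a 3 × 3 filter needs 144 memory reads in the direct form and 36 in the streaming form, with identical outputs. `read_cycles_per_pixel(3, 2)` gives 5 and `read_cycles_per_pixel(5, 1)` gives 25, which matches the "several to tens" range quoted in the summary.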