
The High-Performance Convolution Design and Implementation Using Parallel Memory Processing and Shift Register Pipeline


Bibliographic Details
Main Authors: Baek, YoungSeok; Koo, BonTae
Format: Conference Proceeding
Language: English
Description
Summary: This paper addresses the hardware implementation of a CNN deep learning system, focusing on how to implement convolution filters, which are a known time bottleneck because of their data movement and computation. Applying a 3 × 3 or 5 × 5 filter to compute a single output pixel requires reading 9 or 25 data values from memory, and the MAC (Multiply-Accumulate) processing of those values takes multiple clock cycles. Because memory can supply only 1 or 2 values at a time, each output pixel costs from several to tens of memory reads. The paper presents a solution that processes convolution filters efficiently and cost-effectively by combining parallel memory processing techniques with a shift-register pipeline.
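The read-count problem described in the summary can be illustrated with a small software model. The sketch below is not the authors' design, which this record does not detail; it assumes a generic streaming arrangement (K−1 line buffers feeding a K × K shift-register window) and compares it with a naive loop that re-reads every pixel under the filter. The function names `conv2d_naive` and `conv2d_streaming`, and the `reads` counter, are hypothetical and exist only for this illustration.

```python
from collections import deque


def conv2d_naive(image, kernel):
    """Baseline: fetch all K*K pixels from 'memory' for every output pixel."""
    h, w, k = len(image), len(image[0]), len(kernel)
    reads = 0
    out = []
    for r in range(h - k + 1):
        row = []
        for c in range(w - k + 1):
            acc = 0
            for i in range(k):
                for j in range(k):
                    acc += image[r + i][c + j] * kernel[i][j]
                    reads += 1  # one memory read per multiply-accumulate
            row.append(acc)
        out.append(row)
    return out, reads


def conv2d_streaming(image, kernel):
    """Assumed model: each pixel is read once, then reused from local storage.

    line_buffers[0] holds the oldest of the previous K-1 rows and
    line_buffers[K-2] the most recent; window[i] is a K-deep shift
    register holding the last K pixels of window row i.
    """
    h, w, k = len(image), len(image[0]), len(kernel)
    line_buffers = [[0] * w for _ in range(k - 1)]
    window = [deque([0] * k, maxlen=k) for _ in range(k)]
    reads = 0
    out = []
    for r in range(h):
        row = []
        for c in range(w):
            pixel = image[r][c]
            reads += 1  # the only memory read for this pixel
            # K vertically aligned pixels: buffered rows first, new pixel last
            column = [line_buffers[i][c] for i in range(k - 1)] + [pixel]
            # Age the line buffers at this column and store the new pixel
            for i in range(k - 2):
                line_buffers[i][c] = line_buffers[i + 1][c]
            if k > 1:
                line_buffers[k - 2][c] = pixel
            # Shift the new column into the window; the oldest column drops out
            for i in range(k):
                window[i].append(column[i])
            # The window is fully valid once K rows and K columns have arrived
            if r >= k - 1 and c >= k - 1:
                acc = sum(window[i][j] * kernel[i][j]
                          for i in range(k) for j in range(k))
                row.append(acc)
        if r >= k - 1:
            out.append(row)
    return out, reads


if __name__ == "__main__":
    image = [[r * 5 + c for c in range(5)] for r in range(5)]
    kernel = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]
    naive_out, naive_reads = conv2d_naive(image, kernel)
    stream_out, stream_reads = conv2d_streaming(image, kernel)
    print(naive_out == stream_out, naive_reads, stream_reads)
```

For the 5 × 5 image and 3 × 3 kernel in the demo, the naive loop performs 9 outputs × 9 reads = 81 memory reads, while the streaming model performs 25, one per input pixel. In hardware, the K × K multiplies inside the window can additionally run in parallel, which a sequential software model like this one does not capture.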
ISSN: 2767-7699
DOI: 10.1109/ICEIC61013.2024.10457129