The High-Performance Convolution Design and Implementation Using Parallel Memory Processing and Shift Register Pipeline
Main Authors: | , |
---|---|
Format: | Conference Proceeding |
Language: | English |
Subjects: | |
Online Access: | Request full text |
Summary: | This paper addresses the hardware implementation of a CNN deep learning system, focusing on how to implement convolution filters, which are a known time bottleneck because of the data movement and computation they require. Applying a 3 × 3 or 5 × 5 filter to compute a single output pixel requires reading 9 or 25 data values from memory, and the MAC (multiply-accumulate) processing of those values takes multiple clock cycles. Because memory can deliver only 1 or 2 values per read, each output pixel costs anywhere from several to tens of memory reads. The paper presents a way to process convolution filters efficiently and cost-effectively by combining parallel memory processing with a shift-register pipeline. |
---|---|
ISSN: | 2767-7699 |
DOI: | 10.1109/ICEIC61013.2024.10457129 |
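
The summary's read counts follow from simple arithmetic: a 3 × 3 window needs 9 values, so a memory that delivers 1 or 2 values per access needs between 5 and 9 accesses per output pixel, and a 5 × 5 window needs between 13 and 25. The record does not describe the paper's actual architecture, so the following is only a minimal software sketch of the general idea of a shift-register pipeline, assuming a generic line-buffer design: two row-length buffers plus a 3 × 3 window of registers, with one new pixel arriving per cycle. The function name `conv3x3_stream` and all variable names are hypothetical, and the sketch does not model the paper's parallel memory scheme or its MAC timing.

```python
from collections import deque


def conv3x3_stream(image, kernel):
    """Software model of a streaming 3x3 filter (valid region, no padding).

    Assumption: a generic line-buffer design, not the paper's circuit.
    Pixels arrive one per "cycle" in raster order. Two line buffers delay
    the stream by one and two rows, and a 3x3 window of shift registers
    holds the current neighbourhood, so each source pixel is read once.
    The kernel is applied without flipping, as is usual in CNNs.
    """
    height, width = len(image), len(image[0])
    line1 = deque([0] * width)  # delays the stream by one row
    line2 = deque([0] * width)  # delays the stream by two rows
    window = [[0] * 3 for _ in range(3)]  # 3x3 shift-register window
    out = []
    reads = 0  # number of reads from the source image

    for y in range(height):
        row_out = []
        for x in range(width):
            pixel = image[y][x]
            reads += 1

            # Values leaving the line buffers sit in the same column,
            # one and two rows above the incoming pixel.
            above1 = line1.popleft()
            above2 = line2.popleft()
            line2.append(above1)
            line1.append(pixel)

            # Shift the window one column left, then load the new column.
            for r in range(3):
                window[r][0], window[r][1] = window[r][1], window[r][2]
            window[0][2] = above2
            window[1][2] = above1
            window[2][2] = pixel

            # The window is fully valid once two rows and two columns
            # of the current row have been seen.
            if y >= 2 and x >= 2:
                acc = 0
                for r in range(3):
                    for c in range(3):
                        acc += window[r][c] * kernel[r][c]  # MAC step
                row_out.append(acc)
        if row_out:
            out.append(row_out)
    return out, reads
```

For a 4 × 5 image, this model reads the source 20 times in total, whereas recomputing each of the 6 valid outputs from scratch would read it 6 × 9 = 54 times. In hardware the nine multiply-accumulates in the inner loop could run in parallel on the register window, but that mapping is a design choice this sketch leaves open.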