A high performance hardware accelerator for dynamic texture segmentation
Published in: | Journal of systems architecture 2015-11, Vol.61 (10), p.639-645 |
---|---|
Main Authors: | , , , , , , |
Format: | Article |
Language: | English |
Subjects: | |
Summary: | • The major contribution of this paper is a hardware/software co-design architecture, with an FPGA as the hardware component and a CPU as the software host, for accelerating dynamic texture segmentation. • This work presents an FPGA implementation of the FFT processing subsystem, comprising FFT/IFFT processors, read/write control modules, and memory/FIFO modules, together with a memory optimization that uses local FIFOs to minimize external RAM accesses. The implementation makes full use of hardware acceleration: all FFT and related operations are executed in hardware, which is expected to run orders of magnitude faster than the software implementation. • The paper shows that the FPGA–CPU solution is 37.3 times faster in data processing time and 5.9 times faster in total run time than the CPU (CPU–GPU) based solution.
Hardware accelerators such as general-purpose GPUs and FPGAs have been used as alternatives to conventional CPU architectures in scientific computing applications and have achieved good speedups. In this context, the present study proposes a heterogeneous high-performance computing architecture based on CPUs and FPGAs that efficiently exploits the maximum degree of parallelism available when segmenting video using the concept of dynamic textures. The video segmentation algorithm consists of computing the 3-D FFT, calculating the phase spectrum, and performing the 2-D IFFT. The performance of the proposed CPU–FPGA architecture is compared with the reference FFTW implementation on the CPU and with the cuFFT library on the GPU. The prototype, implemented on a single Stratix IV FPGA, achieved an overall speedup of 37x over the FFTW software library. |
---|---|
ISSN: | 1383-7621 1873-6165 |
DOI: | 10.1016/j.sysarc.2015.09.005 |
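The two reported speedups (37.3x in data processing time, 5.9x in total run time) can be related through Amdahl's law. This is a rough reading only, resting on assumptions the record does not state: both figures are measured against the same baseline, and only the data-processing portion is accelerated while the rest of the run time is unchanged. Let f be the fraction of baseline run time spent in data processing, s = 37.3 its speedup, and S = 5.9 the total speedup:

$$
S = \frac{1}{(1-f) + f/s}
\quad\Longrightarrow\quad
f = \frac{1 - 1/S}{1 - 1/s} = \frac{1 - 1/5.9}{1 - 1/37.3} \approx \frac{0.8305}{0.9732} \approx 0.853
$$

Under these assumptions, roughly 85% of the baseline run time would be data processing, and the remaining part would cap the total speedup at about 1/(1 - f) ≈ 6.8x regardless of how fast the FPGA stage becomes.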
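The abstract names three stages (3-D FFT, phase spectrum, 2-D IFFT) without saying exactly how they connect. The following is a minimal pure-Python sketch of that pipeline under stated assumptions, not the authors' FPGA design: the video is a volume indexed as [t][y][x] with power-of-two dimensions; the "phase spectrum" is assumed to be the phase-only spectrum F/|F| (every bin's magnitude set to 1); and the 2-D IFFT is assumed to run over the spatial axes of each temporal-frequency slice. All function names (`fft1d`, `fft_axis`, `fft3d`, `phase_only`, `ifft2_slices`, `segment_pipeline`) are hypothetical. A textbook radix-2 Cooley–Tukey FFT stands in for the hardware FFT/IFFT processors, and the 3-D transform is built from separable 1-D passes along each axis.

```python
import cmath
from typing import List

# Hypothetical layout: a video volume indexed as vol[t][y][x].
Volume = List[List[List[complex]]]


def fft1d(a: List[complex], inverse: bool = False) -> List[complex]:
    """Recursive radix-2 Cooley-Tukey FFT; len(a) must be a power of two.
    With inverse=True the twiddle sign flips; the result is NOT divided by n."""
    n = len(a)
    if n < 1 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return list(a)
    sign = 1 if inverse else -1
    even = fft1d(a[0::2], inverse)
    odd = fft1d(a[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out


def fft_axis(vol: Volume, axis: int, inverse: bool = False) -> Volume:
    """Apply fft1d along one axis (0 = t, 1 = y, 2 = x) of a 3-D volume."""
    T, H, W = len(vol), len(vol[0]), len(vol[0][0])
    out = [[[0j] * W for _ in range(H)] for _ in range(T)]
    if axis == 2:  # along x: every row of every frame
        for t in range(T):
            for y in range(H):
                out[t][y] = fft1d(vol[t][y], inverse)
    elif axis == 1:  # along y: every column of every frame
        for t in range(T):
            for x in range(W):
                col = fft1d([vol[t][y][x] for y in range(H)], inverse)
                for y in range(H):
                    out[t][y][x] = col[y]
    else:  # along t: every pixel's time series
        for y in range(H):
            for x in range(W):
                line = fft1d([vol[t][y][x] for t in range(T)], inverse)
                for t in range(T):
                    out[t][y][x] = line[t]
    return out


def fft3d(vol: Volume) -> Volume:
    """3-D FFT as three passes of 1-D FFTs (row-column decomposition)."""
    return fft_axis(fft_axis(fft_axis(vol, 2), 1), 0)


def phase_only(spec: Volume, eps: float = 1e-12) -> Volume:
    """Assumed 'phase spectrum': keep each bin's phase, force magnitude to 1.
    Bins with (near-)zero magnitude have no defined phase and are set to 0."""
    return [[[c / abs(c) if abs(c) > eps else 0j for c in row]
             for row in plane] for plane in spec]


def ifft2_slices(spec: Volume) -> Volume:
    """Normalized 2-D IFFT over (y, x) of every slice along the first axis."""
    H, W = len(spec[0]), len(spec[0][0])
    raw = fft_axis(fft_axis(spec, 2, inverse=True), 1, inverse=True)
    return [[[c / (H * W) for c in row] for row in plane] for plane in raw]


def segment_pipeline(video: Volume) -> Volume:
    """Assumed stage order: 3-D FFT -> phase-only spectrum -> per-slice 2-D IFFT."""
    return ifft2_slices(phase_only(fft3d(video)))
```

As a sanity check, a volume containing a single nonzero voxel at the origin has an all-ones 3-D spectrum, so every slice of `segment_pipeline`'s output is an impulse at (y, x) = (0, 0). The separable structure is also why FFT hardware typically buffers data between passes: each 1-D pass reads the volume along a different axis.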