
Real-Time Block-Based Embedded CNN for Gesture Classification on an FPGA


Bibliographic Details
Published in: IEEE Transactions on Circuits and Systems I: Regular Papers, 2021-10, Vol. 68 (10), p. 4182-4193
Main Authors: Wang, Ching-Chen, Ding, Yu-Chun, Chiu, Ching-Te, Huang, Chao-Tsung, Cheng, Yen-Yu, Sun, Shih-Yi, Cheng, Chih-Han, Kuo, Hsueh-Kai
Format: Article
Language:English
Description
Summary: This paper presents a block-based embedded convolutional neural network (CNN) for gesture classification on a field-programmable gate array (FPGA) in real time. Gesture recognition is an important tool for spontaneous interaction with human-machine interfaces. Many CNN architectures using RGB images have been proposed for gesture classification, but RGB-based classification may produce incorrect results under insufficient lighting or for similar gestures. In addition, most CNN architectures cannot run in real time on edge devices due to their large number of parameters and DRAM data accesses. In this paper, a block-based CNN using RGB-D data is proposed for gesture classification. Adding depth images to RGB images boosts the classification accuracy, and a CNN architecture with block-based feature maps is built for embedded FPGA implementation. The proposed RGB-D embedded CNN (eCNN) model has only 0.17M parameters and achieves 99.96% and 99.88% accuracy with 32-bit floating-point and 8-bit fixed-point implementations, respectively, on the American Sign Language (ASL) data set. In RTL simulation, the proposed eCNN model has an average inference time of 0.171 milliseconds at a frequency of 250 MHz for a single RGB-D image pair. Implemented on an FPGA integrated with a Microsoft Kinect v2, the system achieves an inference time of 19.42 ms, combining high accuracy with real-time performance.
ISSN:1549-8328
1558-0806
DOI:10.1109/TCSI.2021.3100109