Benchmarking Deep Learning Frameworks and Investigating FPGA Deployment for Traffic Sign Classification and Detection

Bibliographic Details
Published in: IEEE Transactions on Intelligent Vehicles, 2019-09, Vol. 4 (3), p. 385-395
Main Authors: Lin, Zhongyi, Yih, Matthew, Ota, Jeffrey M., Owens, John D., Muyan-Ozcelik, Pinar
Format: Article
Language:English
Summary: We benchmark several widely-used deep learning frameworks and investigate field programmable gate array (FPGA) deployment for traffic sign classification and detection. We evaluate the training speed and inference accuracy of these frameworks on the graphics processing unit (GPU) by training FPGA-deployment-suitable models with various input sizes on the German Traffic Sign Recognition Benchmark (GTSRB), a traffic sign classification dataset. Then, selected trained classification models and various object detection models that we train on GTSRB's detection counterpart (i.e., the German Traffic Sign Detection Benchmark) are evaluated in terms of inference speed, accuracy, and FPGA power efficiency by varying parameters such as floating-point precision and batch size. We find that Neon and MXNet deliver the best training speed and classification accuracy on the GPU in general across all test cases, while TensorFlow is consistently among the frameworks with the highest inference accuracies. We observe that with the current OpenVINO release, the performance of lightweight models (e.g., MobileNet-v1-SSD) usually exceeds the requirements of real-time detection without losing much accuracy, while other models (e.g., VGG-SSD, ResNet-50-SSD) generally fail to do so. We also demonstrate that the precision of the bitstreams and the batch sizes can be adjusted to balance the inference speed and accuracy of applications deployed on the FPGA. Finally, we show that across all test cases, the FPGA achieves higher power efficiency than the GPU.
ISSN: 2379-8858, 2379-8904
DOI: 10.1109/TIV.2019.2919458