Benchmarking Deep Learning Frameworks and Investigating FPGA Deployment for Traffic Sign Classification and Detection
Published in: | IEEE transactions on intelligent vehicles 2019-09, Vol.4 (3), p.385-395 |
---|---|
Main Authors: | , , , , |
Format: | Article |
Language: | English |
Subjects: | |
Summary: | We benchmark several widely used deep learning frameworks and investigate field-programmable gate array (FPGA) deployment for traffic sign classification and detection. We evaluate the training speed and inference accuracy of these frameworks on the graphics processing unit (GPU) by training FPGA-deployment-suitable models with various input sizes on the German Traffic Sign Recognition Benchmark (GTSRB), a traffic sign classification dataset. Then, selected trained classification models and various object detection models that we train on GTSRB's detection counterpart (i.e., the German Traffic Sign Detection Benchmark) are evaluated for inference speed, accuracy, and FPGA power efficiency while varying parameters such as floating-point precision and batch size. We find that Neon and MXNet generally deliver the best training speed and classification accuracy on the GPU across the test cases, while TensorFlow is always among the frameworks with the highest inference accuracy. We observe that with the current OpenVINO release, the performance of lightweight models (e.g., MobileNet-v1-SSD) usually exceeds the requirement of real-time detection without losing much accuracy, while other models (e.g., VGG-SSD, ResNet-50-SSD) generally fail to do so. We also demonstrate that the precision of the bitstreams and the batch size can be adjusted to balance the inference speed and accuracy of applications deployed on the FPGA. Finally, we show that for all test cases, the FPGA achieves higher power efficiency than the GPU. |
---|---|
ISSN: | 2379-8858 2379-8904 |
DOI: | 10.1109/TIV.2019.2919458 |