Inference Time Reduction of Deep Neural Networks on Embedded Devices: A Case Study
Main Authors:
Format: Conference Proceeding
Language: English
Summary: From object detection to semantic segmentation, deep learning has achieved many groundbreaking results in recent years. However, the growing complexity of neural networks greatly hinders their execution on embedded platforms. This has motivated the development of several neural network minimisation techniques, among which pruning has received considerable attention. In this work, we perform a case study on a series of methods with the goal of finding a small model that can run fast on embedded devices. First, we suggest a simple but effective ranking criterion for filter pruning called Mean Weight. Then, we combine this new criterion with a threshold-aware, layer-sensitive filter pruning method, called T-sensitive pruning, to retain high accuracy. Further, the pruning algorithm follows a structured filter pruning approach that removes all selected filters and their dependencies from the DNN model, leading to fewer computations and thus lower inference time on lower-end CPUs. To validate the effectiveness of the proposed method, we perform experiments on three different datasets (with 3, 101, and 1000 classes) and two different deep neural networks (i.e., SICK-Net and MobileNet V1). We obtained speedups of up to 13x on lower-end CPUs (Armv8) with less than 1% drop in accuracy. This satisfies the goal of transferring deep neural networks to embedded hardware while attaining a good trade-off between inference time and accuracy.
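The abstract does not spell out the exact definition of the Mean Weight criterion or the T-sensitive threshold, so the sketch below is only a plausible illustration of structured filter pruning, assuming the score of a filter is the mean of its absolute weights and that the lowest-scoring filters are removed along with the matching input channels of the following layer. All names (`mean_weight_scores`, `prune_filters`, `keep_ratio`) are hypothetical and not taken from the paper.

```python
import numpy as np

def mean_weight_scores(conv_w):
    """Score each filter by the mean absolute value of its weights.

    conv_w has shape (out_channels, in_channels, kH, kW);
    the result has shape (out_channels,).
    """
    return np.abs(conv_w).mean(axis=(1, 2, 3))

def prune_filters(conv_w, next_w, keep_ratio=0.5):
    """Structured pruning sketch: keep the top-scoring filters.

    Removing an output filter of one layer also removes the
    corresponding input channel of the next layer's weights
    (next_w has shape (next_out, out_channels, kH, kW)),
    which is what actually shrinks the computation.
    """
    scores = mean_weight_scores(conv_w)
    n_keep = max(1, int(round(keep_ratio * len(scores))))
    # Indices of the n_keep highest-scoring filters, in original order.
    keep = np.sort(np.argsort(scores)[::-1][:n_keep])
    pruned_w = conv_w[keep]                                # drop filters
    next_pruned = next_w[:, keep] if next_w is not None else None
    return pruned_w, next_pruned, keep
```

For example, pruning a layer with 8 filters at `keep_ratio=0.5` leaves weight tensors of shapes `(4, in, kH, kW)` and `(next_out, 4, kH, kW)`, so the pruned model computes strictly fewer multiply-accumulates with no sparse-kernel support required — consistent with the structured approach the abstract describes.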
ISSN: 2771-2508
DOI: 10.1109/DSD57027.2022.00036