AntiDoteX: Attention-Based Dynamic Optimization for Neural Network Runtime Efficiency

Bibliographic Details
Published in: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2022-11, Vol. 41 (11), p. 4694-4707
Main Authors: Yu, Fuxun; Xu, Zirui; Liu, Chenchen; Stamoulis, Dimitrios; Wang, Di; Wang, Yanzhi; Chen, Xiang
Format: Article
Language: English
Description
Summary: Deep neural networks (DNNs) have achieved great cognitive performance at the expense of a considerable computation workload. To relieve this computational burden, many optimization methods have been developed to reduce model redundancy by identifying and removing insignificant model components, for example through weight sparsification and filter pruning. However, these methods evaluate only the static significance of model components from parameter information, ignoring their dynamic interaction with external inputs. Specifically, because features differ from input to input, the significance of model components can change dynamically, so static methods can achieve only suboptimal performance. To address this, we propose a comprehensive dynamic DNN optimization framework based on the neural network attention mechanism, including 1) testing-phase dynamic feature map pruning; 2) training-phase optimization by training with targeted dropout; and 3) deployment-phase one-for-all (OFA) model adaptability enhancement. By providing a holistic co-optimization framework across testing, training, and deployment, our work has the following benefits. First, it can accurately identify and aggressively remove per-input feature redundancy by considering the model-input interaction and exploiting channel/column-wise pruning flexibility. Second, the training-testing co-optimization favors dynamic pruning and helps maintain model accuracy even at a very high feature pruning ratio. Finally, the deployment enhancement provides one unified OFA model that supports full-spectrum feature sparsity ratios. The unified model can be dynamically reconfigured to meet different resource budgets without any retraining cost, and thus provides significant deployment flexibility. Extensive experiments show that our method achieves a 37.4%-54.5% reduction in floating-point operations with a negligible accuracy drop on various test benchmarks. Meanwhile, the OFA deployment optimization enables one model to support up to ten different resource constraints without any retraining cost.
ISSN: 0278-0070, 1937-4151
DOI: 10.1109/TCAD.2022.3144616
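
Illustrative sketch: the summary describes testing-phase dynamic feature map pruning, in which channels are kept or dropped per input rather than by a fixed, parameter-based ranking. The Python below is a minimal sketch of that general idea only, not the authors' implementation. The scoring rule (mean absolute activation per channel) and the names `channel_attention`, `prune_channels`, and `prune_ratio` are assumptions made for illustration; the summary does not specify how attention is computed.

```python
from typing import List, Tuple

# A feature map is modeled as channels x height x width (nested lists).
FeatureMap = List[List[List[float]]]


def channel_attention(fmap: FeatureMap) -> List[float]:
    """Hypothetical per-channel score: mean absolute activation.

    This stands in for the attention criterion, which the summary
    does not define.
    """
    scores = []
    for channel in fmap:
        values = [abs(v) for row in channel for v in row]
        scores.append(sum(values) / len(values))
    return scores


def prune_channels(fmap: FeatureMap, prune_ratio: float) -> Tuple[FeatureMap, List[int]]:
    """Zero out the lowest-scoring channels for this particular input.

    Returns the masked feature map and the sorted indices of kept channels.
    """
    if not 0.0 <= prune_ratio < 1.0:
        raise ValueError("prune_ratio must be in [0, 1)")
    scores = channel_attention(fmap)
    n_prune = int(len(fmap) * prune_ratio)
    n_keep = len(fmap) - n_prune
    # Rank channels by score, highest first, and keep the top n_keep.
    ranked = sorted(range(len(fmap)), key=lambda i: scores[i], reverse=True)
    kept = sorted(ranked[:n_keep])
    kept_set = set(kept)
    pruned = [
        [row[:] for row in ch] if i in kept_set
        else [[0.0] * len(row) for row in ch]
        for i, ch in enumerate(fmap)
    ]
    return pruned, kept


# Four 2x2 channels with clearly different activation magnitudes.
example = [
    [[0.1, 0.1], [0.1, 0.1]],
    [[2.0, 2.0], [2.0, 2.0]],
    [[-3.0, -3.0], [-3.0, -3.0]],
    [[0.5, 0.5], [0.5, 0.5]],
]
masked, kept = prune_channels(example, prune_ratio=0.5)
print(kept)  # [1, 2]
```

Because the scores depend on the activations, a different input can yield a different set of kept channels, which is the contrast with static pruning drawn in the summary. Calling `prune_channels` with different `prune_ratio` values loosely mirrors reconfiguring one model for different resource budgets. A practical implementation would skip the computation for pruned channels rather than zero them, since zeroing alone saves no floating-point operations.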