
Exploiting Fine-Grained Structured Pruning for Efficient Inference on CNN Model

Bibliographic Details
Main Authors: Wu, Cheng-Hung, Hong, Ding-Yong, Liu, Pangfeng, Wu, Jan-Jan
Format: Conference Proceeding
Language: English
Description
Summary: Weight pruning is a technique that removes redundant or unimportant weights from a network, reducing the size and computational cost of neural networks while preserving their accuracy. In this paper, we aim to design efficient CNN models with N:M pruning on the CPU. We propose a dynamic programming algorithm that finds a good sparsity ratio for every layer under a total time budget, based on the execution times and L1 norms of the layers. After deciding the sparsity ratio of each layer, we leverage the auto-tuner of the TVM compiler to search for an optimized schedule for the pruned convolutions, accelerating the fine-grained pruned models. Experimental results show that our scheme achieves a 0.35% accuracy improvement and a 1.55× speedup over the dense model on VGG-16 with ImageNet.
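The paper's dynamic-programming budget allocation and TVM schedules are specific to the paper and not reproduced in this record. As a minimal, hypothetical sketch of the underlying N:M fine-grained structured pruning pattern itself (keep the N largest-magnitude weights in every consecutive group of M, here 2:4), one might write:

    import numpy as np

    def nm_prune_mask(weights, n=2, m=4):
        # Magnitude-based N:M mask: within every group of M consecutive
        # weights of the flattened tensor, keep the N largest magnitudes.
        assert weights.size % m == 0
        w = weights.reshape(-1, m)
        # Indices of the (M - N) smallest-magnitude weights in each group.
        drop = np.argsort(np.abs(w), axis=1)[:, : m - n]
        mask = np.ones_like(w, dtype=bool)
        np.put_along_axis(mask, drop, False, axis=1)
        return mask.reshape(weights.shape)

    # Example: 2:4 pruning of a small weight tensor (50% of weights zeroed).
    w = np.random.randn(16, 8).astype(np.float32)
    pruned = w * nm_prune_mask(w, n=2, m=4)

Under this reading, the per-layer choice of sparsity ratio is the knob the paper's dynamic program tunes, trading each layer's measured execution time against the importance (L1 norm) of the weights that would be removed.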
ISSN: 2690-5965
DOI: 10.1109/ICPADS60453.2023.00398