An Efficient NPU-Aware Filter Pruning in Convolutional Neural Network

Bibliographic Details
Main Authors: Lee, Soyoung; Kim, Kyungho; Kwak, Jonghoon; Lee, EunChong; Lee, Sang-Seol
Format: Conference Proceeding
Language: English
Description
Summary: The neural processing unit (NPU) is a high-performance, low-power accelerator specialized for artificial intelligence (AI) workloads such as training and inference. Because the NPU must process convolutional neural networks (CNNs) at low power and low latency, it needs a compressed network. In this paper, we therefore propose an efficient NPU-aware filter pruning method for CNNs that increases NPU efficiency. NPU-aware filter pruning is performed in multiples of the channel unit size, which is the operation unit of the NPU, to reduce unnecessary computation and save memory. In experiments with VGGNet-16 and ResNet-18 on the CIFAR-10 dataset, the proposed method reduced hardware-inefficient space and unnecessary computation by 1.86% to 6.78% compared with a general pruning method, without loss of accuracy.
ISSN: 2767-7699
DOI: 10.1109/ICEIC57457.2023.10049954
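
The summary states that pruning is performed in multiples of the NPU's channel unit size. The sketch below illustrates that idea only; it is not the authors' implementation. It assumes a hypothetical channel unit of 16, ranks filters by a precomputed L1 norm (a common criterion that the summary does not confirm), and rounds the kept-filter count up to the next multiple (the rounding direction is also an assumption). All function names are illustrative.

```python
def aligned_keep_count(target_keep: int, channel_unit: int, total: int) -> int:
    """Round the desired number of kept filters up to a multiple of
    channel_unit (assumed rounding direction), capped at the layer size."""
    if channel_unit <= 0:
        raise ValueError("channel_unit must be positive")
    units = -(-target_keep // channel_unit)      # integer ceiling division
    aligned = max(units, 1) * channel_unit       # keep at least one full unit
    return min(aligned, total)                   # cannot keep more than exist


def padded_slots(kept: int, channel_unit: int) -> int:
    """Channel slots the hardware would still process but that hold no
    filter, when `kept` filters are packed into whole channel units."""
    return (-kept) % channel_unit


def select_filters(l1_norms, prune_ratio: float, channel_unit: int):
    """Return sorted indices of the filters to keep.

    l1_norms     : one importance score per filter (assumed criterion)
    prune_ratio  : fraction of filters a general pruner would remove
    channel_unit : NPU operation unit in channels (hypothetical value)
    """
    total = len(l1_norms)
    target_keep = round(total * (1.0 - prune_ratio))
    keep = aligned_keep_count(target_keep, channel_unit, total)
    # Highest-scoring filters first, then take the aligned number of them.
    ranked = sorted(range(total), key=lambda i: l1_norms[i], reverse=True)
    return sorted(ranked[:keep])


if __name__ == "__main__":
    norms = [float(i) for i in range(64)]        # toy scores for 64 filters
    kept = select_filters(norms, prune_ratio=0.4, channel_unit=16)
    print(len(kept), padded_slots(len(kept), 16))
```

In this toy case a general pruner would keep 38 of 64 filters, which still occupies three 16-channel units and leaves 10 slots unused. The aligned variant keeps 48 filters and leaves none unused. Rounding down to 32 would instead trade some filters for less computation, and the summary does not say which policy the paper uses.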