
A Study of Structured Pruning for Hybrid Neural Networks

Bibliographic Details
Main Authors: Ghimire, Deepak, Kil, Dayoung, Kim, Seong-heum
Format: Conference Proceeding
Language: English
Description
Summary: In this paper, we explore the impact of structured pruning on model compression. Structured pruning removes specific structures within a model, such as entire neurons, channels, or filters in convolutional neural networks. This is distinct from weight pruning, which eliminates individual weights regardless of their location in the model. Building upon our previous publications, the focus of this work is to study the reduction of mobile stems in CNN-transformer architectures. Here, mobile stems often make transformer architectures more efficient for deployment on mobile devices and other resource-constrained environments. Many pruning methods for mobile stems involve a sequential process of training, pruning, and fine-tuning. In contrast, our approach automatically selects a filter pruning criterion, based on magnitude or similarity, from a specified pool of criteria, and adjusts the specific layer to prune in each iteration based on the network's overall loss on a small subset of the training data. To alleviate sudden accuracy drops from pruning, the network undergoes brief retraining after a predefined number of floating-point operations (FLOPs) has been removed. Optimal pruning rates for each layer in the mobile stems are determined automatically. Experiments on the VGGNet, ResNet, and MobileNet models using the CIFAR-10 and ImageNet benchmark datasets validate the effectiveness of the proposed method. Additionally, we discuss remaining tasks and ongoing research for the future.
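The selection loop described in the abstract, ranking filters by a magnitude or similarity criterion and then choosing the layer whose pruning least increases the loss on a small data subset, can be illustrated with a short sketch. This is a minimal, hypothetical PyTorch illustration, not the authors' implementation: the toy model, the zeroing-based pruning probe, and the function names (magnitude_scores, similarity_scores, select_layer_and_criterion) are assumptions introduced for illustration only.

```python
# Hypothetical sketch of criterion- and layer-selection for filter pruning.
import torch
import torch.nn as nn
import torch.nn.functional as F


def magnitude_scores(conv: nn.Conv2d) -> torch.Tensor:
    """Rank filters by L1-norm magnitude: smaller norm -> better pruning candidate."""
    return conv.weight.detach().abs().sum(dim=(1, 2, 3))


def similarity_scores(conv: nn.Conv2d) -> torch.Tensor:
    """Rank filters by redundancy: a filter highly similar (cosine) to the others
    carries little unique information, so it is a better pruning candidate."""
    w = F.normalize(conv.weight.detach().flatten(1), dim=1)  # (out_ch, in_ch*k*k)
    sim = w @ w.t()                                          # pairwise cosine similarity
    sim.fill_diagonal_(0.0)
    # Higher mean similarity means more redundant; negate so lower score = prune first.
    return -sim.mean(dim=1)


@torch.no_grad()
def evaluate_loss(model: nn.Module, batch, targets) -> float:
    """Loss on a small held-out subset, used to decide which layer to prune next."""
    model.eval()
    return F.cross_entropy(model(batch), targets).item()


def select_layer_and_criterion(model, convs, batch, targets, prune_frac=0.1):
    """Probe each (layer, criterion) pair by temporarily zeroing the lowest-ranked
    filters; keep the choice that increases the subset loss the least."""
    best = None
    for name, conv in convs.items():
        for crit_name, crit in (("magnitude", magnitude_scores),
                                ("similarity", similarity_scores)):
            scores = crit(conv)
            k = max(1, int(prune_frac * scores.numel()))
            victims = scores.argsort()[:k]
            saved = conv.weight.detach().clone()
            conv.weight.data[victims] = 0.0          # soft (zeroing) pruning probe
            loss = evaluate_loss(model, batch, targets)
            conv.weight.data.copy_(saved)            # restore original filters
            if best is None or loss < best[0]:
                best = (loss, name, crit_name, victims)
    return best


if __name__ == "__main__":
    # Tiny toy model and random tensors standing in for a small training subset.
    model = nn.Sequential(
        nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
        nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 10),
    )
    convs = {n: m for n, m in model.named_modules() if isinstance(m, nn.Conv2d)}
    x, y = torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,))
    loss, layer, criterion, victims = select_layer_and_criterion(model, convs, x, y)
    print(f"prune layer {layer} with {criterion} criterion "
          f"({victims.numel()} filters, probe loss {loss:.3f})")
```

In the pipeline described by the abstract, this selection step would alternate with brief retraining once a predefined FLOPs budget has been removed; the sketch above omits that retraining stage.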
ISSN: 2642-3901
DOI: 10.23919/ICCAS63016.2024.10773379