
Exploiting Fine-Grained Structured Pruning for Efficient Inference on CNN Model

Weight pruning is a technique to remove redundant or unimportant weights from the network. It can help reduce the size and computational cost of neural networks while preserving their accuracy. In this paper, we aim to design efficient CNN models with N:M pruning on the CPU. We propose a dynamic programming algorithm to find a good sparsity ratio for every layer under a total time budget, based on the execution times and L1 norms of the layers. After deciding the sparsity ratio of each layer, we leverage the auto-tuner of the TVM compiler to search for an optimization schedule of the pruned convolution to accelerate fine-grained pruned models. Experimental results show that our scheme can achieve a 0.35% accuracy improvement and a 1.55× speedup over the dense model on VGG-16 with ImageNet.
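The N:M scheme named in the abstract keeps at most N nonzero weights in every group of M consecutive weights, which is what makes the sparsity pattern predictable enough for a compiler to exploit. The abstract does not spell out the selection criterion, so the following is only a minimal NumPy sketch assuming the common magnitude-based rule (keep the N largest-magnitude weights per group of M); the function name and the 2:4 default are illustrative, not taken from the paper.

```python
import numpy as np

def nm_prune(weights: np.ndarray, n: int = 2, m: int = 4) -> np.ndarray:
    """Keep the n largest-magnitude weights in every group of m consecutive
    weights and zero out the rest (N:M fine-grained structured sparsity).
    Assumes the total number of weights is divisible by m."""
    w = weights.reshape(-1, m)                       # group weights in blocks of m
    # indices of the (m - n) smallest-magnitude entries in each block
    drop = np.argsort(np.abs(w), axis=1)[:, : m - n]
    mask = np.ones_like(w, dtype=bool)
    np.put_along_axis(mask, drop, False, axis=1)     # clear the dropped positions
    return (w * mask).reshape(weights.shape)

# Toy example: a weight vector pruned to 2:4 sparsity
w = np.array([0.9, -0.1, 0.05, -0.7, 0.2, 0.8, -0.3, 0.01])
print(nm_prune(w, n=2, m=4))   # two nonzeros survive in each group of four
```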


Saved in:
Bibliographic Details
Main Authors: Wu, Cheng-Hung, Hong, Ding-Yong, Liu, Pangfeng, Wu, Jan-Jan
Format: Conference Proceeding
Language: English
Subjects:
Online Access: Request full text
cited_by
cites
container_end_page 2836
container_issue
container_start_page 2835
container_title
container_volume
creator Wu, Cheng-Hung
Hong, Ding-Yong
Liu, Pangfeng
Wu, Jan-Jan
description Weight pruning is a technique to remove redundant or unimportant weights from the network. It can help reduce the size and computational cost of neural networks while preserving their accuracy. In this paper, we aim to design efficient CNN models with N:M pruning on the CPU. We propose a dynamic programming algorithm to find a good sparsity ratio for every layer under a total time budget, based on the execution times and L1 norms of the layers. After deciding the sparsity ratio of each layer, we leverage the auto-tuner of the TVM compiler to search for an optimization schedule of the pruned convolution to accelerate fine-grained pruned models. Experimental results show that our scheme can achieve a 0.35% accuracy improvement and a 1.55× speedup over the dense model on VGG-16 with ImageNet.
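The record only sketches the budgeted layer-wise selection, so the block below is one plausible reading rather than the authors' algorithm: every layer offers a few candidate sparsity ratios, each with a measured (discretized) execution time and an importance cost such as the L1 norm of the weights that the ratio would remove, and a knapsack-style dynamic program picks one candidate per layer so that the summed time stays within the budget while the summed cost is minimal. All names, the integer time units, and the toy numbers are assumptions for illustration.

```python
from typing import List, Optional

def select_ratios(times: List[List[int]], costs: List[List[float]],
                  budget: int) -> Optional[List[int]]:
    """Pick one candidate sparsity ratio per layer so the summed (integer)
    execution time stays within `budget` and the summed importance cost is
    minimal. Returns the chosen candidate index per layer, or None if no
    combination fits the budget."""
    INF = float("inf")
    n_layers = len(times)
    dp = [0.0] + [INF] * budget                       # dp[t]: best cost at total time t
    choice = [[-1] * (budget + 1) for _ in range(n_layers)]
    for i in range(n_layers):
        ndp = [INF] * (budget + 1)
        for t in range(budget + 1):
            if dp[t] == INF:
                continue
            for j, (tj, cj) in enumerate(zip(times[i], costs[i])):
                nt = t + tj
                if nt <= budget and dp[t] + cj < ndp[nt]:
                    ndp[nt] = dp[t] + cj
                    choice[i][nt] = j                 # remember which candidate won
        dp = ndp
    best_t = min(range(budget + 1), key=lambda t: dp[t])
    if dp[best_t] == INF:
        return None
    picks, t = [], best_t
    for i in range(n_layers - 1, -1, -1):             # backtrack through stored choices
        j = choice[i][t]
        picks.append(j)
        t -= times[i][j]
    return picks[::-1]

# Toy example: 2 layers, 3 candidates each (e.g. dense, 2:4, 1:4), with
# hypothetical execution times and L1 norms of the pruned-away weights.
times = [[5, 3, 2], [6, 4, 3]]
costs = [[0.0, 1.2, 3.5], [0.0, 0.8, 2.9]]
print(select_ratios(times, costs, budget=8))          # -> [1, 1]: 2:4 for both layers
```

Discretizing the per-layer times keeps the DP table small; in practice the times would come from profiling each pruned layer, for instance from the TVM auto-tuner runs the abstract describes.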
doi_str_mv 10.1109/ICPADS60453.2023.00398
format conference_proceeding
fulltext fulltext_linktorsrc
identifier EISSN: 2690-5965
ispartof 2023 IEEE 29th International Conference on Parallel and Distributed Systems (ICPADS), 2023, p.2835-2836
issn 2690-5965
language eng
recordid cdi_ieee_primary_10475913
source IEEE Xplore All Conference Series
subjects Computational modeling
Convolution
deep neural network
Dynamic programming
Heuristic algorithms
Inference algorithms
Neural networks
Schedules
weight pruning
title Exploiting Fine-Grained Structured Pruning for Efficient Inference on CNN Model