Loading…

PrSpMV: An Efficient Predictable Kernel for SpMV

Sparse Matrix-Vector Multiplication (SpMV) has been widely applied in scientific computation, industry simulation, and intelligent computation domains, which is the critical algorithm in all these applications. Due to the poor data locality, low cache usage, and extremely irregular branch patterns c...

Full description

Saved in:
Bibliographic Details
Main Authors: Fu, Gelin, Xia, Tian, Qu, Shaoru, Luo, Zhongpei, Li, Shuyu, Cheng, Pengyu, Guo, Runfan, Ding, Yitong, Ren, Pengju
Format: Conference Proceeding
Language:English
Subjects:
Online Access:Request full text
Tags: Add Tag
No Tags, Be the first to tag this record!
cited_by
cites
container_end_page 456
container_issue
container_start_page 448
container_title
container_volume
creator Fu, Gelin
Xia, Tian
Qu, Shaoru
Luo, Zhongpei
Li, Shuyu
Cheng, Pengyu
Guo, Runfan
Ding, Yitong
Ren, Pengju
description Sparse Matrix-Vector Multiplication (SpMV) has been widely applied in scientific computation, industry simulation, and intelligent computation domains, which is the critical algorithm in all these applications. Due to the poor data locality, low cache usage, and extremely irregular branch patterns caused by the highly sparse and random distributions, SpMV optimization has become one of the most challenging problems for modern high-performance processors. In this paper, we study the bottlenecks of SpMV on current out-of-order CPUs and propose a novel SpMV kernel named PrSpMV to improve its performance by pursuing high predictability. Specifically, we improve the memory access regularity and locality by creating serialized access patterns so that the data prefetching efficiency and cache usage are optimized. We also improve pipeline efficiency by creating regular branch patterns to make branch prediction more accurate. Experiment results show that using the above optimization approaches, PrSpMV can eliminate nearly all branch mispredictions. Moreover, it can also significantly reduce the average L2 cache miss rate from 57% to 20% via efficiently leveraging hardware prefetchers. By using PrSpMV, stride prefetcher can be boosted with 1.31× speedup and dedicated irregular prefetcher can be improved with 1.40× speedup. Meanwhile, on commercial high-end Intel processors, it achieves 1.32× speedup against some state-of-the-art SpMV kernels.
doi_str_mv 10.1109/ICCD58817.2023.00075
format conference_proceeding
fullrecord <record><control><sourceid>ieee_CHZPO</sourceid><recordid>TN_cdi_ieee_primary_10361035</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>10361035</ieee_id><sourcerecordid>10361035</sourcerecordid><originalsourceid>FETCH-LOGICAL-i204t-64bd3ad485eb4496a2399a79f527353cb66c8c69b4d09694e9d203f94f6e038e3</originalsourceid><addsrcrecordid>eNotjMtKA0EQAEdBMMb8QQ7zA7v2TM-rvYU1xmDEgI9r2EcPjKxrmN2Lf29ED0VdihJiqaBUCuhmW1V3NgTlSw0aSwDw9kwsyFNAC2g0qXAuZtp6VzgidymuxvHjlAVUfiZgn1-OT--3cjXIdYypTTxMcp-5S-1UNz3LR84D9zJ-ZflbXouLWPcjL_49F2_369fqodg9b7bValckDWYqnGk6rDsTLDfGkKs1EtWeotUeLbaNc21oHTWmA3JkmDoNGMlEx4CBcS6Wf9_EzIdjTp91_j4oQHfC4g-EdEKB</addsrcrecordid><sourcetype>Publisher</sourcetype><iscdi>true</iscdi><recordtype>conference_proceeding</recordtype></control><display><type>conference_proceeding</type><title>PrSpMV: An Efficient Predictable Kernel for SpMV</title><source>IEEE Xplore All Conference Series</source><creator>Fu, Gelin ; Xia, Tian ; Qu, Shaoru ; Luo, Zhongpei ; Li, Shuyu ; Cheng, Pengyu ; Guo, Runfan ; Ding, Yitong ; Ren, Pengju</creator><creatorcontrib>Fu, Gelin ; Xia, Tian ; Qu, Shaoru ; Luo, Zhongpei ; Li, Shuyu ; Cheng, Pengyu ; Guo, Runfan ; Ding, Yitong ; Ren, Pengju</creatorcontrib><description>Sparse Matrix-Vector Multiplication (SpMV) has been widely applied in scientific computation, industry simulation, and intelligent computation domains, which is the critical algorithm in all these applications. Due to the poor data locality, low cache usage, and extremely irregular branch patterns caused by the highly sparse and random distributions, SpMV optimization has become one of the most challenging problems for modern high-performance processors. In this paper, we study the bottlenecks of SpMV on current out-of-order CPUs and propose a novel SpMV kernel named PrSpMV to improve its performance by pursuing high predictability. Specifically, we improve the memory access regularity and locality by creating serialized access patterns so that the data prefetching efficiency and cache usage are optimized. We also improve pipeline efficiency by creating regular branch patterns to make branch prediction more accurate. Experiment results show that using the above optimization approaches, PrSpMV can eliminate nearly all branch mispredictions. Moreover, it can also significantly reduce the average L2 cache miss rate from 57% to 20% via efficiently leveraging hardware prefetchers. By using PrSpMV, stride prefetcher can be boosted with 1.31× speedup and dedicated irregular prefetcher can be improved with 1.40× speedup. Meanwhile, on commercial high-end Intel processors, it achieves 1.32× speedup against some state-of-the-art SpMV kernels.</description><identifier>EISSN: 2576-6996</identifier><identifier>EISBN: 9798350342918</identifier><identifier>DOI: 10.1109/ICCD58817.2023.00075</identifier><identifier>CODEN: IEEPAD</identifier><language>eng</language><publisher>IEEE</publisher><subject>Branch Prediction ; Data Prefetching ; Sparse Matrix Computation</subject><ispartof>2023 IEEE 41st International Conference on Computer Design (ICCD), 2023, p.448-456</ispartof><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/10361035$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>309,310,780,784,789,790,27924,54554,54931</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/10361035$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Fu, Gelin</creatorcontrib><creatorcontrib>Xia, Tian</creatorcontrib><creatorcontrib>Qu, Shaoru</creatorcontrib><creatorcontrib>Luo, Zhongpei</creatorcontrib><creatorcontrib>Li, Shuyu</creatorcontrib><creatorcontrib>Cheng, Pengyu</creatorcontrib><creatorcontrib>Guo, Runfan</creatorcontrib><creatorcontrib>Ding, Yitong</creatorcontrib><creatorcontrib>Ren, Pengju</creatorcontrib><title>PrSpMV: An Efficient Predictable Kernel for SpMV</title><title>2023 IEEE 41st International Conference on Computer Design (ICCD)</title><addtitle>ICCD</addtitle><description>Sparse Matrix-Vector Multiplication (SpMV) has been widely applied in scientific computation, industry simulation, and intelligent computation domains, which is the critical algorithm in all these applications. Due to the poor data locality, low cache usage, and extremely irregular branch patterns caused by the highly sparse and random distributions, SpMV optimization has become one of the most challenging problems for modern high-performance processors. In this paper, we study the bottlenecks of SpMV on current out-of-order CPUs and propose a novel SpMV kernel named PrSpMV to improve its performance by pursuing high predictability. Specifically, we improve the memory access regularity and locality by creating serialized access patterns so that the data prefetching efficiency and cache usage are optimized. We also improve pipeline efficiency by creating regular branch patterns to make branch prediction more accurate. Experiment results show that using the above optimization approaches, PrSpMV can eliminate nearly all branch mispredictions. Moreover, it can also significantly reduce the average L2 cache miss rate from 57% to 20% via efficiently leveraging hardware prefetchers. By using PrSpMV, stride prefetcher can be boosted with 1.31× speedup and dedicated irregular prefetcher can be improved with 1.40× speedup. Meanwhile, on commercial high-end Intel processors, it achieves 1.32× speedup against some state-of-the-art SpMV kernels.</description><subject>Branch Prediction</subject><subject>Data Prefetching</subject><subject>Sparse Matrix Computation</subject><issn>2576-6996</issn><isbn>9798350342918</isbn><fulltext>true</fulltext><rsrctype>conference_proceeding</rsrctype><creationdate>2023</creationdate><recordtype>conference_proceeding</recordtype><sourceid>6IE</sourceid><recordid>eNotjMtKA0EQAEdBMMb8QQ7zA7v2TM-rvYU1xmDEgI9r2EcPjKxrmN2Lf29ED0VdihJiqaBUCuhmW1V3NgTlSw0aSwDw9kwsyFNAC2g0qXAuZtp6VzgidymuxvHjlAVUfiZgn1-OT--3cjXIdYypTTxMcp-5S-1UNz3LR84D9zJ-ZflbXouLWPcjL_49F2_369fqodg9b7bValckDWYqnGk6rDsTLDfGkKs1EtWeotUeLbaNc21oHTWmA3JkmDoNGMlEx4CBcS6Wf9_EzIdjTp91_j4oQHfC4g-EdEKB</recordid><startdate>20231106</startdate><enddate>20231106</enddate><creator>Fu, Gelin</creator><creator>Xia, Tian</creator><creator>Qu, Shaoru</creator><creator>Luo, Zhongpei</creator><creator>Li, Shuyu</creator><creator>Cheng, Pengyu</creator><creator>Guo, Runfan</creator><creator>Ding, Yitong</creator><creator>Ren, Pengju</creator><general>IEEE</general><scope>6IE</scope><scope>6IH</scope><scope>CBEJK</scope><scope>RIE</scope><scope>RIO</scope></search><sort><creationdate>20231106</creationdate><title>PrSpMV: An Efficient Predictable Kernel for SpMV</title><author>Fu, Gelin ; Xia, Tian ; Qu, Shaoru ; Luo, Zhongpei ; Li, Shuyu ; Cheng, Pengyu ; Guo, Runfan ; Ding, Yitong ; Ren, Pengju</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-i204t-64bd3ad485eb4496a2399a79f527353cb66c8c69b4d09694e9d203f94f6e038e3</frbrgroupid><rsrctype>conference_proceedings</rsrctype><prefilter>conference_proceedings</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Branch Prediction</topic><topic>Data Prefetching</topic><topic>Sparse Matrix Computation</topic><toplevel>online_resources</toplevel><creatorcontrib>Fu, Gelin</creatorcontrib><creatorcontrib>Xia, Tian</creatorcontrib><creatorcontrib>Qu, Shaoru</creatorcontrib><creatorcontrib>Luo, Zhongpei</creatorcontrib><creatorcontrib>Li, Shuyu</creatorcontrib><creatorcontrib>Cheng, Pengyu</creatorcontrib><creatorcontrib>Guo, Runfan</creatorcontrib><creatorcontrib>Ding, Yitong</creatorcontrib><creatorcontrib>Ren, Pengju</creatorcontrib><collection>IEEE Electronic Library (IEL) Conference Proceedings</collection><collection>IEEE Proceedings Order Plan (POP) 1998-present by volume</collection><collection>IEEE Xplore All Conference Proceedings</collection><collection>IEEE Electronic Library (IEL)</collection><collection>IEEE Proceedings Order Plans (POP) 1998-present</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Fu, Gelin</au><au>Xia, Tian</au><au>Qu, Shaoru</au><au>Luo, Zhongpei</au><au>Li, Shuyu</au><au>Cheng, Pengyu</au><au>Guo, Runfan</au><au>Ding, Yitong</au><au>Ren, Pengju</au><format>book</format><genre>proceeding</genre><ristype>CONF</ristype><atitle>PrSpMV: An Efficient Predictable Kernel for SpMV</atitle><btitle>2023 IEEE 41st International Conference on Computer Design (ICCD)</btitle><stitle>ICCD</stitle><date>2023-11-06</date><risdate>2023</risdate><spage>448</spage><epage>456</epage><pages>448-456</pages><eissn>2576-6996</eissn><eisbn>9798350342918</eisbn><coden>IEEPAD</coden><abstract>Sparse Matrix-Vector Multiplication (SpMV) has been widely applied in scientific computation, industry simulation, and intelligent computation domains, which is the critical algorithm in all these applications. Due to the poor data locality, low cache usage, and extremely irregular branch patterns caused by the highly sparse and random distributions, SpMV optimization has become one of the most challenging problems for modern high-performance processors. In this paper, we study the bottlenecks of SpMV on current out-of-order CPUs and propose a novel SpMV kernel named PrSpMV to improve its performance by pursuing high predictability. Specifically, we improve the memory access regularity and locality by creating serialized access patterns so that the data prefetching efficiency and cache usage are optimized. We also improve pipeline efficiency by creating regular branch patterns to make branch prediction more accurate. Experiment results show that using the above optimization approaches, PrSpMV can eliminate nearly all branch mispredictions. Moreover, it can also significantly reduce the average L2 cache miss rate from 57% to 20% via efficiently leveraging hardware prefetchers. By using PrSpMV, stride prefetcher can be boosted with 1.31× speedup and dedicated irregular prefetcher can be improved with 1.40× speedup. Meanwhile, on commercial high-end Intel processors, it achieves 1.32× speedup against some state-of-the-art SpMV kernels.</abstract><pub>IEEE</pub><doi>10.1109/ICCD58817.2023.00075</doi><tpages>9</tpages></addata></record>
fulltext fulltext_linktorsrc
identifier EISSN: 2576-6996
ispartof 2023 IEEE 41st International Conference on Computer Design (ICCD), 2023, p.448-456
issn 2576-6996
language eng
recordid cdi_ieee_primary_10361035
source IEEE Xplore All Conference Series
subjects Branch Prediction
Data Prefetching
Sparse Matrix Computation
title PrSpMV: An Efficient Predictable Kernel for SpMV
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-10T10%3A30%3A02IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_CHZPO&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=PrSpMV:%20An%20Efficient%20Predictable%20Kernel%20for%20SpMV&rft.btitle=2023%20IEEE%2041st%20International%20Conference%20on%20Computer%20Design%20(ICCD)&rft.au=Fu,%20Gelin&rft.date=2023-11-06&rft.spage=448&rft.epage=456&rft.pages=448-456&rft.eissn=2576-6996&rft.coden=IEEPAD&rft_id=info:doi/10.1109/ICCD58817.2023.00075&rft.eisbn=9798350342918&rft_dat=%3Cieee_CHZPO%3E10361035%3C/ieee_CHZPO%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-i204t-64bd3ad485eb4496a2399a79f527353cb66c8c69b4d09694e9d203f94f6e038e3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=10361035&rfr_iscdi=true