Loading…
PrSpMV: An Efficient Predictable Kernel for SpMV
Sparse Matrix-Vector Multiplication (SpMV) has been widely applied in scientific computation, industry simulation, and intelligent computation domains, which is the critical algorithm in all these applications. Due to the poor data locality, low cache usage, and extremely irregular branch patterns c...
Saved in:
Main Authors: | , , , , , , , , |
---|---|
Format: | Conference Proceeding |
Language: | English |
Subjects: | |
Online Access: | Request full text |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
cited_by | |
---|---|
cites | |
container_end_page | 456 |
container_issue | |
container_start_page | 448 |
container_title | |
container_volume | |
creator | Fu, Gelin Xia, Tian Qu, Shaoru Luo, Zhongpei Li, Shuyu Cheng, Pengyu Guo, Runfan Ding, Yitong Ren, Pengju |
description | Sparse Matrix-Vector Multiplication (SpMV) has been widely applied in scientific computation, industry simulation, and intelligent computation domains, which is the critical algorithm in all these applications. Due to the poor data locality, low cache usage, and extremely irregular branch patterns caused by the highly sparse and random distributions, SpMV optimization has become one of the most challenging problems for modern high-performance processors. In this paper, we study the bottlenecks of SpMV on current out-of-order CPUs and propose a novel SpMV kernel named PrSpMV to improve its performance by pursuing high predictability. Specifically, we improve the memory access regularity and locality by creating serialized access patterns so that the data prefetching efficiency and cache usage are optimized. We also improve pipeline efficiency by creating regular branch patterns to make branch prediction more accurate. Experiment results show that using the above optimization approaches, PrSpMV can eliminate nearly all branch mispredictions. Moreover, it can also significantly reduce the average L2 cache miss rate from 57% to 20% via efficiently leveraging hardware prefetchers. By using PrSpMV, stride prefetcher can be boosted with 1.31× speedup and dedicated irregular prefetcher can be improved with 1.40× speedup. Meanwhile, on commercial high-end Intel processors, it achieves 1.32× speedup against some state-of-the-art SpMV kernels. |
doi_str_mv | 10.1109/ICCD58817.2023.00075 |
format | conference_proceeding |
fullrecord | <record><control><sourceid>ieee_CHZPO</sourceid><recordid>TN_cdi_ieee_primary_10361035</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>10361035</ieee_id><sourcerecordid>10361035</sourcerecordid><originalsourceid>FETCH-LOGICAL-i204t-64bd3ad485eb4496a2399a79f527353cb66c8c69b4d09694e9d203f94f6e038e3</originalsourceid><addsrcrecordid>eNotjMtKA0EQAEdBMMb8QQ7zA7v2TM-rvYU1xmDEgI9r2EcPjKxrmN2Lf29ED0VdihJiqaBUCuhmW1V3NgTlSw0aSwDw9kwsyFNAC2g0qXAuZtp6VzgidymuxvHjlAVUfiZgn1-OT--3cjXIdYypTTxMcp-5S-1UNz3LR84D9zJ-ZflbXouLWPcjL_49F2_369fqodg9b7bValckDWYqnGk6rDsTLDfGkKs1EtWeotUeLbaNc21oHTWmA3JkmDoNGMlEx4CBcS6Wf9_EzIdjTp91_j4oQHfC4g-EdEKB</addsrcrecordid><sourcetype>Publisher</sourcetype><iscdi>true</iscdi><recordtype>conference_proceeding</recordtype></control><display><type>conference_proceeding</type><title>PrSpMV: An Efficient Predictable Kernel for SpMV</title><source>IEEE Xplore All Conference Series</source><creator>Fu, Gelin ; Xia, Tian ; Qu, Shaoru ; Luo, Zhongpei ; Li, Shuyu ; Cheng, Pengyu ; Guo, Runfan ; Ding, Yitong ; Ren, Pengju</creator><creatorcontrib>Fu, Gelin ; Xia, Tian ; Qu, Shaoru ; Luo, Zhongpei ; Li, Shuyu ; Cheng, Pengyu ; Guo, Runfan ; Ding, Yitong ; Ren, Pengju</creatorcontrib><description>Sparse Matrix-Vector Multiplication (SpMV) has been widely applied in scientific computation, industry simulation, and intelligent computation domains, which is the critical algorithm in all these applications. Due to the poor data locality, low cache usage, and extremely irregular branch patterns caused by the highly sparse and random distributions, SpMV optimization has become one of the most challenging problems for modern high-performance processors. In this paper, we study the bottlenecks of SpMV on current out-of-order CPUs and propose a novel SpMV kernel named PrSpMV to improve its performance by pursuing high predictability. Specifically, we improve the memory access regularity and locality by creating serialized access patterns so that the data prefetching efficiency and cache usage are optimized. We also improve pipeline efficiency by creating regular branch patterns to make branch prediction more accurate. Experiment results show that using the above optimization approaches, PrSpMV can eliminate nearly all branch mispredictions. Moreover, it can also significantly reduce the average L2 cache miss rate from 57% to 20% via efficiently leveraging hardware prefetchers. By using PrSpMV, stride prefetcher can be boosted with 1.31× speedup and dedicated irregular prefetcher can be improved with 1.40× speedup. Meanwhile, on commercial high-end Intel processors, it achieves 1.32× speedup against some state-of-the-art SpMV kernels.</description><identifier>EISSN: 2576-6996</identifier><identifier>EISBN: 9798350342918</identifier><identifier>DOI: 10.1109/ICCD58817.2023.00075</identifier><identifier>CODEN: IEEPAD</identifier><language>eng</language><publisher>IEEE</publisher><subject>Branch Prediction ; Data Prefetching ; Sparse Matrix Computation</subject><ispartof>2023 IEEE 41st International Conference on Computer Design (ICCD), 2023, p.448-456</ispartof><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/10361035$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>309,310,780,784,789,790,27924,54554,54931</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/10361035$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Fu, Gelin</creatorcontrib><creatorcontrib>Xia, Tian</creatorcontrib><creatorcontrib>Qu, Shaoru</creatorcontrib><creatorcontrib>Luo, Zhongpei</creatorcontrib><creatorcontrib>Li, Shuyu</creatorcontrib><creatorcontrib>Cheng, Pengyu</creatorcontrib><creatorcontrib>Guo, Runfan</creatorcontrib><creatorcontrib>Ding, Yitong</creatorcontrib><creatorcontrib>Ren, Pengju</creatorcontrib><title>PrSpMV: An Efficient Predictable Kernel for SpMV</title><title>2023 IEEE 41st International Conference on Computer Design (ICCD)</title><addtitle>ICCD</addtitle><description>Sparse Matrix-Vector Multiplication (SpMV) has been widely applied in scientific computation, industry simulation, and intelligent computation domains, which is the critical algorithm in all these applications. Due to the poor data locality, low cache usage, and extremely irregular branch patterns caused by the highly sparse and random distributions, SpMV optimization has become one of the most challenging problems for modern high-performance processors. In this paper, we study the bottlenecks of SpMV on current out-of-order CPUs and propose a novel SpMV kernel named PrSpMV to improve its performance by pursuing high predictability. Specifically, we improve the memory access regularity and locality by creating serialized access patterns so that the data prefetching efficiency and cache usage are optimized. We also improve pipeline efficiency by creating regular branch patterns to make branch prediction more accurate. Experiment results show that using the above optimization approaches, PrSpMV can eliminate nearly all branch mispredictions. Moreover, it can also significantly reduce the average L2 cache miss rate from 57% to 20% via efficiently leveraging hardware prefetchers. By using PrSpMV, stride prefetcher can be boosted with 1.31× speedup and dedicated irregular prefetcher can be improved with 1.40× speedup. Meanwhile, on commercial high-end Intel processors, it achieves 1.32× speedup against some state-of-the-art SpMV kernels.</description><subject>Branch Prediction</subject><subject>Data Prefetching</subject><subject>Sparse Matrix Computation</subject><issn>2576-6996</issn><isbn>9798350342918</isbn><fulltext>true</fulltext><rsrctype>conference_proceeding</rsrctype><creationdate>2023</creationdate><recordtype>conference_proceeding</recordtype><sourceid>6IE</sourceid><recordid>eNotjMtKA0EQAEdBMMb8QQ7zA7v2TM-rvYU1xmDEgI9r2EcPjKxrmN2Lf29ED0VdihJiqaBUCuhmW1V3NgTlSw0aSwDw9kwsyFNAC2g0qXAuZtp6VzgidymuxvHjlAVUfiZgn1-OT--3cjXIdYypTTxMcp-5S-1UNz3LR84D9zJ-ZflbXouLWPcjL_49F2_369fqodg9b7bValckDWYqnGk6rDsTLDfGkKs1EtWeotUeLbaNc21oHTWmA3JkmDoNGMlEx4CBcS6Wf9_EzIdjTp91_j4oQHfC4g-EdEKB</recordid><startdate>20231106</startdate><enddate>20231106</enddate><creator>Fu, Gelin</creator><creator>Xia, Tian</creator><creator>Qu, Shaoru</creator><creator>Luo, Zhongpei</creator><creator>Li, Shuyu</creator><creator>Cheng, Pengyu</creator><creator>Guo, Runfan</creator><creator>Ding, Yitong</creator><creator>Ren, Pengju</creator><general>IEEE</general><scope>6IE</scope><scope>6IH</scope><scope>CBEJK</scope><scope>RIE</scope><scope>RIO</scope></search><sort><creationdate>20231106</creationdate><title>PrSpMV: An Efficient Predictable Kernel for SpMV</title><author>Fu, Gelin ; Xia, Tian ; Qu, Shaoru ; Luo, Zhongpei ; Li, Shuyu ; Cheng, Pengyu ; Guo, Runfan ; Ding, Yitong ; Ren, Pengju</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-i204t-64bd3ad485eb4496a2399a79f527353cb66c8c69b4d09694e9d203f94f6e038e3</frbrgroupid><rsrctype>conference_proceedings</rsrctype><prefilter>conference_proceedings</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Branch Prediction</topic><topic>Data Prefetching</topic><topic>Sparse Matrix Computation</topic><toplevel>online_resources</toplevel><creatorcontrib>Fu, Gelin</creatorcontrib><creatorcontrib>Xia, Tian</creatorcontrib><creatorcontrib>Qu, Shaoru</creatorcontrib><creatorcontrib>Luo, Zhongpei</creatorcontrib><creatorcontrib>Li, Shuyu</creatorcontrib><creatorcontrib>Cheng, Pengyu</creatorcontrib><creatorcontrib>Guo, Runfan</creatorcontrib><creatorcontrib>Ding, Yitong</creatorcontrib><creatorcontrib>Ren, Pengju</creatorcontrib><collection>IEEE Electronic Library (IEL) Conference Proceedings</collection><collection>IEEE Proceedings Order Plan (POP) 1998-present by volume</collection><collection>IEEE Xplore All Conference Proceedings</collection><collection>IEEE Electronic Library (IEL)</collection><collection>IEEE Proceedings Order Plans (POP) 1998-present</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Fu, Gelin</au><au>Xia, Tian</au><au>Qu, Shaoru</au><au>Luo, Zhongpei</au><au>Li, Shuyu</au><au>Cheng, Pengyu</au><au>Guo, Runfan</au><au>Ding, Yitong</au><au>Ren, Pengju</au><format>book</format><genre>proceeding</genre><ristype>CONF</ristype><atitle>PrSpMV: An Efficient Predictable Kernel for SpMV</atitle><btitle>2023 IEEE 41st International Conference on Computer Design (ICCD)</btitle><stitle>ICCD</stitle><date>2023-11-06</date><risdate>2023</risdate><spage>448</spage><epage>456</epage><pages>448-456</pages><eissn>2576-6996</eissn><eisbn>9798350342918</eisbn><coden>IEEPAD</coden><abstract>Sparse Matrix-Vector Multiplication (SpMV) has been widely applied in scientific computation, industry simulation, and intelligent computation domains, which is the critical algorithm in all these applications. Due to the poor data locality, low cache usage, and extremely irregular branch patterns caused by the highly sparse and random distributions, SpMV optimization has become one of the most challenging problems for modern high-performance processors. In this paper, we study the bottlenecks of SpMV on current out-of-order CPUs and propose a novel SpMV kernel named PrSpMV to improve its performance by pursuing high predictability. Specifically, we improve the memory access regularity and locality by creating serialized access patterns so that the data prefetching efficiency and cache usage are optimized. We also improve pipeline efficiency by creating regular branch patterns to make branch prediction more accurate. Experiment results show that using the above optimization approaches, PrSpMV can eliminate nearly all branch mispredictions. Moreover, it can also significantly reduce the average L2 cache miss rate from 57% to 20% via efficiently leveraging hardware prefetchers. By using PrSpMV, stride prefetcher can be boosted with 1.31× speedup and dedicated irregular prefetcher can be improved with 1.40× speedup. Meanwhile, on commercial high-end Intel processors, it achieves 1.32× speedup against some state-of-the-art SpMV kernels.</abstract><pub>IEEE</pub><doi>10.1109/ICCD58817.2023.00075</doi><tpages>9</tpages></addata></record> |
fulltext | fulltext_linktorsrc |
identifier | EISSN: 2576-6996 |
ispartof | 2023 IEEE 41st International Conference on Computer Design (ICCD), 2023, p.448-456 |
issn | 2576-6996 |
language | eng |
recordid | cdi_ieee_primary_10361035 |
source | IEEE Xplore All Conference Series |
subjects | Branch Prediction Data Prefetching Sparse Matrix Computation |
title | PrSpMV: An Efficient Predictable Kernel for SpMV |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-10T10%3A30%3A02IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_CHZPO&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=PrSpMV:%20An%20Efficient%20Predictable%20Kernel%20for%20SpMV&rft.btitle=2023%20IEEE%2041st%20International%20Conference%20on%20Computer%20Design%20(ICCD)&rft.au=Fu,%20Gelin&rft.date=2023-11-06&rft.spage=448&rft.epage=456&rft.pages=448-456&rft.eissn=2576-6996&rft.coden=IEEPAD&rft_id=info:doi/10.1109/ICCD58817.2023.00075&rft.eisbn=9798350342918&rft_dat=%3Cieee_CHZPO%3E10361035%3C/ieee_CHZPO%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-i204t-64bd3ad485eb4496a2399a79f527353cb66c8c69b4d09694e9d203f94f6e038e3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=10361035&rfr_iscdi=true |