Experiences with nested parallelism in task-parallel applications using malleable BLAS on multicore processors
Malleability is defined as the ability to vary the degree of parallelism at runtime, and is regarded as a means to improve core occupation on state-of-the-art multicore processors that contain tens of computational cores per socket. This property is especially interesting for applications consistin...
Published in: | The international journal of high performance computing applications 2024-03, Vol.38 (2), p.55-68 |
---|---|
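Main Authors: | Rodríguez-Sánchez, Rafael; Castelló, Adrián; Catalán, Sandra; Igual, Francisco D.; Quintana-Ortí, Enrique S. |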
Format: | Article |
Language: | English |
Subjects: | Linear algebra; Microprocessors; Parallel processing; Processors |
cited_by | cdi_FETCH-LOGICAL-c355t-6b5ceeaebe62723f683ceda07a4bdbd540b524534af4e9736f7b6304b71ba0093 |
---|---|
cites | cdi_FETCH-LOGICAL-c355t-6b5ceeaebe62723f683ceda07a4bdbd540b524534af4e9736f7b6304b71ba0093 |
container_end_page | 68 |
container_issue | 2 |
container_start_page | 55 |
container_title | The international journal of high performance computing applications |
container_volume | 38 |
creator | Rodríguez-Sánchez, Rafael ; Castelló, Adrián ; Catalán, Sandra ; Igual, Francisco D. ; Quintana-Ortí, Enrique S. |
description | Malleability is defined as the ability to vary the degree of parallelism at runtime, and is regarded as a means to improve core occupation on state-of-the-art multicore processors that contain tens of computational cores per socket. This property is especially interesting for applications consisting of irregular workloads and/or divergent execution paths. The integration of malleability in high-performance instances of the Basic Linear Algebra Subprograms (BLAS) is currently nonexistent and, as a consequence, applications relying on these computational kernels cannot benefit from this capability. In response to this scenario, in this paper we demonstrate that significant performance benefits can be obtained by exploiting malleability in a framework designed to implement portable and high-performance BLAS-like operations. For this purpose, we integrate malleability within the BLIS library, and provide an experimental evaluation of the result on three different practical use cases. |
doi_str_mv | 10.1177/10943420231157653 |
format | article |
publisher | London, England: SAGE Publications |
rights | The Author(s) 2023 |
orcid | https://orcid.org/0000-0002-5454-165X ; https://orcid.org/0000-0001-8789-3953 |
tpages | 14 |
lds50 | peer_reviewed |
oa | free_for_read |
fulltext | fulltext |
identifier | ISSN: 1094-3420 |
ispartof | The international journal of high performance computing applications, 2024-03, Vol.38 (2), p.55-68 |
issn | 1094-3420 1741-2846 |
language | eng |
recordid | cdi_proquest_journals_2955225722 |
source | SAGE |
subjects | Linear algebra ; Microprocessors ; Parallel processing ; Processors |
title | Experiences with nested parallelism in task-parallel applications using malleable BLAS on multicore processors |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-26T06%3A36%3A45IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Experiences%20with%20nested%20parallelism%20in%20task-parallel%20applications%20using%20malleable%20BLAS%20on%20multicore%20processors&rft.jtitle=The%20international%20journal%20of%20high%20performance%20computing%20applications&rft.au=Rodr%C3%ADguez-S%C3%A1nchez,%20Rafael&rft.date=2024-03&rft.volume=38&rft.issue=2&rft.spage=55&rft.epage=68&rft.pages=55-68&rft.issn=1094-3420&rft.eissn=1741-2846&rft_id=info:doi/10.1177/10943420231157653&rft_dat=%3Cproquest_cross%3E2955225722%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c355t-6b5ceeaebe62723f683ceda07a4bdbd540b524534af4e9736f7b6304b71ba0093%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2955225722&rft_id=info:pmid/&rft_sage_id=10.1177_10943420231157653&rfr_iscdi=true |