Experiences with nested parallelism in task-parallel applications using malleable BLAS on multicore processors

Malleability is defined as the ability to vary the degree of parallelism at runtime, and is regarded as a means to improve core occupation on state-of-the-art multicore processors that contain tens of computational cores per socket. This property is especially interesting for applications consistin...

Bibliographic Details
Published in: The international journal of high performance computing applications 2024-03, Vol.38 (2), p.55-68
Main Authors: Rodríguez-Sánchez, Rafael, Castelló, Adrián, Catalán, Sandra, Igual, Francisco D., Quintana-Ortí, Enrique S.
Format: Article
Language: English
cited_by cdi_FETCH-LOGICAL-c355t-6b5ceeaebe62723f683ceda07a4bdbd540b524534af4e9736f7b6304b71ba0093
cites cdi_FETCH-LOGICAL-c355t-6b5ceeaebe62723f683ceda07a4bdbd540b524534af4e9736f7b6304b71ba0093
container_end_page 68
container_issue 2
container_start_page 55
container_title The international journal of high performance computing applications
container_volume 38
creator Rodríguez-Sánchez, Rafael
Castelló, Adrián
Catalán, Sandra
Igual, Francisco D.
Quintana-Ortí, Enrique S.
description Malleability is defined as the ability to vary the degree of parallelism at runtime, and is regarded as a means to improve core occupation on state-of-the-art multicore processors that contain tens of computational cores per socket. This property is especially interesting for applications consisting of irregular workloads and/or divergent execution paths. The integration of malleability in high-performance instances of the Basic Linear Algebra Subprograms (BLAS) is currently nonexistent, and, in consequence, applications relying on these computational kernels cannot benefit from this capability. In response to this scenario, in this paper we demonstrate that significant performance benefits can be gathered via the exploitation of malleability in a framework designed to implement portable and high-performance BLAS-like operations. For this purpose, we integrate malleability within the BLIS library, and provide an experimental evaluation of the result on three different practical use cases.
doi_str_mv 10.1177/10943420231157653
format article
publisher London, England: SAGE Publications
rights The Author(s) 2023
orcid https://orcid.org/0000-0002-5454-165X
https://orcid.org/0000-0001-8789-3953
oa free_for_read
toplevel peer_reviewed
online_resources
tpages 14
fulltext fulltext
identifier ISSN: 1094-3420
ispartof The international journal of high performance computing applications, 2024-03, Vol.38 (2), p.55-68
issn 1094-3420
1741-2846
language eng
recordid cdi_proquest_journals_2955225722
source SAGE
subjects Linear algebra
Microprocessors
Parallel processing
Processors
title Experiences with nested parallelism in task-parallel applications using malleable BLAS on multicore processors
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-26T06%3A36%3A45IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Experiences%20with%20nested%20parallelism%20in%20task-parallel%20applications%20using%20malleable%20BLAS%20on%20multicore%20processors&rft.jtitle=The%20international%20journal%20of%20high%20performance%20computing%20applications&rft.au=Rodr%C3%ADguez-S%C3%A1nchez,%20Rafael&rft.date=2024-03&rft.volume=38&rft.issue=2&rft.spage=55&rft.epage=68&rft.pages=55-68&rft.issn=1094-3420&rft.eissn=1741-2846&rft_id=info:doi/10.1177/10943420231157653&rft_dat=%3Cproquest_cross%3E2955225722%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c355t-6b5ceeaebe62723f683ceda07a4bdbd540b524534af4e9736f7b6304b71ba0093%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2955225722&rft_id=info:pmid/&rft_sage_id=10.1177_10943420231157653&rfr_iscdi=true