
Algorithms for scalable synchronization on shared-memory multiprocessors

Bibliographic Details
Published in: ACM Transactions on Computer Systems, 1991-02, Vol. 9 (1), p. 21-65
Main Authors: Mellor-Crummey, John M.; Scott, Michael L.
Format: Article
Language: English

Abstract:
Busy-wait techniques are heavily used for mutual exclusion and barrier synchronization in shared-memory parallel programs. Unfortunately, typical implementations of busy-waiting tend to produce large amounts of memory and interconnect contention, introducing performance bottlenecks that become markedly more pronounced as applications scale. We argue that this problem is not fundamental, and that one can in fact construct busy-wait synchronization algorithms that induce no memory or interconnect contention. The key to these algorithms is for every processor to spin on separate locally-accessible flag variables, and for some other processor to terminate the spin with a single remote write operation at an appropriate time. Flag variables may be locally-accessible as a result of coherent caching, or by virtue of allocation in the local portion of physically distributed shared memory. We present a new scalable algorithm for spin locks that generates O(1) remote references per lock acquisition, independent of the number of processors attempting to acquire the lock. Our algorithm provides reasonable latency in the absence of contention, requires only a constant amount of space per lock, and requires no hardware support other than a swap-with-memory instruction. We also present a new scalable barrier algorithm that generates O(1) remote references per processor reaching the barrier, and observe that two previously-known barriers can likewise be cast in a form that spins only on locally-accessible flag variables. None of these barrier algorithms requires hardware support beyond the usual atomicity of memory reads and writes. We compare the performance of our scalable algorithms with other software approaches to busy-wait synchronization on both a Sequent Symmetry and a BBN Butterfly. Our principal conclusion is that contention due to synchronization need not be a problem in large-scale shared-memory multiprocessors. The existence of scalable algorithms greatly weakens the case for costly special-purpose hardware support for synchronization, and provides a case against so-called “dance hall” architectures, in which shared memory locations are equally far from all processors.
—From the Authors' Abstract

DOI: 10.1145/103727.103729
ISSN: 0734-2071
EISSN: 1557-7333
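
The abstract's central idea for the spin lock is a queue of waiting processors in which each waiter spins only on a flag in its own queue node, and its predecessor ends that spin with a single remote write when it releases the lock. The sketch below is a minimal illustration of that idea in C11 atomics, not the paper's own code: the type and function names (mcs_lock_t, qnode_t, mcs_acquire, mcs_release) are invented here, and the release path uses compare-and-swap for brevity, whereas the paper shows that a swap-with-memory instruction alone suffices at the cost of a slightly more involved release.

```c
/* Illustrative sketch of a queue-based spin lock in the spirit of the
 * abstract (hypothetical names, C11 atomics); not the paper's code. */
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct qnode {
    _Atomic(struct qnode *) next;    /* successor in the waiter queue      */
    atomic_bool             locked;  /* true while this waiter must spin   */
} qnode_t;

typedef struct {
    _Atomic(qnode_t *) tail;         /* last waiter in the queue, or NULL  */
} mcs_lock_t;

static void mcs_acquire(mcs_lock_t *lock, qnode_t *me)
{
    atomic_store_explicit(&me->next, NULL, memory_order_relaxed);
    atomic_store_explicit(&me->locked, true, memory_order_relaxed);

    /* Swap ourselves onto the tail of the queue: one remote access. */
    qnode_t *prev = atomic_exchange_explicit(&lock->tail, me,
                                             memory_order_acq_rel);
    if (prev != NULL) {
        /* Link behind the current holder, then spin on our own flag. */
        atomic_store_explicit(&prev->next, me, memory_order_release);
        while (atomic_load_explicit(&me->locked, memory_order_acquire))
            ;   /* local spin: no remote traffic while waiting */
    }
}

static void mcs_release(mcs_lock_t *lock, qnode_t *me)
{
    qnode_t *succ = atomic_load_explicit(&me->next, memory_order_acquire);
    if (succ == NULL) {
        /* No known successor: try to swing the tail back to NULL. */
        qnode_t *expected = me;
        if (atomic_compare_exchange_strong_explicit(
                &lock->tail, &expected, NULL,
                memory_order_acq_rel, memory_order_acquire))
            return;
        /* A successor is arriving; wait for it to link itself in. */
        while ((succ = atomic_load_explicit(&me->next,
                                            memory_order_acquire)) == NULL)
            ;
    }
    /* Single remote write terminates the successor's spin. */
    atomic_store_explicit(&succ->locked, false, memory_order_release);
}
```

Each caller supplies its own qnode_t, ideally allocated in memory local to that processor or kept in its cache, which is what keeps the spin loop in mcs_acquire from generating memory or interconnect traffic beyond the initial swap on the lock's tail pointer.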
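The abstract also observes that previously-known barriers can be recast so that every processor spins only on locally-accessible flag variables. As a simpler companion to the paper's new tree-based barrier (which achieves O(1) remote references per processor), the hypothetical sketch below follows the shape of a dissemination-style barrier: each processor's flags live in its own local memory, it spins only on those, and its round partners terminate the spins with remote writes. Each processor here still issues O(log P) remote writes per episode; the names and the fixed LOG2_P are assumptions for illustration.

```c
/* Illustrative sketch of a local-spinning, dissemination-style barrier
 * (hypothetical names, C11 atomics); not the paper's new tree barrier. */
#include <stdatomic.h>
#include <stdbool.h>

#define LOG2_P 5                     /* assume P = 32 processors here */

typedef struct {
    /* My flags live in my local memory; I spin only on these. */
    atomic_bool  my_flags[2][LOG2_P];
    /* Pointers into each round-partner's my_flags; I write remotely here. */
    atomic_bool *partner_flags[2][LOG2_P];
    int  parity;                     /* alternates 0/1 between episodes */
    bool sense;                      /* flips every other episode       */
} dissem_node_t;

/* Called by each processor with its own node.  partner_flags must be wired
 * up beforehand so that in round r processor i points at the round-r flag of
 * processor (i + 2^r) mod P.  Flags start false, parity = 0, sense = true. */
static void dissem_barrier(dissem_node_t *self)
{
    for (int r = 0; r < LOG2_P; r++) {
        /* Notify this round's partner: one remote write. */
        atomic_store_explicit(self->partner_flags[self->parity][r],
                              self->sense, memory_order_release);
        /* Wait for my own notification: purely local spinning. */
        while (atomic_load_explicit(&self->my_flags[self->parity][r],
                                    memory_order_acquire) != self->sense)
            ;
    }
    if (self->parity == 1)
        self->sense = !self->sense;
    self->parity = 1 - self->parity;
}
```

The parity and sense bookkeeping lets the same flag storage be reused across consecutive barrier episodes without a separate reset phase.
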
Subjects:
Applied sciences
Architectures
Computer science; control theory; systems
Computer systems and distributed systems. User interface
Computer systems organization
Contextual software domains
Cross-computing tools and techniques
Design
Dynamic memory
Exact sciences and technology
General and reference
Hardware
Hierarchical storage management
Information storage systems
Information systems
Integrated circuits
Interconnection architectures
Measurement
Mutual exclusion
Operating systems
Parallel architectures
Process management
Process synchronization
Semiconductor memory
Software
Software and its engineering
Software organization and properties
Storage management