Tuning compiler optimizations for simultaneous multithreading
Compiler optimizations are often driven by specific assumptions about the underlying architecture and implementation of the target machine. For example, when targeting shared-memory multiprocessors, parallel programs are compiled to minimize sharing, in order to decrease high-cost, inter-processor communication. This paper reexamines several compiler optimizations in the context of simultaneous multithreading (SMT), a processor architecture that issues instructions from multiple threads to the functional units each cycle. Unlike shared-memory multiprocessors, SMT provides and benefits from fine-grained sharing of processor and memory system resources; unlike current multiprocessors, SMT exposes and benefits from inter-thread instruction-level parallelism when hiding latencies. Therefore, optimizations that are appropriate for these conventional machines may be inappropriate for SMT. We revisit three optimizations in this light: loop-iteration scheduling, software speculative execution, and loop tiling. Our results show that all three optimizations should be applied differently in the context of SMT architectures: threads should be parallelized with a cyclic, rather than a blocked, algorithm; non-loop programs should not be software speculated; and compilers no longer need to be concerned about precisely sizing tiles to match cache sizes. By following these new guidelines, compilers can generate code that improves the performance of programs executing on SMT machines.
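A minimal sketch of the two iteration-distribution schemes the abstract contrasts (the function names and the pure-Python framing are illustrative, not from the paper): a blocked partition hands each thread one contiguous chunk of the iteration space, while a cyclic partition deals iterations out round-robin, interleaving the threads at the fine grain that SMT rewards.

```python
def blocked_partition(n_iters, n_threads):
    """Blocked scheduling: thread t gets one contiguous chunk of iterations."""
    chunk = (n_iters + n_threads - 1) // n_threads  # ceiling division
    return [list(range(t * chunk, min((t + 1) * chunk, n_iters)))
            for t in range(n_threads)]

def cyclic_partition(n_iters, n_threads):
    """Cyclic scheduling: thread t gets iterations t, t+n_threads, t+2*n_threads, ..."""
    return [list(range(t, n_iters, n_threads)) for t in range(n_threads)]

print(blocked_partition(8, 2))  # → [[0, 1, 2, 3], [4, 5, 6, 7]]
print(cyclic_partition(8, 2))   # → [[0, 2, 4, 6], [1, 3, 5, 7]]
```

On a multiprocessor, the blocked split is preferred because each processor touches a disjoint region of memory; the paper's finding is that on an SMT machine, where threads share one cache hierarchy, the cyclic split's interleaved accesses become an advantage rather than a liability.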
Main Authors: | Lo, Jack L.; Eggers, Susan J.; Levy, Henry M.; Parekh, Sujay S.; Tullsen, Dean M. |
---|---|
Format: | Conference Proceeding |
Language: | English |
Subjects: | Circuit optimization; Emerging languages and compilers; Compilers |
cited_by | |
---|---|
cites | |
container_end_page | 124 |
container_issue | |
container_start_page | 114 |
container_title | |
container_volume | |
creator | Lo, Jack L.; Eggers, Susan J.; Levy, Henry M.; Parekh, Sujay S.; Tullsen, Dean M. |
description | Compiler optimizations are often driven by specific assumptions about the underlying architecture and implementation of the target machine. For example, when targeting shared-memory multiprocessors, parallel programs are compiled to minimize sharing, in order to decrease high-cost, inter-processor communication. This paper reexamines several compiler optimizations in the context of simultaneous multithreading (SMT), a processor architecture that issues instructions from multiple threads to the functional units each cycle. Unlike shared-memory multiprocessors, SMT provides and benefits from fine-grained sharing of processor and memory system resources; unlike current multiprocessors, SMT exposes and benefits from inter-thread instruction-level parallelism when hiding latencies. Therefore, optimizations that are appropriate for these conventional machines may be inappropriate for SMT. We revisit three optimizations in this light: loop-iteration scheduling, software speculative execution, and loop tiling. Our results show that all three optimizations should be applied differently in the context of SMT architectures: threads should be parallelized with a cyclic, rather than a blocked, algorithm; non-loop programs should not be software speculated; and compilers no longer need to be concerned about precisely sizing tiles to match cache sizes. By following these new guidelines, compilers can generate code that improves the performance of programs executing on SMT machines. |
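For the third optimization the abstract revisits, here is a hedged sketch of loop tiling (blocking) on a matrix multiply — the function name and the `tile` parameter are illustrative, not taken from the paper. The loops are restructured to work on `tile` × `tile` sub-blocks so each block stays cache-resident; the paper's claim is that on SMT the exact tile size no longer needs to be matched precisely to the cache.

```python
def matmul_tiled(a, b, n, tile=2):
    """n x n matrix multiply over lists of lists, blocked by `tile`."""
    c = [[0] * n for _ in range(n)]
    for ii in range(0, n, tile):            # start of a row tile
        for kk in range(0, n, tile):        # start of a depth tile
            for jj in range(0, n, tile):    # start of a column tile
                # Inner loops sweep one tile, reusing a[i][k] and the
                # cached rows of b before moving to the next block.
                for i in range(ii, min(ii + tile, n)):
                    for k in range(kk, min(kk + tile, n)):
                        aik = a[i][k]
                        for j in range(jj, min(jj + tile, n)):
                            c[i][j] += aik * b[k][j]
    return c

print(matmul_tiled([[1, 2], [3, 4]], [[5, 6], [7, 8]], 2))  # → [[19, 22], [43, 50]]
```

On a single-threaded processor the compiler must pick `tile` carefully so a working set fits in cache; the result reported above is that SMT's latency hiding makes performance far less sensitive to that choice.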
doi_str_mv | 10.5555/266800.266812 |
format | conference_proceeding |
fulltext | fulltext |
identifier | ISBN: 0818679778; ISBN: 9780818679773 |
ispartof | Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture, 1997, p.114-124 |
issn | |
language | eng |
recordid | cdi_acm_books_10_5555_266800_266812 |
source | IEEE Electronic Library (IEL) Conference Proceedings |
subjects | Hardware -- Electronic design automation -- Logic synthesis -- Circuit optimization; Hardware -- Emerging technologies -- Analysis and design of emerging devices and systems -- Emerging languages and compilers; Software and its engineering -- Software notations and tools -- Compilers |
title | Tuning compiler optimizations for simultaneous multithreading |