Early experiences with large-scale Cray XMT systems
Several 64-processor XMT systems have now been shipped to customers, and 128-processor, 256-processor and 512-processor systems have been tested in Cray's development lab. We describe some techniques we have used for tuning performance in the hope that applications will continue to scale on these...
Main Authors: | Mizell, D., Maschhoff, K. |
---|---|
Format: | Conference Proceeding |
Language: | English |
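Subjects: | Bandwidth ; Computer architecture ; Large-scale systems ; multithreading ; performance tuning ; Program processors ; Programming profession ; Prototypes ; scaling ; Switches ; System testing ; Throughput ; Yarn |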
Online Access: | Request full text |
cited_by | |
---|---|
cites | |
container_end_page | 9 |
container_issue | |
container_start_page | 1 |
container_title | |
container_volume | |
creator | Mizell, D. ; Maschhoff, K. |
description | Several 64-processor XMT systems have now been shipped to customers, and 128-processor, 256-processor and 512-processor systems have been tested in Cray's development lab. We describe some techniques we have used for tuning performance in the hope that applications will continue to scale on these larger systems. We discuss how the programmer must work with the XMT compiler to extract maximum parallelism and performance, especially from multiply nested loops, and how the performance tools provide vital information about whether and how the compiler has parallelized loops and where performance bottlenecks may be occurring. We also show data indicating that the maximum performance of a given application on an XMT system of a given size is limited by memory or network bandwidth, in a way that is somewhat independent of the number of processors used. |
doi_str_mv | 10.1109/IPDPS.2009.5161108 |
format | conference_proceeding |
fullrecord | <record><control><sourceid>ieee_6IE</sourceid><recordid>TN_cdi_ieee_primary_5161108</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>5161108</ieee_id><sourcerecordid>5161108</sourcerecordid><originalsourceid>FETCH-LOGICAL-i90t-276406c575b8e0f1cdde445fde86033e2bb0c10794b263c06c03c7f77016e1123</originalsourceid><addsrcrecordid>eNpVj09Lw0AUxFdUsNR8Ab3sF0h8b_8mR4m1FioWzMFb2WxedCWVshvQfHsD9uJchvkxDAxjNwgFIlR3m93D7rUQAFWh0cyoPGNZZUtUQilpNajzfxnxgi1QS8gFWH3FspQ-YZaakbELJlcuDhOnnyPFQF-eEv8O4wcfXHynPHk3EK-jm_jbc8PTlEY6pGt22bshUXbyJWseV039lG9f1pv6fpuHCsZcWKPAeG11WxL06LuOlNJ9R6UBKUm0LXgEW6lWGOnnKkhve2sBDSEKuWS3f7OBiPbHGA4uTvvTa_kLzP5HDQ</addsrcrecordid><sourcetype>Publisher</sourcetype><iscdi>true</iscdi><recordtype>conference_proceeding</recordtype></control><display><type>conference_proceeding</type><title>Early experiences with large-scale Cray XMT systems</title><source>IEEE Electronic Library (IEL) Conference Proceedings</source><creator>Mizell, D. ; Maschhoff, K.</creator><creatorcontrib>Mizell, D. ; Maschhoff, K.</creatorcontrib><description>Several 64-processor XMT systems have now been shipped to customers and there have been 128-processor, 256-processor and 512-processor systems tested in Cray's development lab. We describe some techniques we have used for tuning performance in hopes that applications continued to scale on these larger systems. We discuss how the programmer must work with the XMT compiler to extract maximum parallelism and performance, especially from multiply nested loops, and how the performance tools provide vital information about whether or how the compiler has parallelized loops and where performance bottlenecks may be occurring. 
We also show data that indicate that the maximum performance of a given application on a given size XMT system is limited by memory or network bandwidth, in a way that is somewhat independent of the number of processors used.</description><identifier>ISSN: 1530-2075</identifier><identifier>ISBN: 9781424437511</identifier><identifier>ISBN: 1424437512</identifier><identifier>EISBN: 9781424437504</identifier><identifier>EISBN: 1424437504</identifier><identifier>DOI: 10.1109/IPDPS.2009.5161108</identifier><language>eng</language><publisher>IEEE</publisher><subject>Bandwidth ; Computer architecture ; Large-scale systems ; multithreading ; performance tuning ; Program processors ; Programming profession ; Prototypes ; scaling ; Switches ; System testing ; Throughput ; Yarn</subject><ispartof>2009 IEEE International Symposium on Parallel & Distributed Processing, 2009, p.1-9</ispartof><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/5161108$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>309,310,780,784,789,790,2058,27925,54555,54920,54932</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/5161108$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Mizell, D.</creatorcontrib><creatorcontrib>Maschhoff, K.</creatorcontrib><title>Early experiences with large-scale Cray XMT systems</title><title>2009 IEEE International Symposium on Parallel & Distributed Processing</title><addtitle>IPDPS</addtitle><description>Several 64-processor XMT systems have now been shipped to customers and there have been 128-processor, 256-processor and 512-processor systems tested in Cray's development lab. 
We describe some techniques we have used for tuning performance in hopes that applications continued to scale on these larger systems. We discuss how the programmer must work with the XMT compiler to extract maximum parallelism and performance, especially from multiply nested loops, and how the performance tools provide vital information about whether or how the compiler has parallelized loops and where performance bottlenecks may be occurring. We also show data that indicate that the maximum performance of a given application on a given size XMT system is limited by memory or network bandwidth, in a way that is somewhat independent of the number of processors used.</description><subject>Bandwidth</subject><subject>Computer architecture</subject><subject>Large-scale systems</subject><subject>multithreading</subject><subject>performance tuning</subject><subject>Program processors</subject><subject>Programming profession</subject><subject>Prototypes</subject><subject>scaling</subject><subject>Switches</subject><subject>System testing</subject><subject>Throughput</subject><subject>Yarn</subject><issn>1530-2075</issn><isbn>9781424437511</isbn><isbn>1424437512</isbn><isbn>9781424437504</isbn><isbn>1424437504</isbn><fulltext>true</fulltext><rsrctype>conference_proceeding</rsrctype><creationdate>2009</creationdate><recordtype>conference_proceeding</recordtype><sourceid>6IE</sourceid><recordid>eNpVj09Lw0AUxFdUsNR8Ab3sF0h8b_8mR4m1FioWzMFb2WxedCWVshvQfHsD9uJchvkxDAxjNwgFIlR3m93D7rUQAFWh0cyoPGNZZUtUQilpNajzfxnxgi1QS8gFWH3FspQ-YZaakbELJlcuDhOnnyPFQF-eEv8O4wcfXHynPHk3EK-jm_jbc8PTlEY6pGt22bshUXbyJWseV039lG9f1pv6fpuHCsZcWKPAeG11WxL06LuOlNJ9R6UBKUm0LXgEW6lWGOnnKkhve2sBDSEKuWS3f7OBiPbHGA4uTvvTa_kLzP5HDQ</recordid><startdate>200905</startdate><enddate>200905</enddate><creator>Mizell, D.</creator><creator>Maschhoff, 
K.</creator><general>IEEE</general><scope>6IE</scope><scope>6IL</scope><scope>CBEJK</scope><scope>RIE</scope><scope>RIL</scope></search><sort><creationdate>200905</creationdate><title>Early experiences with large-scale Cray XMT systems</title><author>Mizell, D. ; Maschhoff, K.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-i90t-276406c575b8e0f1cdde445fde86033e2bb0c10794b263c06c03c7f77016e1123</frbrgroupid><rsrctype>conference_proceedings</rsrctype><prefilter>conference_proceedings</prefilter><language>eng</language><creationdate>2009</creationdate><topic>Bandwidth</topic><topic>Computer architecture</topic><topic>Large-scale systems</topic><topic>multithreading</topic><topic>performance tuning</topic><topic>Program processors</topic><topic>Programming profession</topic><topic>Prototypes</topic><topic>scaling</topic><topic>Switches</topic><topic>System testing</topic><topic>Throughput</topic><topic>Yarn</topic><toplevel>online_resources</toplevel><creatorcontrib>Mizell, D.</creatorcontrib><creatorcontrib>Maschhoff, K.</creatorcontrib><collection>IEEE Electronic Library (IEL) Conference Proceedings</collection><collection>IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume</collection><collection>IEEE Xplore All Conference Proceedings</collection><collection>IEEE Xplore</collection><collection>IEEE Proceedings Order Plans (POP All) 1998-Present</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Mizell, D.</au><au>Maschhoff, K.</au><format>book</format><genre>proceeding</genre><ristype>CONF</ristype><atitle>Early experiences with large-scale Cray XMT systems</atitle><btitle>2009 IEEE International Symposium on Parallel & Distributed 
Processing</btitle><stitle>IPDPS</stitle><date>2009-05</date><risdate>2009</risdate><spage>1</spage><epage>9</epage><pages>1-9</pages><issn>1530-2075</issn><isbn>9781424437511</isbn><isbn>1424437512</isbn><eisbn>9781424437504</eisbn><eisbn>1424437504</eisbn><abstract>Several 64-processor XMT systems have now been shipped to customers and there have been 128-processor, 256-processor and 512-processor systems tested in Cray's development lab. We describe some techniques we have used for tuning performance in hopes that applications continued to scale on these larger systems. We discuss how the programmer must work with the XMT compiler to extract maximum parallelism and performance, especially from multiply nested loops, and how the performance tools provide vital information about whether or how the compiler has parallelized loops and where performance bottlenecks may be occurring. We also show data that indicate that the maximum performance of a given application on a given size XMT system is limited by memory or network bandwidth, in a way that is somewhat independent of the number of processors used.</abstract><pub>IEEE</pub><doi>10.1109/IPDPS.2009.5161108</doi><tpages>9</tpages></addata></record> |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 1530-2075 |
ispartof | 2009 IEEE International Symposium on Parallel & Distributed Processing, 2009, p.1-9 |
issn | 1530-2075 |
language | eng |
recordid | cdi_ieee_primary_5161108 |
source | IEEE Electronic Library (IEL) Conference Proceedings |
subjects | Bandwidth ; Computer architecture ; Large-scale systems ; multithreading ; performance tuning ; Program processors ; Programming profession ; Prototypes ; scaling ; Switches ; System testing ; Throughput ; Yarn |
title | Early experiences with large-scale Cray XMT systems |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-06T14%3A27%3A56IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Early%20experiences%20with%20large-scale%20Cray%20XMT%20systems&rft.btitle=2009%20IEEE%20International%20Symposium%20on%20Parallel%20&%20Distributed%20Processing&rft.au=Mizell,%20D.&rft.date=2009-05&rft.spage=1&rft.epage=9&rft.pages=1-9&rft.issn=1530-2075&rft.isbn=9781424437511&rft.isbn_list=1424437512&rft_id=info:doi/10.1109/IPDPS.2009.5161108&rft.eisbn=9781424437504&rft.eisbn_list=1424437504&rft_dat=%3Cieee_6IE%3E5161108%3C/ieee_6IE%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-i90t-276406c575b8e0f1cdde445fde86033e2bb0c10794b263c06c03c7f77016e1123%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=5161108&rfr_iscdi=true |