Mapping communication layouts to network hardware characteristics on massive-scale Blue Gene systems
For parallel applications running on high-end computing systems, which processes of an application get launched on which processing cores is typically determined at application launch time without any information about the application characteristics. As high-end computing systems continue to grow i...
Published in: | Computer science (Berlin, Germany), 2011-06, Vol.26 (3-4), p.247-256 |
---|---|
Main Authors: | Balaji, Pavan ; Gupta, Rinku ; Vishnu, Abhinav ; Beckman, Pete |
Format: | Article |
Language: | English |
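Subjects: | Computer Hardware ; Computer Science ; Computer Systems Organization and Communication Networks ; Data Structures and Information Theory ; Software Engineering/Programming and Operating Systems ; Special Issue Paper ; Theory of Computation |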
cited_by | cdi_FETCH-LOGICAL-c288t-36b52070e3982f104544f4570b5f72565e953ca3d737213abb88b92b436a76623 |
---|---|
cites | cdi_FETCH-LOGICAL-c288t-36b52070e3982f104544f4570b5f72565e953ca3d737213abb88b92b436a76623 |
container_end_page | 256 |
container_issue | 3-4 |
container_start_page | 247 |
container_title | Computer science (Berlin, Germany) |
container_volume | 26 |
creator | Balaji, Pavan ; Gupta, Rinku ; Vishnu, Abhinav ; Beckman, Pete |
description | For parallel applications running on high-end computing systems, which processes of an application get launched on which processing cores is typically determined at application launch time without any information about the application characteristics. As high-end computing systems continue to grow in scale, however, this approach is becoming increasingly infeasible for achieving the best performance. For example, for systems such as IBM Blue Gene and Cray XT that rely on flat 3D torus networks, process communication often involves network sharing, even for highly scalable applications. This causes the overall application performance to depend heavily on how processes are mapped on the network. In this paper, we first analyze the impact of different process mappings on application performance on a massive Blue Gene/P system. Then, we match this analysis with application communication patterns that we allow applications to describe prior to being launched. The underlying process management system can use this combined information in conjunction with the hardware characteristics of the system to determine the best mapping for the application. Our experiments study the performance of different communication patterns, including 2D and 3D nearest-neighbor communication and structured Cartesian grid communication. Our studies, which scale up to 131,072 cores of the largest BG/P system in the United States (using 80% of the total system size), demonstrate that different process mappings can show significant differences in overall performance, especially at scale. For example, we show that this difference can be as much as 30% for P3DFFT and up to twofold for HALO. Through our proposed model, however, such differences in performance can be avoided so that the best possible performance is always achieved. |
doi_str_mv | 10.1007/s00450-011-0168-y |
format | article |
fullrecord | <record><control><sourceid>crossref_sprin</sourceid><recordid>TN_cdi_crossref_primary_10_1007_s00450_011_0168_y</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>10_1007_s00450_011_0168_y</sourcerecordid><originalsourceid>FETCH-LOGICAL-c288t-36b52070e3982f104544f4570b5f72565e953ca3d737213abb88b92b436a76623</originalsourceid><addsrcrecordid>eNp9kM1OwzAQhC0EEqXwANz8AoG1HTvuEVX8VCriAmfLcZ3ikp_K61Dl7XFVxJHDaucws9r5CLllcMcAqnsEKCUUwFgepYvpjMyYVrLgUPLzPy3KS3KFuANQnDGYkc2r3e9Dv6Vu6LqxD86mMPS0tdMwJqRpoL1PhyF-0U8bNwcbPXVZWZd8DJiCQ5rtnUUM375AZ1tP63b0dOt7T3HC5Du8JheNbdHf_O45-Xh6fF--FOu359XyYV04rnUqhKolhwq8WGjesFyoLJtSVlDLpuJSSb-QwlmxqUTFmbB1rXW94HUplK2U4mJO2OmuiwNi9I3Zx9DZOBkG5ojJnDCZjMkcMZkpZ_gpg9nbb300u2GMfX7zn9APPZFsbA</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Mapping communication layouts to network hardware characteristics on massive-scale blue gene systems</title><source>SpringerLink Contemporary</source><creator>Balaji, Pavan ; Gupta, Rinku ; Vishnu, Abhinav ; Beckman, Pete</creator><creatorcontrib>Balaji, Pavan ; Gupta, Rinku ; Vishnu, Abhinav ; Beckman, Pete</creatorcontrib><description>For parallel applications running on high-end computing systems, which processes of an application get launched on which processing cores is typically determined at application launch time without any information about the application characteristics. As high-end computing systems continue to grow in scale, however, this approach is becoming increasingly infeasible for achieving the best performance. For example, for systems such as IBM Blue Gene and Cray XT that rely on flat 3D torus networks, process communication often involves network sharing, even for highly scalable applications. This causes the overall application performance to depend heavily on how processes are mapped on the network. 
In this paper, we first analyze the impact of different process mappings on application performance on a massive Blue Gene/P system. Then, we match this analysis with application communication patterns that we allow applications to describe prior to being launched. The underlying process management system can use this combined information in conjunction with the hardware characteristics of the system to determine the best mapping for the application. Our experiments study the performance of different communication patterns, including 2D and 3D nearest-neighbor communication and structured Cartesian grid communication. Our studies, that scale up to 131,072 cores of the largest BG/P system in the United States (using 80% of the total system size), demonstrate that different process mappings can show significant difference in overall performance, especially on scale. For example, we show that this difference can be as much as 30% for P3DFFT and up to twofold for HALO. Through our proposed model, however, such differences in performance can be avoided so that the best possible performance is always achieved.</description><identifier>ISSN: 1865-2034</identifier><identifier>EISSN: 1865-2042</identifier><identifier>DOI: 10.1007/s00450-011-0168-y</identifier><language>eng</language><publisher>Berlin/Heidelberg: Springer-Verlag</publisher><subject>Computer Hardware ; Computer Science ; Computer Systems Organization and Communication Networks ; Data Structures and Information Theory ; Software Engineering/Programming and Operating Systems ; Special Issue Paper ; Theory of Computation</subject><ispartof>Computer science (Berlin, Germany), 2011-06, Vol.26 (3-4), p.247-256</ispartof><rights>Argonne National Laboratory 
2011</rights><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c288t-36b52070e3982f104544f4570b5f72565e953ca3d737213abb88b92b436a76623</citedby><cites>FETCH-LOGICAL-c288t-36b52070e3982f104544f4570b5f72565e953ca3d737213abb88b92b436a76623</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://link.springer.com/content/pdf/10.1007/s00450-011-0168-y$$EPDF$$P50$$Gspringer$$H</linktopdf><linktohtml>$$Uhttps://link.springer.com/10.1007/s00450-011-0168-y$$EHTML$$P50$$Gspringer$$H</linktohtml><link.rule.ids>314,776,780,1638,27903,27904,41397,42466,51297</link.rule.ids></links><search><creatorcontrib>Balaji, Pavan</creatorcontrib><creatorcontrib>Gupta, Rinku</creatorcontrib><creatorcontrib>Vishnu, Abhinav</creatorcontrib><creatorcontrib>Beckman, Pete</creatorcontrib><title>Mapping communication layouts to network hardware characteristics on massive-scale blue gene systems</title><title>Computer science (Berlin, Germany)</title><addtitle>Comput Sci Res Dev</addtitle><description>For parallel applications running on high-end computing systems, which processes of an application get launched on which processing cores is typically determined at application launch time without any information about the application characteristics. As high-end computing systems continue to grow in scale, however, this approach is becoming increasingly infeasible for achieving the best performance. For example, for systems such as IBM Blue Gene and Cray XT that rely on flat 3D torus networks, process communication often involves network sharing, even for highly scalable applications. This causes the overall application performance to depend heavily on how processes are mapped on the network. In this paper, we first analyze the impact of different process mappings on application performance on a massive Blue Gene/P system. 
Then, we match this analysis with application communication patterns that we allow applications to describe prior to being launched. The underlying process management system can use this combined information in conjunction with the hardware characteristics of the system to determine the best mapping for the application. Our experiments study the performance of different communication patterns, including 2D and 3D nearest-neighbor communication and structured Cartesian grid communication. Our studies, that scale up to 131,072 cores of the largest BG/P system in the United States (using 80% of the total system size), demonstrate that different process mappings can show significant difference in overall performance, especially on scale. For example, we show that this difference can be as much as 30% for P3DFFT and up to twofold for HALO. Through our proposed model, however, such differences in performance can be avoided so that the best possible performance is always achieved.</description><subject>Computer Hardware</subject><subject>Computer Science</subject><subject>Computer Systems Organization and Communication Networks</subject><subject>Data Structures and Information Theory</subject><subject>Software Engineering/Programming and Operating Systems</subject><subject>Special Issue Paper</subject><subject>Theory of Computation</subject><issn>1865-2034</issn><issn>1865-2042</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2011</creationdate><recordtype>article</recordtype><recordid>eNp9kM1OwzAQhC0EEqXwANz8AoG1HTvuEVX8VCriAmfLcZ3ikp_K61Dl7XFVxJHDaucws9r5CLllcMcAqnsEKCUUwFgepYvpjMyYVrLgUPLzPy3KS3KFuANQnDGYkc2r3e9Dv6Vu6LqxD86mMPS0tdMwJqRpoL1PhyF-0U8bNwcbPXVZWZd8DJiCQ5rtnUUM375AZ1tP63b0dOt7T3HC5Du8JheNbdHf_O45-Xh6fF--FOu359XyYV04rnUqhKolhwq8WGjesFyoLJtSVlDLpuJSSb-QwlmxqUTFmbB1rXW94HUplK2U4mJO2OmuiwNi9I3Zx9DZOBkG5ojJnDCZjMkcMZkpZ_gpg9nbb300u2GMfX7zn9APPZFsbA</recordid><startdate>20110601</startdate><enddate>20110601</enddate><creator>Balaji, 
Pavan</creator><creator>Gupta, Rinku</creator><creator>Vishnu, Abhinav</creator><creator>Beckman, Pete</creator><general>Springer-Verlag</general><scope>AAYXX</scope><scope>CITATION</scope></search><sort><creationdate>20110601</creationdate><title>Mapping communication layouts to network hardware characteristics on massive-scale blue gene systems</title><author>Balaji, Pavan ; Gupta, Rinku ; Vishnu, Abhinav ; Beckman, Pete</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c288t-36b52070e3982f104544f4570b5f72565e953ca3d737213abb88b92b436a76623</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2011</creationdate><topic>Computer Hardware</topic><topic>Computer Science</topic><topic>Computer Systems Organization and Communication Networks</topic><topic>Data Structures and Information Theory</topic><topic>Software Engineering/Programming and Operating Systems</topic><topic>Special Issue Paper</topic><topic>Theory of Computation</topic><toplevel>online_resources</toplevel><creatorcontrib>Balaji, Pavan</creatorcontrib><creatorcontrib>Gupta, Rinku</creatorcontrib><creatorcontrib>Vishnu, Abhinav</creatorcontrib><creatorcontrib>Beckman, Pete</creatorcontrib><collection>CrossRef</collection><jtitle>Computer science (Berlin, Germany)</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Balaji, Pavan</au><au>Gupta, Rinku</au><au>Vishnu, Abhinav</au><au>Beckman, Pete</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Mapping communication layouts to network hardware characteristics on massive-scale blue gene systems</atitle><jtitle>Computer science (Berlin, Germany)</jtitle><stitle>Comput Sci Res 
Dev</stitle><date>2011-06-01</date><risdate>2011</risdate><volume>26</volume><issue>3-4</issue><spage>247</spage><epage>256</epage><pages>247-256</pages><issn>1865-2034</issn><eissn>1865-2042</eissn><abstract>For parallel applications running on high-end computing systems, which processes of an application get launched on which processing cores is typically determined at application launch time without any information about the application characteristics. As high-end computing systems continue to grow in scale, however, this approach is becoming increasingly infeasible for achieving the best performance. For example, for systems such as IBM Blue Gene and Cray XT that rely on flat 3D torus networks, process communication often involves network sharing, even for highly scalable applications. This causes the overall application performance to depend heavily on how processes are mapped on the network. In this paper, we first analyze the impact of different process mappings on application performance on a massive Blue Gene/P system. Then, we match this analysis with application communication patterns that we allow applications to describe prior to being launched. The underlying process management system can use this combined information in conjunction with the hardware characteristics of the system to determine the best mapping for the application. Our experiments study the performance of different communication patterns, including 2D and 3D nearest-neighbor communication and structured Cartesian grid communication. Our studies, that scale up to 131,072 cores of the largest BG/P system in the United States (using 80% of the total system size), demonstrate that different process mappings can show significant difference in overall performance, especially on scale. For example, we show that this difference can be as much as 30% for P3DFFT and up to twofold for HALO. 
Through our proposed model, however, such differences in performance can be avoided so that the best possible performance is always achieved.</abstract><cop>Berlin/Heidelberg</cop><pub>Springer-Verlag</pub><doi>10.1007/s00450-011-0168-y</doi><tpages>10</tpages></addata></record> |
fulltext | fulltext |
identifier | ISSN: 1865-2034 |
ispartof | Computer science (Berlin, Germany), 2011-06, Vol.26 (3-4), p.247-256 |
issn | ISSN: 1865-2034 ; EISSN: 1865-2042 |
language | eng |
recordid | cdi_crossref_primary_10_1007_s00450_011_0168_y |
source | SpringerLink Contemporary |
subjects | Computer Hardware ; Computer Science ; Computer Systems Organization and Communication Networks ; Data Structures and Information Theory ; Software Engineering/Programming and Operating Systems ; Special Issue Paper ; Theory of Computation |
title | Mapping communication layouts to network hardware characteristics on massive-scale Blue Gene systems |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-22T16%3A44%3A59IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-crossref_sprin&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Mapping%20communication%20layouts%20to%20network%20hardware%20characteristics%20on%20massive-scale%20blue%20gene%20systems&rft.jtitle=Computer%20science%20(Berlin,%20Germany)&rft.au=Balaji,%20Pavan&rft.date=2011-06-01&rft.volume=26&rft.issue=3-4&rft.spage=247&rft.epage=256&rft.pages=247-256&rft.issn=1865-2034&rft.eissn=1865-2042&rft_id=info:doi/10.1007/s00450-011-0168-y&rft_dat=%3Ccrossref_sprin%3E10_1007_s00450_011_0168_y%3C/crossref_sprin%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c288t-36b52070e3982f104544f4570b5f72565e953ca3d737213abb88b92b436a76623%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |