Mapping communication layouts to network hardware characteristics on massive-scale Blue Gene systems

Bibliographic Details
Published in: Computer science (Berlin, Germany), 2011-06, Vol. 26 (3-4), p. 247-256
Main Authors: Balaji, Pavan; Gupta, Rinku; Vishnu, Abhinav; Beckman, Pete
Format: Article
Language: English
Subjects: Computer Hardware; Computer Science; Computer Systems Organization and Communication Networks; Data Structures and Information Theory; Software Engineering/Programming and Operating Systems; Special Issue Paper; Theory of Computation
ISSN: 1865-2034
EISSN: 1865-2042
DOI: 10.1007/s00450-011-0168-y
Publisher: Berlin/Heidelberg: Springer-Verlag
Online Access: https://link.springer.com/10.1007/s00450-011-0168-y
Description
For parallel applications running on high-end computing systems, which processes of an application get launched on which processing cores is typically determined at application launch time, without any information about the application's characteristics. As high-end computing systems continue to grow in scale, however, this approach is becoming increasingly infeasible for achieving the best performance. For example, on systems such as IBM Blue Gene and Cray XT that rely on flat 3D torus networks, process communication often involves network sharing, even for highly scalable applications. This causes overall application performance to depend heavily on how processes are mapped onto the network. In this paper, we first analyze the impact of different process mappings on application performance on a massive-scale Blue Gene/P system. We then match this analysis with application communication patterns that applications describe before being launched. The underlying process management system can use this combined information, in conjunction with the hardware characteristics of the system, to determine the best mapping for the application. Our experiments study the performance of different communication patterns, including 2D and 3D nearest-neighbor communication and structured Cartesian grid communication. Our studies, which scale up to 131,072 cores of the largest Blue Gene/P system in the United States (80% of the total system size), demonstrate that different process mappings can show significant differences in overall performance, especially at scale: as much as 30% for P3DFFT and up to twofold for HALO. Through our proposed model, however, such differences in performance can be avoided so that the best possible performance is always achieved.
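The mechanism the abstract describes, an application declaring its communication layout before launch so the runtime can place processes to match the torus topology, is closely mirrored by MPI's standard Cartesian-topology interface. The sketch below is only an illustration of that general idea in plain MPI, not the paper's actual process-management system: MPI_Cart_create with reorder=1 tells the implementation it may permute ranks to fit the hardware, and MPI_Cart_shift drives a 3D nearest-neighbor halo exchange of the kind the paper benchmarks.

```c
/* Minimal sketch (assumed example, not the authors' system): describe a
 * 3D nearest-neighbor layout via MPI's Cartesian topology so the runtime
 * may remap ranks onto the network. Build with an MPI compiler (mpicc). */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int nprocs, rank;
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Let MPI factor the process count into a balanced 3D grid. */
    int dims[3] = {0, 0, 0};
    MPI_Dims_create(nprocs, 3, dims);

    /* Periodic in every dimension (torus-like). reorder = 1 permits the
     * implementation to renumber ranks to match nearby physical nodes. */
    int periods[3] = {1, 1, 1};
    MPI_Comm cart;
    MPI_Cart_create(MPI_COMM_WORLD, 3, dims, periods, 1, &cart);
    MPI_Comm_rank(cart, &rank);

    /* Nearest-neighbor (halo) exchange along each of the 3 dimensions. */
    double send = (double)rank, recv_lo, recv_hi;
    for (int d = 0; d < 3; d++) {
        int lo, hi;  /* lower and upper neighbors in dimension d */
        MPI_Cart_shift(cart, d, 1, &lo, &hi);
        MPI_Sendrecv(&send, 1, MPI_DOUBLE, hi, 0,
                     &recv_lo, 1, MPI_DOUBLE, lo, 0,
                     cart, MPI_STATUS_IGNORE);
        MPI_Sendrecv(&send, 1, MPI_DOUBLE, lo, 1,
                     &recv_hi, 1, MPI_DOUBLE, hi, 1,
                     cart, MPI_STATUS_IGNORE);
    }

    if (rank == 0)
        printf("3D process grid: %d x %d x %d\n", dims[0], dims[1], dims[2]);

    MPI_Comm_free(&cart);
    MPI_Finalize();
    return 0;
}
```

With reorder=0 the ranks keep their launch-time order regardless of the network, which corresponds to the uninformed default mapping the abstract argues against; whether reorder=1 actually improves placement depends on the MPI implementation and the system's topology support.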