Physical Experimentation with Prefetching Helper Threads on Intel's Hyper-Threaded Processors
Pre-execution techniques have received much attention as an effective way of prefetching cache blocks to tolerate the ever-increasing memory latency. A number of pre-execution techniques based on hardware, compiler, or both have been proposed and studied extensively by researchers. They report promising...
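Main Authors: | Kim, Dongkeun; Liao, Steve Shih-wei; Wang, Perry H.; Cuvillo, Juan del; Tian, Xinmin; Zou, Xiang; Wang, Hong; Yeung, Donald; Girkar, Milind; Shen, John P. |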
---|---|
Format: | Conference Proceeding |
Language: | English |
Subjects: | |
cited_by | |
---|---|
cites | |
container_end_page | 27 |
container_issue | |
container_start_page | 27 |
container_title | |
container_volume | |
creator | Kim, Dongkeun; Liao, Steve Shih-wei; Wang, Perry H.; Cuvillo, Juan del; Tian, Xinmin; Zou, Xiang; Wang, Hong; Yeung, Donald; Girkar, Milind; Shen, John P. |
description | Pre-execution techniques have received much attention as an effective way of prefetching cache blocks to tolerate the ever-increasing memory latency. A number of pre-execution techniques based on hardware, compiler, or both have been proposed and studied extensively by researchers. They report promising results on simulators that model a Simultaneous Multithreading (SMT) processor. In this paper, we apply the helper threading idea on a real multithreaded machine, i.e., the Intel Pentium 4 processor with Hyper-Threading Technology, and show that it can indeed provide wall-clock speedup on real silicon. To achieve further performance improvements via helper threads, we investigate three helper threading scenarios that are driven by automated compiler infrastructure, and identify several key challenges and opportunities for novel hardware and software optimizations. Our study shows that program behavior changes dynamically during execution. In addition, certain critical hardware structures in the hyper-threaded processors are either shared or partitioned in multi-threading mode, so the tradeoffs regarding resource contention can be intricate. Therefore, it is essential to invoke helper threads judiciously, adapting to the dynamic program behavior, so that we can alleviate potential performance degradation due to resource contention. Moreover, since adapting to the dynamic behavior requires frequent thread synchronization, light-weight thread synchronization mechanisms are important. |
doi_str_mv | 10.5555/977395.977665 |
format | conference_proceeding |
fullrecord | <record><control><sourceid>proquest_acm_b</sourceid><recordid>TN_cdi_acm_books_10_5555_977395_977665</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>31426228</sourcerecordid><originalsourceid>FETCH-LOGICAL-a184t-f15787bcaa1a83428a41961e5423228966100c2a7784c0743646567a845022e63</originalsourceid><addsrcrecordid>eNqNkD1PwzAYhC0hJKB0ZPcECym246-MqCq0UiU6lBFZrvuGGNK4xK6g_x6jILFyyw333A2H0BUlE5F1VylVVmKSTUpxgi6IkpVglLDqDI1jfCNZXFCl5Tl6WTXH6J1t8exrD73fQZds8qHDnz41eNVDDck1vnvFc2gzgddND3YbcUYWXYL2JuL5MQfFEMA2l4KDGEMfL9FpbdsI418foeeH2Xo6L5ZPj4vp_bKwVPNU1FQorTbOWmp1yZm2nFaSguCsZExXUlJCHLNKae6I4qXkUkhlNReEMZDlCF0Pu_s-fBwgJrPz0UHb2g7CIZqScibz0h9o3c5sQniPhhLz85oZXjPDaxm8_RdoNr2HuvwGfgBsWw</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>conference_proceeding</recordtype><pqid>31426228</pqid></control><display><type>conference_proceeding</type><title>Physical Experimentation with Prefetching Helper Threads on Intel's Hyper-Threaded Processors</title><source>IEEE Electronic Library (IEL) Conference Proceedings</source><creator>Kim, Dongkeun ; Liao, Steve Shih-wei ; Wang, Perry H. ; Cuvillo, Juan del ; Tian, Xinmin ; Zou, Xiang ; Wang, Hong ; Yeung, Donald ; Girkar, Milind ; Shen, John P.</creator><creatorcontrib>Kim, Dongkeun ; Liao, Steve Shih-wei ; Wang, Perry H. ; Cuvillo, Juan del ; Tian, Xinmin ; Zou, Xiang ; Wang, Hong ; Yeung, Donald ; Girkar, Milind ; Shen, John P.</creatorcontrib><description>Pre-execution techniques have received much attention as an effective way of prefetching cache blocks to tolerate the ever-increasing memory latency. A number of pre-execution techniques based on hardware, compiler, or both have been proposed and studied extensively by researchers. They report promising results on simulators that model a Simultaneous Multithreading (SMT) processor. 
In this paper, we apply the helper threading idea on a real multithreaded machine, i.e., the Intel Pentium 4 processor with Hyper-Threading Technology, and show that it can indeed provide wall-clock speedup on real silicon. To achieve further performance improvements via helper threads, we investigate three helper threading scenarios that are driven by automated compiler infrastructure, and identify several key challenges and opportunities for novel hardware and software optimizations. Our study shows that program behavior changes dynamically during execution. In addition, certain critical hardware structures in the hyper-threaded processors are either shared or partitioned in multi-threading mode, so the tradeoffs regarding resource contention can be intricate. Therefore, it is essential to invoke helper threads judiciously, adapting to the dynamic program behavior, so that we can alleviate potential performance degradation due to resource contention. Moreover, since adapting to the dynamic behavior requires frequent thread synchronization, light-weight thread synchronization mechanisms are important.</description><identifier>ISBN: 0769521029</identifier><identifier>ISBN: 9780769521022</identifier><identifier>DOI: 10.5555/977395.977665</identifier><language>eng</language><publisher>Washington, DC, USA: IEEE Computer Society</publisher><subject>Applied computing -- Computers in other domains -- Personal computers and PC applications -- Microcomputers ; Computer systems organization -- Embedded and cyber-physical systems ; Computer systems organization -- Real-time systems ; Hardware -- Integrated circuits ; Software and its engineering -- Software organization and properties -- Contextual software domains -- Operating systems -- Process management -- Multithreading</subject><ispartof>2nd IEEE/ACM International Symposium on Code Generation and Optimization (CGO 2004), 2004, p.27-27</ispartof><rights>Copyright (c) 2004 Institute of Electrical and Electronics Engineers, Inc. All rights reserved.</rights><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>309,310,776,780,785,786,27904</link.rule.ids></links><search><creatorcontrib>Kim, Dongkeun</creatorcontrib><creatorcontrib>Liao, Steve Shih-wei</creatorcontrib><creatorcontrib>Wang, Perry H.</creatorcontrib><creatorcontrib>Cuvillo, Juan del</creatorcontrib><creatorcontrib>Tian, Xinmin</creatorcontrib><creatorcontrib>Zou, Xiang</creatorcontrib><creatorcontrib>Wang, Hong</creatorcontrib><creatorcontrib>Yeung, Donald</creatorcontrib><creatorcontrib>Girkar, Milind</creatorcontrib><creatorcontrib>Shen, John P.</creatorcontrib><title>Physical Experimentation with Prefetching Helper Threads on Intel's Hyper-Threaded Processors</title><title>2nd IEEE/ACM International Symposium on Code Generation and Optimization (CGO 2004)</title><description>Pre-execution techniques have received much attention as an effective way of prefetching cache blocks to tolerate the ever-increasing memory latency. A number of pre-execution techniques based on hardware, compiler, or both have been proposed and studied extensively by researchers. They report promising results on simulators that model a Simultaneous Multithreading (SMT) processor. In this paper, we apply the helper threading idea on a real multithreaded machine, i.e., the Intel Pentium 4 processor with Hyper-Threading Technology, and show that it can indeed provide wall-clock speedup on real silicon. 
To achieve further performance improvements via helper threads, we investigate three helper threading scenarios that are driven by automated compiler infrastructure, and identify several key challenges and opportunities for novel hardware and software optimizations. Our study shows that program behavior changes dynamically during execution. In addition, certain critical hardware structures in the hyper-threaded processors are either shared or partitioned in multi-threading mode, so the tradeoffs regarding resource contention can be intricate. Therefore, it is essential to invoke helper threads judiciously, adapting to the dynamic program behavior, so that we can alleviate potential performance degradation due to resource contention. Moreover, since adapting to the dynamic behavior requires frequent thread synchronization, light-weight thread synchronization mechanisms are important.</description><subject>Applied computing -- Computers in other domains -- Personal computers and PC applications -- Microcomputers</subject><subject>Computer systems organization -- Embedded and cyber-physical systems</subject><subject>Computer systems organization -- Real-time systems</subject><subject>Hardware -- Integrated circuits</subject><subject>Software and its engineering -- Software organization and properties -- Contextual software domains -- Operating systems -- Process management -- Multithreading</subject><isbn>0769521029</isbn><isbn>9780769521022</isbn><fulltext>true</fulltext><rsrctype>conference_proceeding</rsrctype><creationdate>2004</creationdate><recordtype>conference_proceeding</recordtype><recordid>eNqNkD1PwzAYhC0hJKB0ZPcECym246-MqCq0UiU6lBFZrvuGGNK4xK6g_x6jILFyyw333A2H0BUlE5F1VylVVmKSTUpxgi6IkpVglLDqDI1jfCNZXFCl5Tl6WTXH6J1t8exrD73fQZds8qHDnz41eNVDDck1vnvFc2gzgddND3YbcUYWXYL2JuL5MQfFEMA2l4KDGEMfL9FpbdsI418foeeH2Xo6L5ZPj4vp_bKwVPNU1FQorTbOWmp1yZm2nFaSguCsZExXUlJCHLNKae6I4qXkUkhlNReEMZDlCF0Pu_s-fBwgJrPz0UHb2g7CIZqScibz0h9o3c5sQniPhhLz85oZXjPDaxm8_RdoNr2HuvwGfgBsWw</recordid><startdate>20040320</startdate><enddate>20040320</enddate><creator>Kim, Dongkeun</creator><creator>Liao, Steve Shih-wei</creator><creator>Wang, Perry H.</creator><creator>Cuvillo, Juan del</creator><creator>Tian, Xinmin</creator><creator>Zou, Xiang</creator><creator>Wang, Hong</creator><creator>Yeung, Donald</creator><creator>Girkar, Milind</creator><creator>Shen, John P.</creator><general>IEEE Computer Society</general><scope>7SC</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope></search><sort><creationdate>20040320</creationdate><title>Physical Experimentation with Prefetching Helper Threads on Intel's Hyper-Threaded Processors</title><author>Kim, Dongkeun ; Liao, Steve Shih-wei ; Wang, Perry H. ; Cuvillo, Juan del ; Tian, Xinmin ; Zou, Xiang ; Wang, Hong ; Yeung, Donald ; Girkar, Milind ; Shen, John P.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a184t-f15787bcaa1a83428a41961e5423228966100c2a7784c0743646567a845022e63</frbrgroupid><rsrctype>conference_proceedings</rsrctype><prefilter>conference_proceedings</prefilter><language>eng</language><creationdate>2004</creationdate><topic>Applied computing -- Computers in other domains -- Personal computers and PC applications -- Microcomputers</topic><topic>Computer systems organization -- Embedded and cyber-physical systems</topic><topic>Computer systems organization -- Real-time systems</topic><topic>Hardware -- Integrated circuits</topic><topic>Software and its engineering -- Software organization and properties -- Contextual software domains -- Operating systems -- Process management -- Multithreading</topic><toplevel>online_resources</toplevel><creatorcontrib>Kim, Dongkeun</creatorcontrib><creatorcontrib>Liao, Steve Shih-wei</creatorcontrib><creatorcontrib>Wang, Perry H.</creatorcontrib><creatorcontrib>Cuvillo, Juan del</creatorcontrib><creatorcontrib>Tian, Xinmin</creatorcontrib><creatorcontrib>Zou, Xiang</creatorcontrib><creatorcontrib>Wang, Hong</creatorcontrib><creatorcontrib>Yeung, Donald</creatorcontrib><creatorcontrib>Girkar, Milind</creatorcontrib><creatorcontrib>Shen, John P.</creatorcontrib><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Kim, Dongkeun</au><au>Liao, Steve Shih-wei</au><au>Wang, Perry H.</au><au>Cuvillo, Juan del</au><au>Tian, Xinmin</au><au>Zou, Xiang</au><au>Wang, Hong</au><au>Yeung, Donald</au><au>Girkar, Milind</au><au>Shen, John P.</au><format>book</format><genre>proceeding</genre><ristype>CONF</ristype><atitle>Physical Experimentation with Prefetching Helper Threads on Intel's Hyper-Threaded Processors</atitle><btitle>2nd IEEE/ACM International Symposium on Code Generation and Optimization (CGO 2004)</btitle><date>2004-03-20</date><risdate>2004</risdate><spage>27</spage><epage>27</epage><pages>27-27</pages><isbn>0769521029</isbn><isbn>9780769521022</isbn><abstract>Pre-execution techniques have received much attention as an effective way of prefetching cache blocks to tolerate the ever-increasing memory latency. A number of pre-execution techniques based on hardware, compiler, or both have been proposed and studied extensively by researchers. They report promising results on simulators that model a Simultaneous Multithreading (SMT) processor. In this paper, we apply the helper threading idea on a real multithreaded machine, i.e., the Intel Pentium 4 processor with Hyper-Threading Technology, and show that it can indeed provide wall-clock speedup on real silicon. To achieve further performance improvements via helper threads, we investigate three helper threading scenarios that are driven by automated compiler infrastructure, and identify several key challenges and opportunities for novel hardware and software optimizations. Our study shows that program behavior changes dynamically during execution. In addition, certain critical hardware structures in the hyper-threaded processors are either shared or partitioned in multi-threading mode, so the tradeoffs regarding resource contention can be intricate. Therefore, it is essential to invoke helper threads judiciously, adapting to the dynamic program behavior, so that we can alleviate potential performance degradation due to resource contention. 
Moreover, since adapting to the dynamic behavior requires frequent thread synchronization, light-weight thread synchronization mechanisms are important.</abstract><cop>Washington, DC, USA</cop><pub>IEEE Computer Society</pub><doi>10.5555/977395.977665</doi><tpages>1</tpages></addata></record> |
fulltext | fulltext |
identifier | ISBN: 0769521029 |
ispartof | 2nd IEEE/ACM International Symposium on Code Generation and Optimization (CGO 2004), 2004, p.27-27 |
issn | |
language | eng |
recordid | cdi_acm_books_10_5555_977395_977665 |
source | IEEE Electronic Library (IEL) Conference Proceedings |
subjects | Applied computing -- Computers in other domains -- Personal computers and PC applications -- Microcomputers; Computer systems organization -- Embedded and cyber-physical systems; Computer systems organization -- Real-time systems; Hardware -- Integrated circuits; Software and its engineering -- Software organization and properties -- Contextual software domains -- Operating systems -- Process management -- Multithreading |
title | Physical Experimentation with Prefetching Helper Threads on Intel's Hyper-Threaded Processors |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-24T17%3A26%3A39IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_acm_b&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Physical%20Experimentation%20with%20Prefetching%20Helper%20Threads%20on%20Intel's%20Hyper-Threaded%20Processors&rft.btitle=2nd%20IEEE/ACM%20International%20Symposium%20on%20Code%20Generation%20and%20Optimization%20(CGO%202004)&rft.au=Kim,%20Dongkeun&rft.date=2004-03-20&rft.spage=27&rft.epage=27&rft.pages=27-27&rft.isbn=0769521029&rft.isbn_list=9780769521022&rft_id=info:doi/10.5555/977395.977665&rft_dat=%3Cproquest_acm_b%3E31426228%3C/proquest_acm_b%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-a184t-f15787bcaa1a83428a41961e5423228966100c2a7784c0743646567a845022e63%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=31426228&rft_id=info:pmid/&rfr_iscdi=true |
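The abstract above describes helper threads that run ahead of the main computation to prefetch cache blocks, and stresses that on a Hyper-Threaded core they must be throttled and synchronized cheaply because both logical processors share or partition critical resources. The C sketch below illustrates that general idea on a pointer-chasing loop. It is a hand-written illustration, not the compiler-generated helper code studied in the paper. Assumptions: a GCC or Clang toolchain (for `__builtin_prefetch`), POSIX threads, and C11 atomics. The names `node_t`, `helper_ctx_t`, `helper_thread`, `sum_list`, and `run_demo` are hypothetical, and the run-ahead limit of 64 nodes is an arbitrary illustrative value.

```c
/* Hypothetical sketch of a prefetching helper thread (not the paper's code).
 * Assumes GCC/Clang (__builtin_prefetch), POSIX threads, and C11 atomics. */
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct node {
    struct node *next;
    long value;
    char pad[48]; /* assumption: pads each node toward one 64-byte cache line */
} node_t;

typedef struct {
    const node_t *head;
    atomic_size_t main_pos; /* nodes already consumed by the main thread */
    atomic_bool done;       /* set once the main thread has finished */
    size_t max_lead;        /* hypothetical cap on how far the helper may run ahead */
} helper_ctx_t;

/* Helper: walks the same list ahead of the main thread so the nodes it touches
 * are already in the shared cache. It only reads, so it cannot change results. */
static void *helper_thread(void *arg) {
    helper_ctx_t *ctx = arg;
    size_t pos = 0;
    for (const node_t *p = ctx->head; p != NULL; p = p->next, pos++) {
        if (atomic_load_explicit(&ctx->done, memory_order_relaxed))
            break;
        /* Throttle: spin while more than max_lead nodes ahead, so prefetched
         * lines are not evicted before use. Real code might issue PAUSE here. */
        while (pos > atomic_load_explicit(&ctx->main_pos, memory_order_relaxed) + ctx->max_lead) {
            if (atomic_load_explicit(&ctx->done, memory_order_relaxed))
                return NULL;
        }
        __builtin_prefetch(p->next, 0 /* read */, 3 /* keep in all cache levels */);
    }
    return NULL;
}

/* Main computation: sums the list and publishes its progress with a cheap
 * relaxed store, the light-weight synchronization the helper relies on. */
static long sum_list(const node_t *head, helper_ctx_t *ctx) {
    long sum = 0;
    size_t consumed = 0;
    for (const node_t *p = head; p != NULL; p = p->next) {
        sum += p->value;
        atomic_store_explicit(&ctx->main_pos, ++consumed, memory_order_relaxed);
    }
    atomic_store_explicit(&ctx->done, true, memory_order_relaxed);
    return sum;
}

/* Hypothetical driver: links n nodes in a pseudo-random order (defeating simple
 * stride prefetchers), runs the loop with a helper, returns the sum of 0..n-1. */
long run_demo(size_t n) {
    if (n == 0)
        return 0;
    node_t *nodes = calloc(n, sizeof *nodes);
    size_t *order = malloc(n * sizeof *order);
    if (nodes == NULL || order == NULL) {
        free(nodes);
        free(order);
        return -1;
    }
    for (size_t i = 0; i < n; i++)
        order[i] = i;
    uint64_t s = 88172645463325252ULL; /* xorshift64 state for a Fisher-Yates shuffle */
    for (size_t i = n - 1; i > 0; i--) {
        s ^= s << 13; s ^= s >> 7; s ^= s << 17;
        size_t j = (size_t)(s % (i + 1));
        size_t t = order[i]; order[i] = order[j]; order[j] = t;
    }
    for (size_t i = 0; i < n; i++) {
        nodes[order[i]].value = (long)order[i];
        nodes[order[i]].next = (i + 1 < n) ? &nodes[order[i + 1]] : NULL;
    }

    helper_ctx_t ctx;
    ctx.head = &nodes[order[0]];
    ctx.max_lead = 64;
    atomic_init(&ctx.main_pos, 0);
    atomic_init(&ctx.done, false);

    pthread_t tid;
    bool spawned = pthread_create(&tid, NULL, helper_thread, &ctx) == 0;
    long sum = sum_list(ctx.head, &ctx); /* still correct if no helper was spawned */
    if (spawned)
        pthread_join(tid, NULL);
    free(order);
    free(nodes);
    return sum;
}
```

For example, `run_demo(1000)` returns 499500 (the sum of 0 through 999) whether or not the helper gets ahead, because the helper only affects timing, never the result. The sketch always starts the helper; the adaptive invocation the abstract argues for (enabling the helper only when observed program behavior suggests it will pay off) is deliberately omitted.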