Physical Experimentation with Prefetching Helper Threads on Intel's Hyper-Threaded Processors

Bibliographic Details
Main Authors: Kim, Dongkeun, Liao, Steve Shih-wei, Wang, Perry H., Cuvillo, Juan del, Tian, Xinmin, Zou, Xiang, Wang, Hong, Yeung, Donald, Girkar, Milind, Shen, John P.
Format: Conference Proceeding
Language: English
container_end_page 27
container_start_page 27
creator Kim, Dongkeun
Liao, Steve Shih-wei
Wang, Perry H.
Cuvillo, Juan del
Tian, Xinmin
Zou, Xiang
Wang, Hong
Yeung, Donald
Girkar, Milind
Shen, John P.
description Pre-execution techniques have received much attention as an effective way of prefetching cache blocks to tolerate the ever-increasing memory latency. A number of pre-execution techniques based on hardware, compiler, or both have been proposed and studied extensively by researchers. They report promising results on simulators that model a Simultaneous Multithreading (SMT) processor. In this paper, we apply the helper threading idea on a real multithreaded machine, i.e., an Intel Pentium 4 processor with Hyper-Threading Technology, and show that it can indeed provide wall-clock speedup on real silicon. To achieve further performance improvements via helper threads, we investigate three helper threading scenarios that are driven by automated compiler infrastructure, and identify several key challenges and opportunities for novel hardware and software optimizations. Our study shows that program behavior changes dynamically during execution. In addition, certain critical hardware structures in the hyper-threaded processors are either shared or partitioned in the multi-threading mode, and thus the tradeoffs regarding resource contention can be intricate. Therefore, it is essential to invoke helper threads judiciously, adapting to the dynamic program behavior, so that we can alleviate potential performance degradation due to resource contention. Moreover, since adapting to the dynamic behavior requires frequent thread synchronization, light-weight thread synchronization mechanisms are important.
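The abstract does not describe the authors' implementation. The following is therefore only a hypothetical Python sketch of the general idea of a prefetching helper thread. A second thread walks the same pointer chain as the main computation and stays at most a bounded distance ahead of it. It coordinates with the main thread through a cheap shared progress counter rather than a lock. All names (`Node`, `build_list`, `run_with_helper`, `max_ahead`, `use_helper`) are illustrative assumptions.

Python cannot issue hardware prefetches, and the GIL serializes the two threads. Reading `node.value` therefore merely stands in for a prefetch. The sketch shows only the control structure: run-ahead throttling, light-weight progress polling, and a switch for invoking the helper only when it is judged worthwhile. It does not demonstrate a speedup.

```python
# Hypothetical sketch of helper-thread run-ahead control logic; not the
# paper's implementation. Reading a node stands in for a hardware prefetch.
import threading
import time
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Node:
    value: int
    next: Optional["Node"] = None


def build_list(n: int) -> Optional[Node]:
    """Hypothetical pointer-chasing workload: a linked list holding 0..n-1."""
    head = None
    for v in reversed(range(n)):
        head = Node(v, head)
    return head


def run_with_helper(head: Optional[Node], max_ahead: int = 8,
                    use_helper: bool = True) -> Tuple[int, int]:
    """Sum the list on the calling (main) thread, optionally with a helper
    thread walking the same chain ahead of it.

    Returns (total, number of nodes the helper touched).
    """
    # Each entry has a single writer: "progress" is written only by the main
    # thread, "touched" only by the helper, "done" only by the main thread.
    state = {"progress": 0, "done": False, "touched": 0}

    def helper() -> None:
        node, pos = head, 0
        while node is not None and not state["done"]:
            # Throttle: stay at most max_ahead nodes in front of the main
            # thread by polling the shared counter without taking a lock.
            if pos - state["progress"] >= max_ahead:
                time.sleep(0)  # yield to the main thread instead of blocking
                continue
            _ = node.value     # stands in for prefetching this node
            state["touched"] += 1
            node, pos = node.next, pos + 1

    worker = threading.Thread(target=helper) if use_helper else None
    if worker is not None:
        worker.start()

    total, node = 0, head
    while node is not None:
        total += node.value         # the "real" work done on each node
        state["progress"] += 1      # publish progress for the helper to poll
        node = node.next

    state["done"] = True            # signal the helper to stop early
    if worker is not None:
        worker.join()
    return total, state["touched"]
```

For example, `run_with_helper(build_list(1000), max_ahead=16)` returns a total of 499500 (`sum(range(1000))`) whether or not the helper runs. The second element varies from run to run. In a real SMT setting, `max_ahead` would typically trade prefetch timeliness against cache pollution and contention for shared resources.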
doi_str_mv 10.5555/977395.977665
format conference_proceeding
publisher Washington, DC, USA: IEEE Computer Society
date 2004-03-20
identifier ISBN: 9780769521022
rights Copyright (c) 2004 Institute of Electrical and Electronics Engineers, Inc. All rights reserved.
identifier ISBN: 0769521029
ispartof 2nd IEEE/ACM International Symposium on Code Generation and Optimization (CGO 2004), 2004, p.27-27
language eng
recordid cdi_acm_books_10_5555_977395_977665
source IEEE Electronic Library (IEL) Conference Proceedings
subjects Applied computing -- Computers in other domains -- Personal computers and PC applications -- Microcomputers
Computer systems organization -- Embedded and cyber-physical systems
Computer systems organization -- Real-time systems
Hardware -- Integrated circuits
Software and its engineering -- Software organization and properties -- Contextual software domains -- Operating systems -- Process management -- Multithreading
title Physical Experimentation with Prefetching Helper Threads on Intel's Hyper-Threaded Processors
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-24T17%3A26%3A39IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_acm_b&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Physical%20Experimentation%20with%20Prefetching%20Helper%20Threads%20on%20Intel's%20Hyper-Threaded%20Processors&rft.btitle=2nd%20IEEE/ACM%20International%20Symposium%20on%20Code%20Generation%20and%20Optimization%20(CGO%202004)&rft.au=Kim,%20Dongkeun&rft.date=2004-03-20&rft.spage=27&rft.epage=27&rft.pages=27-27&rft.isbn=0769521029&rft.isbn_list=9780769521022&rft_id=info:doi/10.5555/977395.977665&rft_dat=%3Cproquest_acm_b%3E31426228%3C/proquest_acm_b%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-a184t-f15787bcaa1a83428a41961e5423228966100c2a7784c0743646567a845022e63%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=31426228&rft_id=info:pmid/&rfr_iscdi=true