Physical Experimentation with Prefetching Helper Threads on Intel's Hyper-Threaded Processors

Bibliographic Details
Main Authors: Kim, Dongkeun; Liao, Steve Shih-wei; Wang, Perry H.; Cuvillo, Juan del; Tian, Xinmin; Zou, Xiang; Wang, Hong; Yeung, Donald; Girkar, Milind; Shen, John P.
Format: Conference Proceeding
Language: English
Description
Summary: Pre-execution techniques have received much attention as an effective way of prefetching cache blocks to tolerate the ever-increasing memory latency. A number of pre-execution techniques based on hardware, compiler, or both have been proposed and studied extensively by researchers. They report promising results on simulators that model a Simultaneous Multithreading (SMT) processor. In this paper, we apply the helper threading idea on a real multithreaded machine, i.e., an Intel Pentium 4 processor with Hyper-Threading Technology, and show that it can indeed provide wall-clock speedup on real silicon. To achieve further performance improvements via helper threads, we investigate three helper threading scenarios that are driven by automated compiler infrastructure, and identify several key challenges and opportunities for novel hardware and software optimizations. Our study shows that program behavior changes dynamically during execution. In addition, certain critical hardware structures in the hyper-threaded processors are either shared or partitioned in the multi-threading mode, and thus the tradeoffs regarding resource contention can be intricate. Therefore, it is essential to judiciously invoke helper threads by adapting to the dynamic program behavior so that we can alleviate potential performance degradation due to resource contention. Moreover, since adapting to the dynamic behavior requires frequent thread synchronization, having light-weight thread synchronization mechanisms is important.
DOI: 10.5555/977395.977665