
NSL-BLRL: Efficient Cache Warmup for Sampled Processor Simulation

Architectural simulation is extremely time-consuming given the huge number of instructions that need to be simulated for contemporary benchmarks. Sampled simulation, which selects a number of samples from the complete benchmark execution, yields substantial speedups. However, there is one major issue...

Bibliographic Details
Main Authors: Van Ertvelde, Luk, Hellebaut, Filip, Eeckhout, Lieven, De Bosschere, Koen
Format: Conference Proceeding
Language: English
cited_by
cites
container_end_page 177
container_issue
container_start_page 168
container_title
container_volume
creator Van Ertvelde, Luk
Hellebaut, Filip
Eeckhout, Lieven
De Bosschere, Koen
description Architectural simulation is extremely time-consuming given the huge number of instructions that need to be simulated for contemporary benchmarks. Sampled simulation, which selects a number of samples from the complete benchmark execution, yields substantial speedups. However, one major issue must be dealt with in order to minimize non-sampling bias, namely the hardware state at the beginning of each sample. This is well known in the literature as the cold-start problem. The hardware structures that suffer the most from the cold-start problem are cache hierarchies. In this paper we propose NSL-BLRL, which combines two previously proposed cache hierarchy warmup approaches, namely No-State-Loss (NSL) and Boundary Line Reuse Latency (BLRL). The idea of NSL-BLRL is to warm up the cache hierarchy using a hardware state checkpoint that stores a truncated NSL stream. The NSL stream is a least-recently-used stream of (unique) memory references in the pre-sample. This NSL stream is then truncated to form the NSL-BLRL warmup checkpoint; this is done by inspecting the sample to determine how far back in the pre-sample one needs to go to accurately warm up the hardware state for the given sample. We show using SPEC CPU2000 benchmarks that NSL-BLRL (i) is nearly as accurate as BLRL and NSL for sampled processor simulation, (ii) yields simulation time speedups of several orders of magnitude compared to BLRL, and (iii) is more space-efficient than NSL. As such, we conclude that NSL-BLRL is a highly efficient and accurate cache warmup strategy for sampled processor simulation.
format conference_proceeding
fullrecord <record><control><sourceid>proquest</sourceid><recordid>TN_cdi_proquest_miscellaneous_31385998</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>31385998</sourcerecordid><originalsourceid>FETCH-proquest_miscellaneous_313859983</originalsourceid><addsrcrecordid>eNqNyrsKwjAYQOGACtbLO2RyK6SXaOIiWCoORcQKupUQE4wkTe3fvL8IPoDTgY8zQjOyWXOaUsrZGEUJYSRO8-Q-RTOAFyEpyUkeod2pruJ9dam2pdZGGtUOuBDyqW6id6HD2ve4Fq6z6oHPvZcK4CvGBSsG49sFmmhhQS1_naPVobwWx7jr_TsoGBpnQCprRat8gCZLMkY5Z9nf4wdRHz0A</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>conference_proceeding</recordtype><pqid>31385998</pqid></control><display><type>conference_proceeding</type><title>NSL-BLRL:Efficient CacheWarmup for Sampled Processor Simulation</title><source>IEEE Xplore All Conference Series</source><source>IEEE Electronic Library (IEL) Conference Proceedings</source><creator>Van Ertvelde, Luk ; Hellebaut, Filip ; Eeckhout, Lieven ; De Bosschere, Koen</creator><creatorcontrib>Van Ertvelde, Luk ; Hellebaut, Filip ; Eeckhout, Lieven ; De Bosschere, Koen</creatorcontrib><description>Architectural simulation is extremely time-consuming given the huge number of instructions that need to be simulated for contemporary benchmarks. Sampled simulation which selects a number of samples from the complete benchmark execution yields substantial speedups. However, there is one major issue that needs to be dealt with in order to minimize non-sampling bias, namely the hardware state at the beginning of each sample. This is well known in the literature as the cold-start problem. The hardware structures that suffer the most from the cold-start problem are cache hierarchies. In this paper we propose NSL-BLRL which combines two previously proposed cache hierarchy warmup approaches, namely No-State-Loss (NSL) and Boundary Line Reuse Latency (BLRL). 
The idea of NSL-BLRL is to warmup the cache hierarchy using a hardware state checkpoint that stores a truncated NSL stream. The NSL stream is a leastrecently used stream of (unique) memory references in the pre-sample. This NSL stream is then truncated to form the NSL-BLRL warmup checkpoint; this is done by inspecting the sample for determining how far in the pre-sample one needs to go back to accurately warmup the hardware state for the given sample. We show using SPEC CPU2000 benchmarks that NSL-BLRL is (i) nearly as accurate as BLRL and NSL for sampled processor simulation, (ii) yields simulation time speedups of several orders of magnitude compared to BLRL, and (iii) is more space-efficient than NSL. As such, we conclude that NSL-BLRL is a highly efficient and accurate cache warmup strategy for sampled processor simulation.</description><identifier>ISSN: 1080-241X</identifier><identifier>ISBN: 0769525598</identifier><identifier>ISBN: 9780769525594</identifier><language>eng</language><ispartof>Annual Simulation Symposium: Proceedings of the 39th annual Symposium on Simulation; 02-06 Apr. 2006, 2006, p.168-177</ispartof><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>309,310,780,784,789,790</link.rule.ids></links><search><creatorcontrib>Van Ertvelde, Luk</creatorcontrib><creatorcontrib>Hellebaut, Filip</creatorcontrib><creatorcontrib>Eeckhout, Lieven</creatorcontrib><creatorcontrib>De Bosschere, Koen</creatorcontrib><title>NSL-BLRL:Efficient CacheWarmup for Sampled Processor Simulation</title><title>Annual Simulation Symposium: Proceedings of the 39th annual Symposium on Simulation; 02-06 Apr. 2006</title><description>Architectural simulation is extremely time-consuming given the huge number of instructions that need to be simulated for contemporary benchmarks. 
Sampled simulation which selects a number of samples from the complete benchmark execution yields substantial speedups. However, there is one major issue that needs to be dealt with in order to minimize non-sampling bias, namely the hardware state at the beginning of each sample. This is well known in the literature as the cold-start problem. The hardware structures that suffer the most from the cold-start problem are cache hierarchies. In this paper we propose NSL-BLRL which combines two previously proposed cache hierarchy warmup approaches, namely No-State-Loss (NSL) and Boundary Line Reuse Latency (BLRL). The idea of NSL-BLRL is to warmup the cache hierarchy using a hardware state checkpoint that stores a truncated NSL stream. The NSL stream is a leastrecently used stream of (unique) memory references in the pre-sample. This NSL stream is then truncated to form the NSL-BLRL warmup checkpoint; this is done by inspecting the sample for determining how far in the pre-sample one needs to go back to accurately warmup the hardware state for the given sample. We show using SPEC CPU2000 benchmarks that NSL-BLRL is (i) nearly as accurate as BLRL and NSL for sampled processor simulation, (ii) yields simulation time speedups of several orders of magnitude compared to BLRL, and (iii) is more space-efficient than NSL. 
As such, we conclude that NSL-BLRL is a highly efficient and accurate cache warmup strategy for sampled processor simulation.</description><issn>1080-241X</issn><isbn>0769525598</isbn><isbn>9780769525594</isbn><fulltext>true</fulltext><rsrctype>conference_proceeding</rsrctype><creationdate>2006</creationdate><recordtype>conference_proceeding</recordtype><recordid>eNqNyrsKwjAYQOGACtbLO2RyK6SXaOIiWCoORcQKupUQE4wkTe3fvL8IPoDTgY8zQjOyWXOaUsrZGEUJYSRO8-Q-RTOAFyEpyUkeod2pruJ9dam2pdZGGtUOuBDyqW6id6HD2ve4Fq6z6oHPvZcK4CvGBSsG49sFmmhhQS1_naPVobwWx7jr_TsoGBpnQCprRat8gCZLMkY5Z9nf4wdRHz0A</recordid><startdate>20060402</startdate><enddate>20060402</enddate><creator>Van Ertvelde, Luk</creator><creator>Hellebaut, Filip</creator><creator>Eeckhout, Lieven</creator><creator>De Bosschere, Koen</creator><scope>7SC</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope></search><sort><creationdate>20060402</creationdate><title>NSL-BLRL:Efficient CacheWarmup for Sampled Processor Simulation</title><author>Van Ertvelde, Luk ; Hellebaut, Filip ; Eeckhout, Lieven ; De Bosschere, Koen</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-proquest_miscellaneous_313859983</frbrgroupid><rsrctype>conference_proceedings</rsrctype><prefilter>conference_proceedings</prefilter><language>eng</language><creationdate>2006</creationdate><toplevel>online_resources</toplevel><creatorcontrib>Van Ertvelde, Luk</creatorcontrib><creatorcontrib>Hellebaut, Filip</creatorcontrib><creatorcontrib>Eeckhout, Lieven</creatorcontrib><creatorcontrib>De Bosschere, Koen</creatorcontrib><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts 
Professional</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Van Ertvelde, Luk</au><au>Hellebaut, Filip</au><au>Eeckhout, Lieven</au><au>De Bosschere, Koen</au><format>book</format><genre>proceeding</genre><ristype>CONF</ristype><atitle>NSL-BLRL:Efficient CacheWarmup for Sampled Processor Simulation</atitle><btitle>Annual Simulation Symposium: Proceedings of the 39th annual Symposium on Simulation; 02-06 Apr. 2006</btitle><date>2006-04-02</date><risdate>2006</risdate><spage>168</spage><epage>177</epage><pages>168-177</pages><issn>1080-241X</issn><isbn>0769525598</isbn><isbn>9780769525594</isbn><abstract>Architectural simulation is extremely time-consuming given the huge number of instructions that need to be simulated for contemporary benchmarks. Sampled simulation which selects a number of samples from the complete benchmark execution yields substantial speedups. However, there is one major issue that needs to be dealt with in order to minimize non-sampling bias, namely the hardware state at the beginning of each sample. This is well known in the literature as the cold-start problem. The hardware structures that suffer the most from the cold-start problem are cache hierarchies. In this paper we propose NSL-BLRL which combines two previously proposed cache hierarchy warmup approaches, namely No-State-Loss (NSL) and Boundary Line Reuse Latency (BLRL). The idea of NSL-BLRL is to warmup the cache hierarchy using a hardware state checkpoint that stores a truncated NSL stream. The NSL stream is a leastrecently used stream of (unique) memory references in the pre-sample. This NSL stream is then truncated to form the NSL-BLRL warmup checkpoint; this is done by inspecting the sample for determining how far in the pre-sample one needs to go back to accurately warmup the hardware state for the given sample. 
We show using SPEC CPU2000 benchmarks that NSL-BLRL is (i) nearly as accurate as BLRL and NSL for sampled processor simulation, (ii) yields simulation time speedups of several orders of magnitude compared to BLRL, and (iii) is more space-efficient than NSL. As such, we conclude that NSL-BLRL is a highly efficient and accurate cache warmup strategy for sampled processor simulation.</abstract></addata></record>
fulltext fulltext
identifier ISSN: 1080-241X
ispartof Annual Simulation Symposium: Proceedings of the 39th annual Symposium on Simulation; 02-06 Apr. 2006, 2006, p.168-177
issn 1080-241X
language eng
recordid cdi_proquest_miscellaneous_31385998
source IEEE Xplore All Conference Series; IEEE Electronic Library (IEL) Conference Proceedings
title NSL-BLRL: Efficient Cache Warmup for Sampled Processor Simulation
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-26T07%3A06%3A00IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=NSL-BLRL:Efficient%20CacheWarmup%20for%20Sampled%20Processor%20Simulation&rft.btitle=Annual%20Simulation%20Symposium:%20Proceedings%20of%20the%2039th%20annual%20Symposium%20on%20Simulation;%2002-06%20Apr.%202006&rft.au=Van%20Ertvelde,%20Luk&rft.date=2006-04-02&rft.spage=168&rft.epage=177&rft.pages=168-177&rft.issn=1080-241X&rft.isbn=0769525598&rft.isbn_list=9780769525594&rft_id=info:doi/&rft_dat=%3Cproquest%3E31385998%3C/proquest%3E%3Cgrp_id%3Ecdi_FETCH-proquest_miscellaneous_313859983%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=31385998&rft_id=info:pmid/&rfr_iscdi=true
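
Illustrative note on the description field: the abstract describes building an LRU-ordered stream of unique pre-sample references and then truncating it by inspecting the sample. The Python sketch below is one possible reading of that idea, not the authors' implementation. The function names `nsl_stream` and `truncate_nsl`, the `coverage` parameter, and the percentile-style cutoff are all assumptions introduced here for illustration; the record does not specify how the cutoff is chosen.

```python
import math
from collections import OrderedDict


def nsl_stream(pre_sample):
    """Return [(address, last_position)] ordered from least to most recently used.

    Each unique address appears once, at the position of its last reference
    in the pre-sample (hypothetical representation of an NSL stream).
    """
    last_use = OrderedDict()
    for pos, addr in enumerate(pre_sample):
        last_use.pop(addr, None)   # drop the older occurrence, if any
        last_use[addr] = pos       # re-insert at the most-recently-used end
    return list(last_use.items())


def truncate_nsl(pre_sample, sample, coverage=1.0):
    """Keep only the tail of the NSL stream that the sample appears to need.

    Assumption: the cutoff is the smallest look-back distance that covers
    the requested fraction of sample references reusing pre-sample data.
    """
    stream = nsl_stream(pre_sample)
    last_pos = dict(stream)
    n = len(pre_sample)

    # For the first touch of each address in the sample, record how far
    # back in the pre-sample its most recent reference lies.
    seen, depths = set(), []
    for addr in sample:
        if addr in seen:
            continue
        seen.add(addr)
        if addr in last_pos:
            depths.append(n - last_pos[addr])

    if not depths:
        return []                  # the sample reuses nothing from the pre-sample

    depths.sort()
    idx = max(0, math.ceil(coverage * len(depths)) - 1)
    cutoff = n - depths[idx]
    return [addr for addr, pos in stream if pos >= cutoff]


pre = [0x10, 0x20, 0x30, 0x10, 0x40, 0x20]
smp = [0x40, 0x50, 0x40, 0x20]
checkpoint = truncate_nsl(pre, smp)
```

In this toy trace the sample only reuses data touched in the last two pre-sample references, so the truncated checkpoint is shorter than the full NSL stream, which is the space saving the abstract refers to.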