Tolerating Cache-Miss Latency with Multipass Pipelines
Microprocessors exploit instruction-level parallelism and tolerate memory-access latencies to achieve high performance. Out-of-order microprocessors do this by dynamically scheduling instruction execution, but require power-hungry hardware structures. This article describes multipass pipelining, a microarchitectural model that provides an alternative to out-of-order execution for tolerating memory access latencies. We call our approach "flea-flicker" multipass pipelining because it uses two (or more) passes of preexecution or execution to achieve performance efficacy. Multipass pipelining assumes compile-time scheduling for lower-power and lower-complexity exploitation of instruction-level parallelism.
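The core idea in the abstract can be illustrated with a toy simulation: a first pass issues instructions in order without stalling on a cache miss, "poisoning" any result that depends on the missing value, and a second pass replays only the deferred instructions once the load has returned. This is a minimal sketch of the concept, not the authors' microarchitecture; all names and the instruction encoding are hypothetical.

```python
def run_multipass(program, miss_loads):
    """program: list of (dest, srcs, fn) tuples, where fn maps the source
    register values to the result. Loads are modeled as fn() with no sources;
    miss_loads is the set of indices whose first-pass access misses."""
    regs, poisoned, deferred = {}, set(), []

    # Pass 1 (preexecution): never stall; poison results that depend on a
    # missing load and remember those instructions for replay.
    for i, (dest, srcs, fn) in enumerate(program):
        if i in miss_loads or any(s in poisoned for s in srcs):
            poisoned.add(dest)
            deferred.append((dest, srcs, fn))
        else:
            regs[dest] = fn(*(regs[s] for s in srcs))
            poisoned.discard(dest)  # a valid result un-poisons the register

    # Pass 2: the miss has resolved, so replay only the deferred slice.
    for dest, srcs, fn in deferred:
        regs[dest] = fn(*(regs[s] for s in srcs))
    return regs

# Toy trace: r0 = load (misses), r1 = 5, r2 = r0 + r1, r3 = r1 * 2
prog = [
    ("r0", [], lambda: 10),                    # load; index 0 misses
    ("r1", [], lambda: 5),
    ("r2", ["r0", "r1"], lambda a, b: a + b),  # miss-dependent: deferred
    ("r3", ["r1"], lambda a: a * 2),           # independent: done in pass 1
]
print(run_multipass(prog, miss_loads={0}))
```

In the example, `r3` completes during the first pass even though it follows the missed load in program order, which is the latency-tolerance benefit; only `r0` and `r2` consume second-pass work.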
Published in: | IEEE MICRO 2006-01, Vol.26 (1), p.40-47 |
---|---|
Main Authors: | Barnes, R.D.; Ryoo, S.; Hwu, W.W. |
Format: | Article |
Language: | English |
Subjects: | Microprocessors; multipass pipelining; Flea-flicker; in-order design; memory-latency tolerance; Pipeline processing; Processor scheduling |
container_end_page | 47 |
container_issue | 1 |
container_start_page | 40 |
container_title | IEEE MICRO |
container_volume | 26 |
creator | Barnes, R.D. Ryoo, S. Hwu, W.W. |
description | Microprocessors exploit instruction-level parallelism and tolerate memory-access latencies to achieve high performance. Out-of-order microprocessors do this by dynamically scheduling instruction execution, but require power-hungry hardware structures. This article describes multipass pipelining, a microarchitectural model that provides an alternative to out-of-order execution for tolerating memory access latencies. We call our approach "flea-flicker" multipass pipelining because it uses two (or more) passes of preexecution or execution to achieve performance efficacy. Multipass pipelining assumes compile-time scheduling for lower-power and lower-complexity exploitation of instruction-level parallelism. |
doi_str_mv | 10.1109/MM.2006.25 |
format | article |
fulltext | fulltext |
identifier | ISSN: 0272-1732 |
ispartof | IEEE MICRO, 2006-01, Vol.26 (1), p.40-47 |
issn | 0272-1732 1937-4143 |
language | eng |
recordid | cdi_ieee_primary_1603496 |
source | IEEE Xplore (Online service) |
subjects | Cache; Computer memory; Delay; Dynamic scheduling; Effectiveness; Exploitation; Flea-flicker; Hardware; in-order design; Mathematical models; memory-latency tolerance; Microprocessors; multipass pipelining; Pipeline processing; Pipelines; Pipelining (computers); Processor scheduling; Random access memory; Registers; Runtime; Scheduling; Sun |
title | Tolerating Cache-Miss Latency with Multipass Pipelines |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-28T08%3A10%3A24IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_ieee_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Tolerating%20Cache-Miss%20Latency%20with%20Multipass%20Pipelines&rft.jtitle=IEEE%20MICRO&rft.au=Barnes,%20R.D.&rft.date=2006-01&rft.volume=26&rft.issue=1&rft.spage=40&rft.epage=47&rft.pages=40-47&rft.issn=0272-1732&rft.eissn=1937-4143&rft.coden=IEMIDZ&rft_id=info:doi/10.1109/MM.2006.25&rft_dat=%3Cproquest_ieee_%3E896189978%3C/proquest_ieee_%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c341t-ac145fb95c5cbe309517dc896639bc455ce423568b4feb673f440c620b232a923%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=196377769&rft_id=info:pmid/&rft_ieee_id=1603496&rfr_iscdi=true |