Loading…

BuMP: Bulk Memory Access Prediction and Streaming

With the end of Dennard scaling, server power has emerged as the limiting factor in the quest for more capable dataenters. Without the benefit of supply voltage scaling, it is essential to lower the energy per operation to improve server efficiency. As the industry moves to lean-core server processo...

Full description

Saved in:
Bibliographic Details
Main Authors: Volos, Stavros, Picorel, Javier, Falsafi, Babak, Grot, Boris
Format: Conference Proceeding
Language:English
Subjects:
Online Access:Request full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:With the end of Dennard scaling, server power has emerged as the limiting factor in the quest for more capable dataenters. Without the benefit of supply voltage scaling, it is essential to lower the energy per operation to improve server efficiency. As the industry moves to lean-core server processors, the energy bottleneck is shifting toward main memory as a chief source of server energy consumption in modern dataenters. Maximizing the energy efficiency of today's DRAM chips and interfaces requires amortizing the costly DRAM page activations over multiple row buffer accesses. This work introduces Bulk Memory Access Prediction and Streaming, or BuMP. We make the observation that a significant fraction (59-79%) of all memory accesses fall into DRAM pages with high access density, meaning that the majority of their cache blocks will be accessed within a modest time frame of the first access. Accesses to high-density DRAM pages include not only memory reads in response to load instructions, but also reads stemming from store instructions as well as memory writes upon a dirty LLC eviction. The remaining accesses go to low-density pages and virtually unpredictable reference patterns (e.g., hashed key lookups). BuMP employs a low-cost predictor to identify high-density pages and triggers bulk transfer operations upon the first read or write to the page. In doing so, BuMP enforces high row buffer locality where it is profitable, thereby reducing DRAM energy per access by 23%, and improves server throughput by 11% across a wide range of server applications.
ISSN:1072-4451
DOI:10.1109/MICRO.2014.44