DR DRAM: Accelerating Memory-Read-Intensive Applications

Bibliographic Details
Main Authors: Cao, Yuhai, Li, Chao, Chen, Quan, Leng, Jingwen, Guo, Minyi, Wang, Jing, Zhang, Weigong
Format: Conference Proceeding
Language: English
Description
Summary: Many data analytics workloads today, such as graph processing and neural networks, depend on efficient memory reads. Preprocessing of raw data also demands higher memory read bandwidth. Unfortunately, because DRAM must be refreshed periodically, a modern DRAM system has to stall memory accesses during each refresh cycle. As DRAM device density continues to grow, refresh time must also lengthen to cover more memory rows. Consequently, DRAM refresh can become a crucial throughput bottleneck for memory-read-intensive (MRI) data processing tasks. To fully unleash the performance of these applications, we revisit the conventional DRAM architecture and refresh mechanism. We propose DR DRAM, an application-specific memory design approach that makes a novel tradeoff between read and write performance. Simply put, DR has two meanings: device refresh and data recovery. It aims to eliminate stalls by allowing read and refresh operations to proceed simultaneously. Unlike traditional schemes, DR uses device refresh, which refreshes only one specific device at a time. Meanwhile, DR increases read efficiency by recovering the inaccessible data that resides on the device being refreshed. Our design can be implemented using the existing redundant data storage area in DRAM. In this paper we detail DR's architecture and protocol design and evaluate it on a cycle-accurate simulator. Our results show that DR can nearly eliminate refresh overhead for memory reads, bringing up to 12% extra maximum read bandwidth and a 50-60% latency improvement on present DDR4 devices.
ISSN: 2576-6996
DOI: 10.1109/ICCD.2018.00053
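
Illustration: the summary describes reading data while one device is being refreshed and recovering that device's contents from redundant storage. The record does not say which recovery code the paper uses, so the sketch below is only one possible reading. It assumes a rank of eight data devices plus one redundant device holding an XOR parity word; the names DATA_DEVICES, write_burst, and read_burst are hypothetical and do not come from the paper.

```python
from functools import reduce
from operator import xor

# Assumption (not from the paper): eight data devices plus one redundant device.
DATA_DEVICES = 8


def parity(words):
    """XOR all words together; used both to build parity and to rebuild a word."""
    return reduce(xor, words, 0)


def write_burst(words):
    """Store one word per data device and an XOR parity word on the redundant device."""
    if len(words) != DATA_DEVICES:
        raise ValueError("expected one word per data device")
    return list(words) + [parity(words)]


def read_burst(rank, refreshing=None):
    """Return the data words; rebuild the word on the refreshing device, if any.

    `refreshing` is the index of the single device that cannot be read right now.
    Index DATA_DEVICES refers to the redundant device, in which case no
    reconstruction is needed.
    """
    data = []
    for dev in range(DATA_DEVICES):
        if dev == refreshing:
            # XOR of every other device (including parity) yields the missing word.
            others = [w for i, w in enumerate(rank) if i != dev]
            data.append(parity(others))
        else:
            data.append(rank[dev])
    return data


words = [0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88]
rank = write_burst(words)
# The read succeeds no matter which single device is unavailable.
recovered = [read_burst(rank, refreshing=d) for d in range(DATA_DEVICES + 1)]
```

Under this assumption a read never has to wait for the refreshing device, whereas a write still needs every device, including the redundant one, to be updated. That is one way to picture the read/write tradeoff the summary mentions.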