HL-PCM: MLC PCM Main Memory with Accelerated Read
Published in: IEEE Transactions on Parallel and Distributed Systems, 2017-11, Vol. 28 (11), pp. 3188-3200
Format: Article
Language: English
Summary: Multi-Level Cell Phase Change Memory (MLC PCM) is a promising candidate technology for replacing DRAM in the main memory of modern computers. Despite its high density and low power advantages, this technology suffers from slow read and write operations. While prior works extensively studied the problem of slow writes, this paper targets the high read latency of MLC PCM and introduces an architectural mechanism to overcome it. To this end, we rely on the fact that reading different bits of an MLC cell takes different latencies: for a 2-bit MLC, reading its Most-Significant Bit (MSB) is fast, while reading its Least-Significant Bit (LSB) is slower. We then propose Half-Line PCM (HL-PCM), a novel memory architecture that exploits this non-uniformity to deliver a requested memory block to the processor in two steps, sending half of the block ahead of the other half. If the processor requested a word belonging to the first half, it can resume execution as soon as that half arrives, while the remaining half is scheduled to reach the memory controller later. HL-PCM is simple to implement, requiring only minor modifications to the memory controller, to the search/evict policies of the last-level cache, and to the data layout in main memory. Our experimental results on a 16-core chip multiprocessor (CMP) running memory-intensive benchmarks show that the proposed design improves average memory access latency by 33-43 percent and program execution time by 23 percent on average, while incurring negligible overhead at the memory controller and the PCM DIMM.
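The summary describes the mechanism only in prose. The sketch below is a minimal, hypothetical timing model of the idea, not the authors' implementation: the cycle counts, the words-per-line value, and the assumption that the first half of each line sits in the fast (MSB) bits are illustrative assumptions, used only to show when the core could resume under HL-PCM versus a conventional MLC read.

```python
# Minimal sketch of the HL-PCM early-resume idea (not the paper's design).
# Assumption: the MSB half of a 2-bit MLC line is sensed faster than the
# LSB half, and the data layout places the first half of each memory block
# in the fast (MSB) bits. All latency values are illustrative.

MSB_READ_CYCLES = 48    # assumed fast (MSB) sensing latency
LSB_READ_CYCLES = 96    # assumed slow (LSB) sensing latency
WORDS_PER_LINE = 8      # e.g., a 64 B line of 8-byte words

def hl_pcm_read(critical_word_index):
    """Return (cycles until the core can resume, cycles until the full
    line is buffered at the memory controller) under the HL-PCM sketch."""
    fast_half = range(WORDS_PER_LINE // 2)   # words assumed to live in MSBs
    resume = MSB_READ_CYCLES if critical_word_index in fast_half else LSB_READ_CYCLES
    full_line = LSB_READ_CYCLES              # the slow half completes the fill
    return resume, full_line

def baseline_read(_critical_word_index):
    """Conventional MLC read: nothing is delivered before the slow half."""
    return LSB_READ_CYCLES, LSB_READ_CYCLES

if __name__ == "__main__":
    for word in (1, 6):
        hl, _ = hl_pcm_read(word)
        base, _ = baseline_read(word)
        print(f"word {word}: resume after {hl} cycles (HL-PCM) vs {base} (baseline)")
```

In this sketch a request whose critical word falls in the fast half resumes after the MSB read, while a request for the slow half waits as long as the baseline, mirroring the early-resume behavior the summary describes.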
ISSN: 1045-9219, 1558-2183
DOI: 10.1109/TPDS.2017.2705125