Loading…
Cache performance in vector supercomputers
Traditional supercomputers use a flat multi-bank SRAM memory organization to supply high bandwidth at low latency. Most other computers use a hierarchical organization with a small SRAM cache and slower, cheaper DRAM for main memory. Such systems rely heavily on data locality for achieving optimum p...
Saved in:
Main Authors: | , , , , |
---|---|
Format: | Conference Proceeding |
Language: | English |
Online Access: | Get full text |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Summary: | Traditional supercomputers use a flat multi-bank SRAM memory organization to supply high bandwidth at low latency. Most other computers use a hierarchical organization with a small SRAM cache and slower, cheaper DRAM for main memory. Such systems rely heavily on data locality for achieving optimum performance. This paper evaluates cache-based memory systems for vector supercomputers. We develop a simulation model for a cache-based version of the Cray Research C90 and use the NAS parallel benchmarks to provide a large scale workload. We show that while caches reduce memory traffic and improve the performance of plain DRAM memory, they still lag behind cacheless SRAM. We identify the performance bottle-necks in DRAM-based memory systems and quantify their contribution to program performance degradation. We find the data fetch strategy to be a significant parameter affecting performance, evaluate the performance of several fetch policies, and show that small fetch sizes improve performance by maximizing the use of available memory bandwidth. |
---|---|
ISSN: | 1063-9535 |
DOI: | 10.1145/602770.602815 |