Criticality-aware priority to accelerate GPU memory access
Published in: The Journal of Supercomputing, 2023, Vol. 79(1), pp. 188–213
Main Authors: ,
Format: Article
Language: English
Summary: The graphics processing unit (GPU), combined with the CUDA and OpenCL programming models, offers new opportunities to reduce the latency and power consumption of throughput-oriented workloads. A GPU can execute thousands of parallel threads to hide memory access latency. For some memory-intensive workloads, however, there are likely to be intervals in which all threads of a core are stalled, waiting for data from main memory. This research aims to shorten GPU memory access latency, increasing thread activity time and decreasing core underutilization. To reduce the cores' stall time, the memory buffer and the interconnection network prioritize packets from the cores with the greatest number of stalled threads. These more critical packets receive higher priority in arbitration and resource allocation, so their memory requests are handled faster and overall core stall time is reduced. A maximum speed-up of 28% and an average speed-up of 12.5% across the evaluated benchmarks are reported, with no significant effect on system area or power consumption.
ISSN: 0920-8542, 1573-0484
DOI: 10.1007/s11227-022-04657-3
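The abstract gives only the high-level mechanism, but the idea it describes, tagging each memory request with its source core's stalled-thread count and arbitrating on that tag, can be sketched roughly as follows. This is a minimal illustrative model in C++, not the authors' implementation; the `Packet` fields, the `CriticalityArbiter` class, and the tie-breaking rule are all assumptions.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical memory-request packet: each request carries the number of
// threads currently stalled on its source core (its "criticality" tag,
// set when the packet is injected into the network).
struct Packet {
    uint32_t core_id;
    uint64_t address;
    uint32_t stalled_threads;
};

// Criticality-aware arbitration: among the packets competing for a memory
// buffer slot or an interconnect port, grant the one from the core with
// the most stalled threads, so that core's request is serviced first and
// it resumes execution sooner.
class CriticalityArbiter {
public:
    void enqueue(const Packet& p) { queue_.push_back(p); }

    // Select and remove the most critical pending packet.
    // Returns false if no packet is waiting.
    bool grant(Packet& out) {
        if (queue_.empty()) return false;
        auto it = std::max_element(
            queue_.begin(), queue_.end(),
            [](const Packet& a, const Packet& b) {
                return a.stalled_threads < b.stalled_threads;
            });
        out = *it;
        queue_.erase(it);
        return true;
    }

private:
    std::vector<Packet> queue_;
};
```

Because `std::max_element` returns the first maximal element, packets with equal criticality are granted in arrival order; a real hardware design would also likely need aging or a priority cap to prevent starvation of requests from low-criticality cores.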