
Criticality-aware priority to accelerate GPU memory access


Bibliographic Details
Published in: The Journal of Supercomputing, 2023, Vol. 79 (1), pp. 188–213
Main Authors: Bitalebi, Hossein; Safaei, Farshad
Format: Article
Language: English
Description
Summary: The graphics processing unit (GPU), combined with the CUDA and OpenCL programming models, offers new opportunities to reduce the latency and power consumption of throughput-oriented workloads. A GPU can execute thousands of parallel threads to hide memory access latency. However, for some memory-intensive workloads, it is very likely that in some time intervals all threads of a core will be stalled while waiting for their data to be delivered by main memory. In this research, we aim to shorten GPU memory access latency in order to increase thread activity time and decrease core underutilization. To reduce this unproductive core time, we focus on the memory buffer and the interconnection network, prioritizing the packets of the cores with the greatest number of stalled threads. As a result, the most critical packets receive higher priority in arbitration and resource allocation, their memory requests are handled faster, and overall core stall time is reduced. A maximum speed-up of 28% and an average speed-up of 12.5% across the evaluated benchmarks are reported, without significant effect on system area or power consumption.
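The core idea in the abstract — serving the memory requests of the cores with the most stalled threads first — can be illustrated with a minimal software sketch. This is a hypothetical model, not the authors' hardware design: the class and field names (`CriticalityAwareArbiter`, `stalled_threads`, etc.) are assumptions introduced for illustration, and the arbiter is reduced to a priority queue keyed on each core's stalled-thread count, with FIFO order breaking ties.

```python
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class Request:
    priority: int                       # negative stall count, so heapq's
                                        # min-heap serves the max first
    seq: int                            # FIFO tie-breaker among equal priorities
    core_id: int = field(compare=False)
    addr: int = field(compare=False)

class CriticalityAwareArbiter:
    """Toy model: grant the pending memory request whose issuing core
    currently has the greatest number of stalled threads."""

    def __init__(self):
        self._heap = []
        self._seq = 0

    def enqueue(self, core_id, addr, stalled_threads):
        # More stalled threads => more critical => served earlier.
        heapq.heappush(self._heap,
                       Request(-stalled_threads, self._seq, core_id, addr))
        self._seq += 1

    def grant(self):
        # Pop the most critical pending request.
        return heapq.heappop(self._heap)

arb = CriticalityAwareArbiter()
arb.enqueue(core_id=0, addr=0x100, stalled_threads=4)
arb.enqueue(core_id=1, addr=0x200, stalled_threads=32)
arb.enqueue(core_id=2, addr=0x300, stalled_threads=32)

order = [arb.grant().core_id for _ in range(3)]
print(order)  # [1, 2, 0]: the two heavily stalled cores first, FIFO between them
```

In the paper's setting this priority would be applied in the network-on-chip routers and memory-controller buffers rather than in software, but the selection rule is the same: criticality (stalled-thread count) decides arbitration, so the cores losing the most execution slots are unblocked soonest.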
ISSN:0920-8542
1573-0484
DOI:10.1007/s11227-022-04657-3