Directive-Based Pipelining Extension for OpenMP

Bibliographic Details
Main Authors: Cui, Xuewen, Scogland, Thomas R.W., De Supinski, Bronis R., Feng, Wu-Chun
Format: Conference Proceeding
Language: English
Description
Summary: Programming models like CUDA, OpenMP, OpenACC and OpenCL are designed to offload compute-intensive workloads to accelerators efficiently. However, the naive offload model, which synchronously copies data and executes kernels in sequence, requires extensive hand-tuning with techniques such as pipelining to overlap computation and communication. We therefore propose an easy-to-use, directive-based pipelining extension for OpenMP that overlaps data transfers with kernel computation. The extension can map data to a pre-allocated device buffer and can automate memory-constrained array indexing and sub-task scheduling. We evaluate a prototype implementation of our approach with three different applications. The experimental results show that our approach reduces memory usage by 52% to 97% while delivering a 1.41x to 1.65x speedup over the naive offload model.
ISSN: 2168-9253
DOI: 10.1109/CLUSTER.2016.53