Directive-Based Pipelining Extension for OpenMP

Bibliographic Details
Main Authors:	Cui, Xuewen; Scogland, Thomas R.W.; De Supinski, Bronis R.; Feng, Wu-Chun
Format:	Conference Proceeding
Language:	English
Subjects:	Arrays; Benchmark testing; Computational modeling; Data transfer; Graphics processing units; Kernel; Memory management
Online Access:	Request full text

Description
Summary:	Programming models like CUDA, OpenMP, OpenACC and OpenCL are designed to offload compute-intensive workloads to accelerators efficiently. However, the naive offload model, which synchronously copies and executes in sequence, requires extensive hand-tuning of techniques such as pipelining to overlap computation and communication. Therefore, we propose an easy-to-use, directive-based pipelining extension for OpenMP to overlap data transfers and kernel computation. This extension can map data to a pre-allocated device buffer and can automate memory-constrained array indexing and sub-task scheduling. We evaluate a prototype implementation of our approach with three different applications. The experimental results show that our approach can reduce memory usage by 52% to 97% while delivering a 1.41x to 1.65x speedup over the naive offload model.
ISSN:	2168-9253
DOI:	10.1109/CLUSTER.2016.53
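
Illustration (not part of the catalog record): the summary mentions staging data through a pre-allocated device buffer, memory-constrained array indexing, and overlapping transfers with kernel execution. The sketch below is a host-only Python illustration of those two ideas, not the paper's directive syntax or implementation. All names (buffer_index, run_chunked, pipeline_times) are hypothetical, plain list copies stand in for host-to-device transfers, and the timing function is an idealized model that assumes constant per-chunk stage times and fully independent copy and compute engines.

```python
def buffer_index(i, chunk, slots):
    """Map global element index i into a ring of `slots` chunk-sized slots.

    Hypothetical helper: chunk number (i // chunk) is wrapped onto a slot,
    so the staging buffer needs only chunk * slots elements in total.
    """
    return ((i // chunk) % slots) * chunk + (i % chunk)


def run_chunked(data, chunk, slots, kernel):
    """Process `data` one chunk at a time through a small staging buffer.

    The staging list stands in for a pre-allocated device buffer. The three
    inner loops mimic copy-in, kernel, and copy-out; here they run one after
    another on the host, so nothing actually overlaps.
    """
    staging = [0] * (chunk * slots)
    out = [0] * len(data)
    for start in range(0, len(data), chunk):
        stop = min(start + chunk, len(data))
        for i in range(start, stop):          # "copy in"
            staging[buffer_index(i, chunk, slots)] = data[i]
        for i in range(start, stop):          # "kernel"
            j = buffer_index(i, chunk, slots)
            staging[j] = kernel(staging[j])
        for i in range(start, stop):          # "copy out"
            out[i] = staging[buffer_index(i, chunk, slots)]
    return out


def pipeline_times(n_chunks, t_in, t_kernel, t_out):
    """Idealized model: serial offload time vs. a fully overlapped pipeline.

    Serial: every chunk pays all three stages back to back.
    Pipelined: one chunk fills the pipeline, then each further chunk costs
    only the slowest stage (assumes perfect overlap, which real hardware
    may not deliver).
    """
    naive = n_chunks * (t_in + t_kernel + t_out)
    pipelined = (t_in + t_kernel + t_out) + (n_chunks - 1) * max(t_in, t_kernel, t_out)
    return naive, pipelined


if __name__ == "__main__":
    values = list(range(10))
    print(run_chunked(values, chunk=4, slots=2, kernel=lambda x: x * x))
    print(pipeline_times(n_chunks=8, t_in=1, t_kernel=2, t_out=1))
```

In this sketch the staging buffer holds chunk * slots elements regardless of the input length, which is where a memory saving would come from. The stage times passed to pipeline_times are made-up illustrative values, not measurements from the paper.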