Taskgraph: A Low Contention OpenMP Tasking Framework
Published in: | IEEE transactions on parallel and distributed systems 2023-08, Vol.34 (8), p.1-12 |
---|---|
Main Authors: | , , |
Format: | Article |
Language: | English |
Subjects: | |
Summary: | OpenMP is the de facto standard for shared-memory systems in High-Performance Computing (HPC). It includes a tasking model that offers a high level of abstraction to exploit structured (loop-based) and highly dynamic unstructured (task-based) parallelism in an easy and flexible way. Unfortunately, the run-time overheads introduced to manage tasks are very high in the most common OpenMP frameworks (e.g., GCC, LLVM), which defeats the potential benefits of the tasking model and makes it suitable for coarse-grained tasks only. This paper presents taskgraph, a framework that uses a task dependency graph (TDG) to represent a region of code implemented with OpenMP tasks in order to reduce the run-time overheads associated with task management, i.e., contention and parallel orchestration, including task creation and synchronization. The TDG avoids the overheads of resolving task dependencies and greatly reduces those arising from accesses to shared resources. Moreover, the taskgraph framework introduces into OpenMP the record-and-replay execution model, which accelerates the taskgraph region from its second execution onward. Overall, the optimizations presented in this paper make it possible to exploit fine-grained OpenMP tasks, in line with the trend in current applications toward massive on-node parallelism and fine-grained, dynamic scheduling paradigms. The framework is implemented on LLVM 15.0. Results show that the taskgraph implementation outperforms the vanilla OpenMP system in performance and scalability, for both structured and unstructured parallelism and for both coarse- and fine-grained tasks. Furthermore, the proposed framework makes the tasking model a competitive alternative to the OpenMP thread model in most cases. |
---|---|
ISSN: | 1045-9219, 1558-2183 |
DOI: | 10.1109/TPDS.2023.3284219 |
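
As an illustration of the kind of code the summary refers to, the following is a minimal sketch in standard OpenMP C of a task region whose dependency graph is identical on every repetition. It does not use the paper's taskgraph extension, and the names `chain_step`, `run_region`, and `N_BLOCKS` are hypothetical, chosen only for this example. In stock OpenMP the runtime creates the tasks and resolves the `depend` clauses again on each repetition; according to the summary, that repeated work is what a recorded TDG is meant to avoid from the second execution onward.

```c
#include <assert.h>

#define N_BLOCKS 4  /* hypothetical problem size for the sketch */

/* One repetition of the task region (hypothetical helper).
 * The tasks and their dependences are the same on every call,
 * so the task dependency graph never changes between repetitions. */
static void chain_step(double block[N_BLOCKS])
{
    /* Stage 1: one independent task per block. */
    for (int i = 0; i < N_BLOCKS; i++) {
        #pragma omp task depend(inout: block[i]) firstprivate(i)
        block[i] += 1.0;
    }

    /* Stage 2: each task reads its left neighbour and updates its own
     * block; the depend clauses order these tasks into a chain. */
    for (int i = 1; i < N_BLOCKS; i++) {
        #pragma omp task depend(in: block[i - 1]) depend(inout: block[i]) firstprivate(i)
        block[i] += block[i - 1];
    }

    /* Wait for every task created above before returning. */
    #pragma omp taskwait
}

/* Runs the region `repetitions` times and returns the last block value.
 * A single thread creates the tasks; the rest of the team executes them. */
double run_region(int repetitions, double block[N_BLOCKS])
{
    assert(repetitions >= 0);

    #pragma omp parallel
    #pragma omp single
    {
        for (int r = 0; r < repetitions; r++)
            chain_step(block);
    }
    return block[N_BLOCKS - 1];
}
```

Calling `run_region(2, b)` on a zero-initialized array `b` of four doubles leaves `b` holding 2, 5, 9, 14. The sketch can be built with an OpenMP-enabled compiler (for example, with `-fopenmp` on GCC or Clang); without OpenMP the pragmas are ignored and the code runs serially with the same result.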