Loading…

Compiler and runtime support for irregular reductions on amultithreaded architecture

Computations from many scientific and engineering domains use irregular meshes and/or sparse matrices. The codes expressing these computations involve irregular reductions. Irregular reductions pose many challenges to parallel architectures and their compilers in terms of parallelization, locality m...

Full description

Saved in:
Bibliographic Details
Main Authors: Zoppetti, G. M., Agrawal, G., Kumar, R.
Format: Conference Proceeding
Language:English
Subjects:
Online Access:Request full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Computations from many scientific and engineering domains use irregular meshes and/or sparse matrices. The codes expressing these computations involve irregular reductions. Irregular reductions pose many challenges to parallel architectures and their compilers in terms of parallelization, locality management, and communication optimization. Multithreaded architectures offer rich support for local synchronization, overlapping of communication and computation, and low-overhead communication and thread switching. Therefore, they appear to be promising for scalable parallelization of irregular reductions. This paper presents an execution model and a compilation strategy for supporting irregular reductions on a fine-grained multithreaded architecture. The key aspect of this strategy is that the frequency and volume of communication is independent of the contents of the indirection arrays. The performance obtained depends upon the architecture's ability to overlap communication and computation and is largely independent of the partitioning of the problem. We present experimental results from compiling three scientific kernels involving irregular reductions (mvm, euler, and moldyn) for execution on the EARTH fine-grained multithreaded architecture. On mvm, which does not involve any left-hand-side irregular accesses, we achieve near linear absolute speedups. For euler and moldyn, which do involve left-hand-side irregular accesses, our strategy initially incurs some overheads, but the relative speedups are very good. In going from 2 to 32 processors, the relative speedups for euler were 9.28 and 10.36 on its two datasets, while the speedups for moldyn were 9.70 and 10.76 on its two datasets.
DOI:10.1109/IPDPS.2002.1015490