Memory-aware Thread and Data Mapping for Hierarchical Multi-core Platforms

Bibliographic Details
Published in: International Journal of Networking and Computing, 2012, Vol. 2(1), pp. 97-116
Main Authors: Cruz, Eduardo Henrique Molina da; Alves, Marco Antonio Zanata; Carissimi, Alexandre; Navaux, Philippe Olivier Alexandre; Ribeiro, Christiane Pousa; Méhaut, Jean-François
Format: Article
Language: English
Description
Summary: In parallel programs, the threads of a given application must cooperate in order to accomplish the required computation. However, the communication time between the tasks may differ depending on which cores they execute on and on how the memory hierarchy and interconnections are used. The problem is even more important in multi-core machines with NUMA characteristics, since remote accesses impose high overhead, making these machines more sensitive to thread and data mapping. In this context, thread and data mapping are techniques that provide performance gains by improving the use of resources such as interconnections, main memory and cache memory. The problem of finding the best mapping is considered NP-hard. Furthermore, in shared memory environments, there is the additional difficulty of finding the communication pattern, which is implicit and occurs through memory accesses. Our mechanism provides static mapping on NUMA architectures and does not require any prior knowledge of the application from the programmer. To obtain the mapping, different metrics were adopted and a heuristic method based on the Edmonds matching algorithm was used. To evaluate our proposal, we use the NAS Parallel Benchmarks (NPB) running on two modern multi-core NUMA machines. Results show performance gains of up to 75% compared to the native Linux scheduler and memory allocator.
ISSN: 2185-2839, 2185-2847
DOI: 10.15803/ijnc.2.1_97
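
The summary mentions a heuristic based on the Edmonds matching algorithm. As a rough illustration of the kind of pairing step such a heuristic could rely on, the sketch below groups threads into pairs so that the total communication inside the pairs is maximal. It is not the authors' implementation. The function name `pair_threads` and the communication matrix `comm` are hypothetical, the idea that paired threads would then be placed on cores sharing a cache is an assumption, and the Edmonds algorithm is replaced here by an exhaustive search over pairings. The search returns a maximum-weight perfect matching, as Edmonds' algorithm would, but its cost grows exponentially, so it only suits a handful of threads.

```python
from functools import lru_cache


def pair_threads(comm):
    """Return (total_weight, pairs) of a maximum-weight perfect matching.

    comm is a symmetric matrix (hypothetical input): comm[i][j] is the
    amount of communication measured between threads i and j.
    Exhaustive search stands in for the Edmonds matching algorithm.
    """
    n = len(comm)
    if n % 2:
        raise ValueError("expected an even number of threads")

    @lru_cache(maxsize=None)
    def best(mask):
        # mask has a 1 bit for every thread that is still unpaired
        if mask == 0:
            return 0, ()
        i = (mask & -mask).bit_length() - 1  # lowest-numbered unpaired thread
        rest = mask & ~(1 << i)
        top = None
        for j in range(i + 1, n):
            if rest & (1 << j):
                sub_weight, sub_pairs = best(rest & ~(1 << j))
                cand = (comm[i][j] + sub_weight, ((i, j),) + sub_pairs)
                if top is None or cand[0] > top[0]:
                    top = cand
        return top

    return best((1 << n) - 1)


# Made-up communication matrix for four threads (illustrative values only).
comm = [
    [0, 9, 1, 2],
    [9, 0, 3, 1],
    [1, 3, 0, 8],
    [2, 1, 8, 0],
]
total, pairs = pair_threads(comm)
print(total, pairs)  # 17 ((0, 1), (2, 3))
```

With the example matrix, threads 0 and 1 and threads 2 and 3 exchange the most data, so they end up paired. For larger thread counts, a polynomial-time matching implementation would be the natural replacement for the exhaustive search.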