
Reparallelization techniques for migrating OpenMP codes in computational grids

Bibliographic Details
Published in: Concurrency and Computation, 2009-03, Vol. 21 (3), pp. 281–299
Main Authors: Klemm, Michael; Bezold, Matthias; Gabriel, Stefan; Veldema, Ronald; Philippsen, Michael
Format: Article
Language: English
Description
Summary: Typical computational grid users target only a single cluster and have to estimate the runtime of their jobs. Job schedulers prefer short‐running jobs to maintain a high system utilization. If the user underestimates the runtime, premature termination causes computation loss; overestimation is penalized by long queue times. As a solution, we present an automatic reparallelization and migration of OpenMP applications. A reparallelization is dynamically computed for an OpenMP work distribution when the number of CPUs changes. The application can be migrated between clusters when an allocated time slice is exceeded. Migration is based on a coordinated, heterogeneous checkpointing algorithm. Both reparallelization and migration enable the user to freely use computing time at more than a single point of the grid. Our demo applications successfully adapt to the changed CPU setting and smoothly migrate between, for example, clusters in Erlangen, Germany, and Amsterdam, the Netherlands, that use different kinds and numbers of processors. Benchmarks show that reparallelization and migration impose average overheads of about 4 and 2%, respectively. Copyright © 2008 John Wiley & Sons, Ltd.
ISSN: 1532-0626, 1532-0634
DOI: 10.1002/cpe.1356
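The summary says a new OpenMP work distribution is computed when the number of CPUs changes, but it does not say how. The sketch below is a rough illustration under stated assumptions and is not the authors' algorithm. It assumes a loop under a contiguous block schedule (similar in spirit to OpenMP's `schedule(static)` without a chunk size, whose exact split is implementation-defined). It also assumes that, at a checkpoint, each old thread's completed prefix of its block is known. The function names `static_blocks` and `reparallelize` are hypothetical. The sketch collects the unfinished iterations and splits them as evenly as possible over the new thread count.

```python
def static_blocks(lo, hi, nthreads):
    """Split the half-open range [lo, hi) into nthreads contiguous blocks
    whose sizes differ by at most one (hypothetical static-like schedule)."""
    base, extra = divmod(hi - lo, nthreads)
    blocks, start = [], lo
    for t in range(nthreads):
        size = base + (1 if t < extra else 0)
        blocks.append((start, start + size))
        start += size
    return blocks


def reparallelize(blocks, progress, new_nthreads):
    """Redistribute unfinished iterations over new_nthreads threads.

    blocks:   (start, end) range owned by each old thread
    progress: iterations each old thread has completed from its block start
    Returns one list of (start, end) ranges per new thread (assumption:
    iterations are independent, so any reassignment is legal)."""
    # Collect the iterations that still have to run, in loop order.
    remaining = [(s + done, e) for (s, e), done in zip(blocks, progress)
                 if s + done < e]
    total = sum(e - s for s, e in remaining)

    # Target share per new thread: sizes differ by at most one.
    base, extra = divmod(total, new_nthreads)
    targets = [base + (1 if t < extra else 0) for t in range(new_nthreads)]

    assignment = [[] for _ in range(new_nthreads)]
    t, need = 0, targets[0]
    for s, e in remaining:
        while s < e:
            while need == 0:          # current thread is full, move on
                t += 1
                need = targets[t]
            take = min(need, e - s)
            ranges = assignment[t]
            if ranges and ranges[-1][1] == s:
                # Merge with the previous range to keep blocks contiguous.
                ranges[-1] = (ranges[-1][0], s + take)
            else:
                ranges.append((s, s + take))
            s += take
            need -= take
    return assignment
```

For example, with 100 iterations on 4 threads, `static_blocks(0, 100, 4)` yields four blocks of 25. If the threads have completed 10, 25, 5, and 0 iterations when the job moves to 3 CPUs, `reparallelize(blocks, [10, 25, 5, 0], 3)` spreads the 60 unfinished iterations as 20 per thread. Walking the remaining ranges in loop order keeps each new thread's work as contiguous as possible, which tends to preserve cache locality. A thread may still receive more than one range when the unfinished iterations are not adjacent.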