Loading…
With great reliability comes great responsibility: tradeoffs of run-time policy on high reliability systems
In this paper we describe a simulation study to improve performance on a large highly utilized cluster at Sandia National Laboratories. The unique characteristic about the cluster is that there are very few constraints on job size. In particular, the run-time is limited only by system times which oc...
Saved in:
Main Authors: | , , , |
---|---|
Format: | Conference Proceeding |
Language: | English |
Subjects: | |
Online Access: | Request full text |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Summary: | In this paper we describe a simulation study to improve performance on a large highly utilized cluster at Sandia National Laboratories. The unique characteristic about the cluster is that there are very few constraints on job size. In particular, the run-time is limited only by system times which occur about every two weeks. The major contribution of this paper is that we quantify the difference in makespan between running a single long job and its equivalent in many shorter jobs. We find that running longer jobs is beneficial to the facility as a whole when the cycle-weighted makespans are considered and that running shorter jobs has an overall beneficial effect on the makespan for the jobs taken unweighted and for most users. |
---|---|
DOI: | 10.1109/CCGrid.2004.1336653 |