
An Abstract Interface for System Software on Large-Scale Clusters


Bibliographic Details
Published in: Computer journal 2006-07, Vol. 49 (4), p. 454-469
Main Author: Fernandez, J.
Format: Article
Language: English
Description
Summary: Scalable management of distributed resources is one of the major challenges when building large-scale clusters for high-performance computing. This task includes transparent fault tolerance, efficient deployment of resources and support for all the needs of parallel applications: parallel I/O, deterministic behavior and responsiveness. These challenges may seem daunting with commodity hardware and operating systems, since they were not designed to support a global, single management view of a large-scale system. In this paper we propose and demonstrate an abstract network interface in the cluster interconnect to facilitate the implementation of a simple yet powerful global operating system. This system, which can be thought of as a coarse-grain SIMD operating system, can allow commodity clusters to grow to thousands of nodes, while still retaining the usability and performance of the single-node workstation.
ISSN: 0010-4620, 1460-2067
DOI: 10.1093/comjnl/bxl020