Autonomic Runtime Adaptation Framework for Power Management in Large-Scale High-Performance Computing Systems

Bibliographic Details
Main Authors: Kumar Saurav, Sumit, Bindhumadhva Bapu, S
Format: Conference Proceeding
Language: English
Description
Summary: Power consumption is one of the most critical issues for the viability and sustainability of exascale computing systems. Achieving a quintillion (10^18) computations per second within a sustainable power budget of 20 MW is challenging. Reaching exascale performance requires improving energy efficiency at every level of the HPC ecosystem: subsystem, node, system, scheduling, software, HPC site, and power supply. Power consumption and performance are conflicting optimization concerns that must be managed intelligently, which calls for an efficient and comprehensive power management framework. This paper presents an agent-based autonomic runtime adaptation framework (ARAF). We investigate the Q-learning algorithm for devising a control policy based on the optimal operating point (V, F). Experimental results show that the described control policy provides 5-12% energy savings with minimal performance degradation. The proposed architecture offers a comprehensive and efficient power management framework for next-generation HPC systems.
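
The summary names Q-learning as the basis for the (V, F) control policy but gives no details. The following is a minimal tabular Q-learning sketch of that general idea and is not the paper's implementation. Everything in it is an assumption made for illustration: the three operating points, the two workload phases that simply alternate, the dynamic-power model (power proportional to V^2 * F), the slowdown weight, and the hyperparameters.

```python
import random

# Hypothetical (voltage in V, frequency in GHz) operating points; not from the paper.
OPERATING_POINTS = [(0.8, 1.0), (1.0, 1.5), (1.2, 2.0)]
F_MAX = max(f for _, f in OPERATING_POINTS)

# Toy workload phases: (compute fraction, memory fraction) of the runtime at F_MAX.
PHASES = {"memory_bound": (0.2, 0.8), "compute_bound": (0.9, 0.1)}
# Assumption: the workload alternates deterministically between the two phases.
NEXT_PHASE = {"memory_bound": "compute_bound", "compute_bound": "memory_bound"}

SLOWDOWN_WEIGHT = 4.0  # assumed penalty per unit of relative slowdown


def reward(phase, action):
    """Negative cost: energy (arbitrary units) plus a weighted slowdown penalty."""
    voltage, freq = OPERATING_POINTS[action]
    compute, memory = PHASES[phase]
    runtime = compute * (F_MAX / freq) + memory  # only the compute part scales with F
    power = voltage ** 2 * freq                  # dynamic power ~ V^2 * F
    energy = power * runtime
    slowdown = runtime - 1.0                     # runtime at F_MAX is 1.0 by construction
    return -(energy + SLOWDOWN_WEIGHT * slowdown)


def train(steps=5000, alpha=0.5, gamma=0.5, epsilon=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration over the toy model."""
    rng = random.Random(seed)
    n_actions = len(OPERATING_POINTS)
    q = {phase: [0.0] * n_actions for phase in PHASES}
    state = "memory_bound"
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)  # explore
        else:
            action = max(range(n_actions), key=lambda a: q[state][a])  # exploit
        next_state = NEXT_PHASE[state]
        # Standard Q-learning update toward the one-step bootstrapped target.
        td_target = reward(state, action) + gamma * max(q[next_state])
        q[state][action] += alpha * (td_target - q[state][action])
        state = next_state
    return q


def greedy_policy(q):
    """Map each phase to the operating point with the highest learned Q-value."""
    return {
        phase: OPERATING_POINTS[max(range(len(values)), key=values.__getitem__)]
        for phase, values in q.items()
    }


if __name__ == "__main__":
    for phase, point in greedy_policy(train()).items():
        print(phase, "->", point)
```

Under these assumed numbers, the learned policy picks the lowest operating point for the memory-bound phase, where lowering the frequency costs little runtime, and the highest for the compute-bound phase, where the slowdown penalty dominates. A real controller would replace the toy reward with measured power and performance counters.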
ISSN: 2325-9418
DOI: 10.1109/INDICON49873.2020.9342528