Autonomic Runtime Adaptation Framework for Power Management in Large-Scale High-Performance Computing Systems
Format: | Conference Proceeding |
---|---|
Language: | English |

Summary: | Power consumption is one of the most critical issues for the viability and sustainability of exascale computing systems. Achieving a quintillion (10¹⁸) computations per second within a sustainable power budget of 20 MW is challenging. Reaching exascale performance requires improving energy efficiency at every level of the HPC ecosystem, including the subsystem, node, system, scheduling, software, HPC site, and power supply levels. Power consumption and performance are conflicting optimization objectives that must be managed intelligently, which calls for an efficient and comprehensive power management framework. In this paper, we present an agent-based autonomic runtime adaptation framework (ARAF). We investigate the Q-learning algorithm for devising a control policy that selects the optimal voltage-frequency operating point (V, F). The experimental results show that this control policy provides 5-12% energy savings with minimal performance degradation. The proposed architecture offers a comprehensive and efficient power management framework for next-generation HPC systems. |
---|---|
ISSN: | 2325-9418 |
DOI: | 10.1109/INDICON49873.2020.9342528 |
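The 20 MW target implies a concrete efficiency requirement. The short Python calculation below assumes that each of the 10¹⁸ computations per second is one floating-point operation (FLOP); under that assumption, an exascale system must sustain about 50 GFLOPS per watt.

```python
# Back-of-the-envelope efficiency target implied by the exascale goal.
# Assumption: "computations" are counted as floating-point operations (FLOP).
EXASCALE_FLOPS = 1e18   # 10^18 operations per second
POWER_BUDGET_W = 20e6   # 20 MW expressed in watts


def required_efficiency_gflops_per_watt(flops: float, watts: float) -> float:
    """Return the energy efficiency (GFLOPS/W) needed to reach `flops` within `watts`."""
    return flops / watts / 1e9


if __name__ == "__main__":
    eff = required_efficiency_gflops_per_watt(EXASCALE_FLOPS, POWER_BUDGET_W)
    print(f"Required efficiency: {eff:.0f} GFLOPS/W")  # prints "Required efficiency: 50 GFLOPS/W"
```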
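The summary names Q-learning as the method for choosing a (V, F) operating point. As a rough illustration only, the Python sketch below applies generic tabular Q-learning to a toy DVFS problem. It is not ARAF's implementation. The three operating points, the workload phases (`memory_bound`, `mixed`, `compute_bound`), the deterministic phase cycle, and the cost model (static power plus a k·V²·f dynamic term, with a weighted slowdown penalty, used as a negative reward) are all hypothetical assumptions chosen for the example.

```python
import random

# Hypothetical (voltage in V, frequency in GHz) operating points; NOT taken from the paper.
OPERATING_POINTS = [(0.8, 1.2), (0.9, 1.8), (1.0, 2.4)]
# Hypothetical workload phases and the fraction of runtime that scales with core frequency.
PHASES = ["memory_bound", "mixed", "compute_bound"]
CPU_FRACTION = {"memory_bound": 0.2, "mixed": 0.5, "compute_bound": 0.9}
STATIC_POWER_W = 10.0   # assumed frequency-independent power
DYN_COEFF = 20.0        # assumed coefficient in P_dyn = k * V^2 * f
SLOWDOWN_WEIGHT = 40.0  # assumed penalty per unit of relative slowdown


def cost(phase: str, action: int) -> float:
    """Toy cost of one unit of work: energy plus a weighted slowdown penalty."""
    v, f = OPERATING_POINTS[action]
    f_max = OPERATING_POINTS[-1][1]
    frac = CPU_FRACTION[phase]
    # Only the frequency-sensitive fraction of runtime stretches at lower clocks.
    runtime = (1.0 - frac) + frac * (f_max / f)
    power = STATIC_POWER_W + DYN_COEFF * v * v * f
    return power * runtime + SLOWDOWN_WEIGHT * (runtime - 1.0)


def next_phase(phase: str) -> str:
    """Assumed deterministic phase cycle; the DVFS action does not change the workload."""
    return PHASES[(PHASES.index(phase) + 1) % len(PHASES)]


def train(steps: int = 30_000, alpha: float = 0.1, gamma: float = 0.9,
          epsilon: float = 0.2, seed: int = 0) -> dict:
    """Tabular Q-learning with epsilon-greedy exploration; reward = -cost."""
    rng = random.Random(seed)
    actions = range(len(OPERATING_POINTS))
    q = {(s, a): 0.0 for s in PHASES for a in actions}
    state = PHASES[0]
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(len(OPERATING_POINTS))  # explore
        else:
            action = max(actions, key=lambda a: q[(state, a)])  # exploit
        reward = -cost(state, action)
        nxt = next_phase(state)
        best_next = max(q[(nxt, a)] for a in actions)
        # Standard Q-learning update toward the one-step bootstrapped target.
        q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
        state = nxt
    return q


def greedy_policy(q: dict) -> dict:
    """Map each phase to the index of its highest-valued operating point."""
    return {s: max(range(len(OPERATING_POINTS)), key=lambda a: q[(s, a)]) for s in PHASES}


if __name__ == "__main__":
    policy = greedy_policy(train())
    for phase, a in policy.items():
        print(phase, "->", OPERATING_POINTS[a])
```

Under these assumed numbers, the learned policy selects the lowest operating point for the memory-bound phase, the middle one for the mixed phase, and the highest one for the compute-bound phase, because the slowdown penalty outweighs the energy saved only when most of the runtime scales with frequency.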