Machine Learning-Based Scaling Management for Kubernetes Edge Clusters

Bibliographic Details
Published in: IEEE eTransactions on network and service management 2021-03, Vol.18 (1), p.958-972
Main Authors: Toka, Laszlo, Dobreff, Gergely, Fodor, Balazs, Sonkoly, Balazs
Format: Article
Language: English
Description
Summary: Kubernetes, the container orchestrator for cloud-deployed applications, offers automatic scaling for the application provider in order to meet the ever-changing intensity of processing demand. This auto-scaling feature can be customized with a parameter set, but those management parameters are static while incoming Web request dynamics often change, not to mention the fact that scaling decisions are inherently reactive, instead of being proactive. We set the ultimate goal of making cloud-based applications' management easier and more effective. We propose a Kubernetes scaling engine that makes the auto-scaling decisions apt for handling the actual variability of incoming requests. In this engine various machine learning forecast methods compete with each other via a short-term evaluation loop in order to always give the lead to the method that suits best the actual request dynamics. We also introduce a compact management parameter for the cloud-tenant application provider to easily set their sweet spot in the resource over-provisioning vs. SLA violation trade-off. We motivate our scaling solution with analytical modeling and evaluation of the current Kubernetes behavior. The multi-forecast scaling engine and the proposed management parameter are evaluated both in simulations and with measurements on our collected Web traces to show the improved quality of fitting provisioned resources to service demand. We find that with just a few, but fundamentally different, and competing forecast methods, our auto-scaler engine, implemented in Kubernetes, results in significantly fewer lost requests with just slightly more provisioned resources compared to the default baseline.
ISSN: 1932-4537
DOI: 10.1109/TNSM.2021.3052837
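
Illustrative sketch: the summary describes forecast methods that compete in a short-term evaluation loop, with the lead given to whichever currently fits the request dynamics best, plus a single parameter for the over-provisioning vs. SLA violation trade-off. The Python below is only a rough illustration of that idea and is not the authors' implementation. The three toy forecasters stand in for the machine learning methods of the article, and every name here (FORECASTERS, pick_leader, desired_replicas, headroom, per_pod_capacity, eval_window) is a hypothetical choice made for this example.

```python
import math
from typing import Callable, Dict, Sequence, Tuple

Forecaster = Callable[[Sequence[float]], float]


def last_value(history: Sequence[float]) -> float:
    # Naive forecast: the next request rate equals the latest observation.
    return history[-1]


def moving_average(history: Sequence[float], k: int = 3) -> float:
    # Mean of the last k observations (fewer if the history is shorter).
    tail = history[-k:]
    return sum(tail) / len(tail)


def linear_trend(history: Sequence[float]) -> float:
    # Extrapolate the most recent step; never predict a negative rate.
    if len(history) < 2:
        return history[-1]
    return max(0.0, history[-1] + (history[-1] - history[-2]))


# Toy stand-ins (assumption): the article uses machine learning forecasters.
FORECASTERS: Dict[str, Forecaster] = {
    "last_value": last_value,
    "moving_average": moving_average,
    "linear_trend": linear_trend,
}


def pick_leader(history: Sequence[float], eval_window: int = 5) -> str:
    # Short-term evaluation loop: replay one-step-ahead forecasts over the
    # most recent samples and pick the method with the lowest absolute error.
    start = max(2, len(history) - eval_window)
    errors = {
        name: sum(abs(f(history[:t]) - history[t])
                  for t in range(start, len(history)))
        for name, f in FORECASTERS.items()
    }
    return min(errors, key=errors.get)


def desired_replicas(history: Sequence[float], per_pod_capacity: float,
                     headroom: float = 0.1,
                     eval_window: int = 5) -> Tuple[str, int]:
    # headroom is a hypothetical stand-in for the single trade-off parameter:
    # larger values over-provision more and risk fewer lost requests.
    leader = pick_leader(history, eval_window)
    predicted = FORECASTERS[leader](history)
    replicas = math.ceil(predicted * (1 + headroom) / per_pod_capacity)
    return leader, max(1, replicas)


if __name__ == "__main__":
    rates = [100, 120, 140, 160, 180, 200]  # requests/s, made-up trace
    print(desired_replicas(rates, per_pod_capacity=50))
```

On a steadily rising trace such as the made-up one above, the trend-following method accumulates the smallest recent error and takes the lead, so the replica count is sized for the predicted next rate rather than the last observed one.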