An Online Algorithm for Inference Service Scheduling Using Combinations of Server-Based and Serverless Instances in Cloud Environments

Bibliographic Details
Published in: IEEE Internet of Things Journal, 2024-12, p.1-1
Main Authors: Liu, Siyuan; Pan, Li; Liu, Shijun; Qi, Kaiyuan
Format: Article
Language: English
Description
Summary: As machine learning continues to develop, demand is growing for cloud-based machine learning inference services, which are latency-sensitive tasks such as service requests from Internet of Things (IoT) devices. The workloads of these inference services fluctuate and are uncertain, so the number of servers they require varies widely over time. As a result, how to dynamically and rationally schedule cloud servers for inference services has become an important issue. Alibaba Cloud provides serverless instances called Elastic Container Instance (ECI); with pay-as-you-go billing and second-level elasticity, they are well suited to bursty or fluctuating workloads. At the same time, Alibaba Cloud's subscription-based Elastic Compute Service (ECS) instances can be used for steady-state workloads. Our objective is to dynamically combine these two types of instances to serve inference service requests. In this paper, we propose a deterministic online algorithm that schedules these two types of instances to optimize costs without requiring knowledge of future workloads. We prove that the proposed online algorithm achieves a competitive ratio of no more than 2 compared to the optimal offline algorithm. Through simulation experiments, we demonstrate that our algorithm outperforms three benchmarks: the All-reserved algorithm, the All-on-demand algorithm, and traditional online algorithms that use only ECS instances. Our algorithm performs better across various workloads and significantly reduces costs in most cases.
ISSN: 2327-4662
DOI: 10.1109/JIOT.2024.3515208
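
The summary does not spell out the scheduling rule itself, so the following is only an illustrative sketch of the general idea of mixing subscription and pay-as-you-go capacity online. It uses the classic ski-rental break-even rule, not the paper's algorithm: each demand level is served on demand until the on-demand money spent on that level reaches the subscription fee, and only then is a subscription bought for it. The function name `break_even_schedule`, its parameters, and the simplifying assumption that the whole horizon fits inside one subscription period (so subscriptions never expire) are all hypothetical.

```python
from typing import List, Tuple


def break_even_schedule(
    demand: List[int],
    on_demand_price: float,
    reserve_fee: float,
) -> Tuple[float, List[Tuple[int, int]]]:
    """Ski-rental style sketch (NOT the paper's algorithm).

    demand[t] is the number of instances needed in slot t.
    Assumption: the horizon is shorter than one subscription period,
    so a subscribed instance stays available until the end.
    Returns (total_cost, plan), where plan[t] = (subscribed, on_demand)
    counts used to serve slot t.
    """
    spent: List[float] = []  # spent[k]: on-demand money paid so far for level k
    reserved = 0             # levels 0..reserved-1 are covered by subscriptions
    total = 0.0
    plan: List[Tuple[int, int]] = []

    for d in demand:
        on_demand = max(0, d - reserved)
        plan.append((reserved, on_demand))

        # Pay on demand for every level above the subscribed ones.
        while len(spent) < d:
            spent.append(0.0)
        for k in range(reserved, d):
            spent[k] += on_demand_price
        total += on_demand * on_demand_price

        # Break-even check: subscribe a level once its on-demand
        # spending has reached the subscription fee.
        while reserved < len(spent) and spent[reserved] >= reserve_fee:
            reserved += 1
            total += reserve_fee

    return total, plan


if __name__ == "__main__":
    cost, plan = break_even_schedule([1, 1, 1, 1], 1.0, 3.0)
    print(cost, plan)
```

With a steady demand of one instance, a price of 1.0 per slot, and a fee of 3.0, the sketch pays 3.0 on demand, then 3.0 for the subscription, for a total of 6.0. Subscribing at the start would have cost 3.0, which illustrates the factor-of-two gap that break-even rules of this kind are known for.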