Reinforcement Learning for Optimizing Delay-Sensitive Task Offloading in Vehicular Edge-Cloud Computing
Published in: | IEEE Internet of Things Journal, 2024-01, Vol. 11 (2), pp. 2058-2069 |
---|---|
Format: | Article |
Language: | English |
Summary: | As more and more devices connect to the Internet, the volume of data to be processed keeps growing. Many of the resulting tasks require fast execution, while the storage and computation capabilities of Internet of Things (IoT) devices are limited. To address the demands of delay-sensitive tasks, we present a vehicular edge-cloud computing (VECC) network that provides powerful computation capabilities by deploying servers close to the task-generating devices and by using idle resources of smart vehicles to share the workload. Because these limited resources are vulnerable to sudden surges in data, cloud servers must be incorporated to prevent system overload. The challenge is then to find a task offloading strategy that coordinates edge and cloud resources to minimize the total time by which tasks exceed their quality baselines (the tolerance time) and to make all tasks meet their soft quality deadlines. To reach this goal, we first model the task offloading problem in VECC as a Markov decision process (MDP). Then, we propose advantage-oriented task offloading with a dueling actor-insulator network scheme to solve the problem. This value-based reinforcement learning (RL) method helps the agent find an effective policy even when it does not know all the changes in the state attributes. The effectiveness of our method is demonstrated by performance evaluations based on real-world bus traces from Rio de Janeiro (Brazil). The experimental results show that our proposal reduces the tolerance time by at least 8.81% compared to other RL algorithms and 75% compared to greedy approaches. |
---|---|
ISSN: | 2327-4662 |
DOI: | 10.1109/JIOT.2023.3292591 |
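
The summary names two ideas that a short sketch may make more concrete: the objective (the total time by which tasks overrun their quality baselines, read here as the "tolerance time") and the dueling, value-based estimate of action values. The Python below is only an illustration under stated assumptions, not the paper's implementation. The function names, the per-task inputs, and the mean-subtracted aggregation Q(s, a) = V(s) + A(s, a) - mean(A) are assumptions borrowed from the standard dueling network formulation; the record does not give the paper's exact equations or describe the actor-insulator component.

```python
from typing import List, Sequence


def total_tolerance_time(completion_times: Sequence[float],
                         quality_baselines: Sequence[float]) -> float:
    """Sum of the time by which each task overruns its quality baseline.

    Assumption: a task that finishes before its baseline contributes zero,
    since the deadlines in the summary are described as soft.
    """
    if len(completion_times) != len(quality_baselines):
        raise ValueError("one quality baseline is required per task")
    return sum(max(0.0, done - baseline)
               for done, baseline in zip(completion_times, quality_baselines))


def dueling_q_values(state_value: float,
                     advantages: Sequence[float]) -> List[float]:
    """Combine a state value and per-action advantages into action values.

    Assumption: the standard dueling aggregation, which subtracts the mean
    advantage so that the value and advantage streams are identifiable.
    """
    if not advantages:
        raise ValueError("at least one action is required")
    mean_advantage = sum(advantages) / len(advantages)
    return [state_value + a - mean_advantage for a in advantages]


if __name__ == "__main__":
    # Three hypothetical tasks: the second and third overrun their baselines.
    print(total_tolerance_time([1.0, 3.0, 2.5], [2.0, 2.0, 2.0]))

    # Three hypothetical offloading targets, e.g. vehicle, edge server, cloud.
    q = dueling_q_values(1.0, [0.0, 2.0, 4.0])
    print(q.index(max(q)))  # index of the greedy offloading choice
```

In such a setup, the per-step reward of the MDP could be the negative of each task's overrun, so that maximizing return corresponds to minimizing the total tolerance time; this reward shaping is likewise an assumption rather than something stated in the record.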