iRAF: A Deep Reinforcement Learning Approach for Collaborative Mobile Edge Computing IoT Networks

Bibliographic Details
Published in:IEEE internet of things journal 2019-08, Vol.6 (4), p.7011-7024
Main Authors: Chen, Jienan, Chen, Siyu, Wang, Qi, Cao, Bin, Feng, Gang, Hu, Jianhao
Format: Article
Language:English
Description
Summary:Recently, with the development of artificial intelligence (AI), data-driven AI methods have shown remarkable performance in solving complex problems, supporting an Internet of Things (IoT) world with massive resource-consuming and delay-sensitive services. In this paper, we propose an intelligent resource allocation framework (iRAF) to solve the complex resource allocation problem for the collaborative mobile edge computing (CoMEC) network. The core of iRAF is a multitask deep reinforcement learning algorithm that makes resource allocation decisions based on network states and task characteristics, such as the computing capability of edge servers and devices, communication channel quality, resource utilization, and the latency requirements of the services. The proposed iRAF automatically learns the network environment and generates resource allocation decisions to maximize performance in terms of latency and power consumption through self-play training. iRAF becomes its own teacher: a deep neural network (DNN) is trained to predict iRAF's resource allocation action in a self-supervised learning manner, where the training data is generated from the search process of the Monte Carlo tree search (MCTS) algorithm. A major advantage of MCTS is that it simulates trajectories into the future, starting from a root state, to obtain the best action by evaluating the reward value. Numerical results show that our proposed iRAF achieves 59.27% and 51.71% improvements in service latency performance compared with the greedy-search and the deep Q-learning-based methods, respectively.
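The MCTS procedure the abstract describes — simulating trajectories from a root state and backing up reward values to pick the best action — can be sketched on a toy problem. The problem below (assigning tasks to one of two servers to minimize makespan), the task sizes, and all function names are illustrative assumptions, not the paper's actual state space, reward, or implementation; iRAF's real states include channel quality, server capability, and latency budgets.

```python
import math
import random

# Hypothetical toy stand-in for the offloading decision: assign each task
# in TASKS to one of two servers; reward is the negative makespan, so the
# search favors balanced load. Not the paper's actual model.
TASKS = [3, 1, 4, 2]
N_SERVERS = 2

def is_terminal(state):
    return len(state) == len(TASKS)  # every task has been assigned

def reward(state):
    loads = [0.0] * N_SERVERS
    for task, server in zip(TASKS, state):
        loads[server] += task
    return -max(loads)  # higher reward == smaller makespan

class Node:
    def __init__(self, state, parent=None):
        self.state = state
        self.parent = parent
        self.children = {}   # action -> child Node
        self.visits = 0
        self.value = 0.0     # sum of backed-up rewards

    def best_child(self, c=1.4):
        # UCB1: exploit a high mean value, explore rarely visited children
        return max(
            self.children.values(),
            key=lambda n: n.value / n.visits
            + c * math.sqrt(math.log(self.visits) / n.visits),
        )

def mcts(root, iters=1500):
    """Return the most-visited root action after `iters` simulations."""
    for _ in range(iters):
        node = root
        # 1. Selection: descend while the node is fully expanded
        while not is_terminal(node.state) and len(node.children) == N_SERVERS:
            node = node.best_child()
        # 2. Expansion: add one untried action
        if not is_terminal(node.state):
            action = random.choice(
                [a for a in range(N_SERVERS) if a not in node.children]
            )
            node.children[action] = Node(node.state + [action], parent=node)
            node = node.children[action]
        # 3. Simulation: random rollout to a terminal state
        state = list(node.state)
        while not is_terminal(state):
            state.append(random.randrange(N_SERVERS))
        r = reward(state)
        # 4. Backpropagation: update statistics along the selected path
        while node is not None:
            node.visits += 1
            node.value += r
            node = node.parent
    return max(root.children, key=lambda a: root.children[a].visits)

if __name__ == "__main__":
    random.seed(0)
    state = []
    while not is_terminal(state):      # re-plan one assignment at a time
        state.append(mcts(Node(state)))
    print("assignment:", state, "makespan:", -reward(state))
```

In iRAF, the statistics produced by such a search additionally serve as self-supervised training labels for a DNN that learns to predict the search's action directly, a step this sketch omits.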
ISSN:2327-4662
DOI:10.1109/JIOT.2019.2913162