
Deep Reinforcement Learning-Based Network Routing Technology for Data Recovery in Exa-Scale Cloud Distributed Clustering Systems

Bibliographic Details
Published in: Applied Sciences 2021-09, Vol. 11 (18), p. 8727
Main Authors: Shin, Dong-Jin; Kim, Jeong-Joon
Format: Article
Language: English
Description
Summary: Prior research has sought to transfer blocks efficiently and reduce network costs when decoding and recovering data in erasure coding-based distributed file systems. Technologies that use software-defined network (SDN) controllers can collect and manage network data more efficiently. However, the available bandwidth changes dynamically with the amount of data transmitted on the network, and data transfer becomes inefficient when nodes and switches fail because existing routing paths then incur longer latency. We propose deep Q-network erasure coding (DQN-EC), which combines erasure coding with a deep Q-network (DQN) to learn dynamically changing network elements and solve these routing problems. Using the SDN controller, DQN-EC collects the status, number, and block size of the nodes that hold stored blocks during erasure coding. In the fat-tree network topology used for experimental evaluation, elements of typical network packets, the bandwidth of the nodes and switches, and other information are also collected. The collected data are used for deep reinforcement learning, which avoids failed nodes and switches and provides optimized routing paths by selecting switches that transfer blocks efficiently. DQN-EC achieves a 2.5-times-faster block transmission time and 0.4-times-higher network throughput than open shortest path first (OSPF) routing algorithms. The bottleneck bandwidth and transmission link cost can be reduced, improving the recovery time approximately twofold.
ISSN: 2076-3417
DOI: 10.3390/app11188727
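
Illustrative sketch (not from the article): the summary describes learning routing paths that avoid failed switches. The Python below shows that idea in miniature, using tabular Q-learning as a simplified stand-in for the deep Q-network the authors use. The toy topology, link costs, reward values, and the names LINKS, train, and greedy_path are all assumptions made for illustration. They are not the paper's fat-tree setup, its SDN-collected features, or its reward design.

```python
import random

# Hypothetical toy topology (an assumption, not the paper's fat-tree):
# each entry maps a node to its neighbours and the cost of that link.
LINKS = {
    "src": {"s1": 1.0, "s2": 2.0},
    "s1": {"src": 1.0, "s3": 1.0},
    "s2": {"src": 2.0, "s4": 2.0},
    "s3": {"s1": 1.0, "dst": 1.0},
    "s4": {"s2": 2.0, "dst": 2.0},
    "dst": {},
}


def train(links, failed, src, dst, episodes=2000,
          alpha=0.5, gamma=0.9, eps=0.2, seed=0):
    """Tabular Q-learning over next-hop choices (stand-in for a DQN)."""
    rng = random.Random(seed)
    # One Q-value per (current node, next hop) pair, initialised to zero.
    q = {n: {m: 0.0 for m in nbrs} for n, nbrs in links.items()}
    for _ in range(episodes):
        node = src
        for _ in range(20):  # cap the number of hops per episode
            actions = list(q[node])
            if rng.random() < eps:
                nxt = rng.choice(actions)            # explore
            else:
                nxt = max(actions, key=q[node].get)  # exploit
            if nxt in failed:
                # Assumed penalty for forwarding to a failed switch.
                reward, done = -100.0, True
            elif nxt == dst:
                # Assumed bonus for delivering the block, minus link cost.
                reward, done = 10.0 - links[node][nxt], True
            else:
                reward, done = -links[node][nxt], False
            future = 0.0 if done else max(q[nxt].values())
            q[node][nxt] += alpha * (reward + gamma * future - q[node][nxt])
            if done:
                break
            node = nxt
    return q


def greedy_path(q, src, dst, max_hops=10):
    """Follow the highest-valued next hop from src until dst is reached."""
    path = [src]
    while path[-1] != dst and len(path) <= max_hops:
        current = path[-1]
        path.append(max(q[current], key=q[current].get))
    return path


if __name__ == "__main__":
    q_ok = train(LINKS, failed=set(), src="src", dst="dst")
    q_fail = train(LINKS, failed={"s1"}, src="src", dst="dst")
    print(greedy_path(q_ok, "src", "dst"))
    print(greedy_path(q_fail, "src", "dst"))
```

In this toy setup, the learned path with no failures follows the cheaper links through s1 and s3. With s1 marked as failed, it detours through s2 and s4. A DQN-based approach such as the one summarized above would replace the lookup table with a neural network over richer state, such as bandwidth and block sizes.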