
Byzantine-Resilient Decentralized Policy Evaluation With Linear Function Approximation

Bibliographic Details
Published in: IEEE Transactions on Signal Processing, 2021, Vol. 69, pp. 3839-3853
Main Authors: Wu, Zhaoxian; Shen, Han; Chen, Tianyi; Ling, Qing
Format: Article
Language: English
Description
Summary: In this paper, we consider the policy evaluation problem in reinforcement learning with agents on a decentralized and directed network. To evaluate the quality of a fixed policy in this decentralized setting, one option is for the agents to run decentralized temporal-difference (TD) learning collaboratively. To account for practical scenarios where the state and action spaces are large and malicious attacks emerge, we focus on decentralized TD learning with linear function approximation in the presence of malicious agents (often termed Byzantine agents). We propose a trimmed-mean-based Byzantine-resilient decentralized TD algorithm to perform policy evaluation in this setting. We establish the finite-time convergence rate, as well as the asymptotic learning error, which depends on the number of Byzantine agents. Numerical experiments corroborate the robustness of the proposed algorithm.
ISSN: 1053-587X; 1941-0476
DOI: 10.1109/TSP.2021.3090952
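
The abstract names the core ingredient of the method: a coordinate-wise trimmed mean used to aggregate parameters received from neighboring agents before each local TD update. The snippet below is a minimal, hypothetical sketch of that idea, assuming a TD(0) update with linear value features; the function names, step size alpha, discount gamma, and trimming parameter b are illustrative and are not taken from the paper.

# Illustrative sketch (not the authors' exact algorithm): coordinate-wise
# trimmed-mean aggregation combined with a local TD(0) step under linear
# function approximation. All names (trimmed_mean, td0_trimmed_step) and
# parameter values are hypothetical.
import numpy as np

def trimmed_mean(thetas, b):
    """Coordinate-wise trimmed mean: drop the b largest and b smallest
    values in each coordinate, then average the remaining values."""
    thetas = np.sort(np.asarray(thetas), axis=0)  # sort each coordinate
    kept = thetas[b:len(thetas) - b]              # discard 2b extremes per coordinate
    return kept.mean(axis=0)

def td0_trimmed_step(theta_i, neighbor_thetas, phi_s, phi_s_next, reward,
                     gamma=0.99, alpha=0.05, b=1):
    """One local update for agent i: aggregate the agent's own and its
    neighbors' parameters with a trimmed mean, then take a TD(0) step
    along the local semi-gradient with V(s) = phi(s)^T theta."""
    agg = trimmed_mean([theta_i] + list(neighbor_thetas), b)
    td_error = reward + gamma * phi_s_next @ agg - phi_s @ agg
    return agg + alpha * td_error * phi_s

# Toy usage: random features, two honest neighbors, and one neighbor
# sending an arbitrary (possibly Byzantine) vector that the trimming discards.
rng = np.random.default_rng(0)
d = 4
theta = np.zeros(d)
neighbors = [rng.normal(size=d), rng.normal(size=d), 100.0 * np.ones(d)]
theta = td0_trimmed_step(theta, neighbors, phi_s=rng.normal(size=d),
                         phi_s_next=rng.normal(size=d), reward=1.0)

In this sketch, trimming 2b values per coordinate bounds the influence any single malicious neighbor can exert on the aggregate, which is the intuition behind the resilience guarantee stated in the abstract.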