Stochastic Integrated Actor-Critic for Deep Reinforcement Learning
Published in: | IEEE Transactions on Neural Networks and Learning Systems, 2024-05, Vol. 35 (5), pp. 6654-6666 |
---|---|
Main Authors: | , , |
Format: | Article |
Language: | English |
Summary: | We propose a deep stochastic actor-critic algorithm with an integrated network architecture and fewer parameters. We address stabilization of the learning procedure via an adaptive objective to the critic's loss and a smaller learning rate for the shared parameters between the actor and the critic. Moreover, we propose a mixed on-off policy exploration strategy to speed up learning. Experiments illustrate that our algorithm reduces the sample complexity by 50%-93% compared with the state-of-the-art deep reinforcement learning (RL) algorithms twin delayed deep deterministic policy gradient (TD3), soft actor-critic (SAC), proximal policy optimization (PPO), advantage actor-critic (A2C), and interpolated policy gradient (IPG) over continuous control tasks LunarLander, BipedalWalker, BipedalWalkerHardCore, Ant, and Minitaur in the OpenAI Gym. |
ISSN: | 2162-237X; 2162-2388 |
DOI: | 10.1109/TNNLS.2022.3212273 |
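The summary names two stabilization devices: an adaptive objective added to the critic's loss and a smaller learning rate for the parameters shared between actor and critic. The second idea can be sketched as follows; this is a minimal illustration in plain NumPy, with hypothetical layer sizes, learning rates, and gradients, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Integrated architecture: one shared trunk feeds both the actor head and
# the critic head (sizes here are illustrative, not taken from the paper).
W_shared = rng.normal(size=(8, 16))   # shared trunk, used by actor AND critic
W_actor = rng.normal(size=(16, 4))    # actor head (policy parameters)
W_critic = rng.normal(size=(16, 1))   # critic head (value estimate)

lr_head = 1e-3
lr_shared = 1e-4  # smaller step for shared parameters, as the summary describes

def sgd_step(W, grad, lr):
    """One plain SGD update; the paper's actual optimizer may differ."""
    return W - lr * grad

# Dummy gradients standing in for backprop through the actor and critic losses.
g_shared = rng.normal(size=W_shared.shape)
g_actor = rng.normal(size=W_actor.shape)
g_critic = rng.normal(size=W_critic.shape)

W_shared_new = sgd_step(W_shared, g_shared, lr_shared)  # cautious shared update
W_actor_new = sgd_step(W_actor, g_actor, lr_head)
W_critic_new = sgd_step(W_critic, g_critic, lr_head)
```

Because the shared trunk receives gradients from both the actor's and the critic's objectives, stepping it more slowly damps interference between the two learning signals.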