
Skill-Critic: Refining Learned Skills for Hierarchical Reinforcement Learning

Bibliographic Details
Published in: IEEE Robotics and Automation Letters, 2024-04, Vol. 9 (4), p. 3625-3632
Main Authors: Hao, Ce, Weaver, Catherine, Tang, Chen, Kawamoto, Kenta, Tomizuka, Masayoshi, Zhan, Wei
Format: Article
Language: English
Description
Summary: Hierarchical reinforcement learning (RL) can accelerate long-horizon decision-making by temporally abstracting a policy into multiple levels. Promising results in sparse reward environments have been seen with skills, i.e., sequences of primitive actions. Typically, a skill latent space and policy are discovered from offline data. However, the resulting low-level policy can be unreliable due to low-coverage demonstrations or distribution shifts. As a solution, we propose the Skill-Critic algorithm to fine-tune the low-level policy in conjunction with high-level skill selection. Our Skill-Critic algorithm optimizes both the low-level and high-level policies; these policies are initialized and regularized by the latent space learned from offline demonstrations to guide the parallel policy optimization. We validate Skill-Critic in multiple sparse-reward RL environments, including a new sparse-reward autonomous racing task in Gran Turismo Sport. The experiments show that Skill-Critic's low-level policy fine-tuning and demonstration-guided regularization are essential for good performance.
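
The summary describes a two-level structure: a high-level policy that selects a skill latent, a low-level policy that decodes it into primitive actions, and regularization of both toward priors learned from offline demonstrations. The following is only an orienting sketch of that structure, assuming a PyTorch-style implementation; the module layout, dimensions, critic callables (q_value_hi, q_value_lo), and KL weights are illustrative placeholders, not the authors' released code.

```python
# Minimal sketch of a hierarchical policy with demonstration-guided regularization.
# All names, dimensions, and weights below are assumptions for illustration only.
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM, SKILL_DIM = 32, 4, 10  # placeholder sizes

def mlp(in_dim, out_dim):
    return nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, out_dim))

class HighLevelPolicy(nn.Module):
    """Selects a skill latent z from the current state (one z per skill horizon)."""
    def __init__(self):
        super().__init__()
        self.net = mlp(STATE_DIM, 2 * SKILL_DIM)  # mean and log-std of z
    def forward(self, s):
        mu, log_std = self.net(s).chunk(2, dim=-1)
        return torch.distributions.Normal(mu, log_std.exp())

class LowLevelPolicy(nn.Module):
    """Decodes (state, skill latent) into a primitive action; fine-tuned online."""
    def __init__(self):
        super().__init__()
        self.net = mlp(STATE_DIM + SKILL_DIM, 2 * ACTION_DIM)
    def forward(self, s, z):
        mu, log_std = self.net(torch.cat([s, z], dim=-1)).chunk(2, dim=-1)
        return torch.distributions.Normal(mu, log_std.exp())

# Priors learned from offline demonstrations (kept frozen). The online policies
# are initialized from them and regularized toward them during joint optimization.
skill_prior, action_prior = HighLevelPolicy(), LowLevelPolicy()
pi_hi, pi_lo = HighLevelPolicy(), LowLevelPolicy()
pi_hi.load_state_dict(skill_prior.state_dict())
pi_lo.load_state_dict(action_prior.state_dict())

def regularized_actor_losses(s, q_value_hi, q_value_lo, alpha_hi=0.1, alpha_lo=0.1):
    """Actor losses: maximize critic value minus a KL penalty toward the demo priors."""
    dist_hi = pi_hi(s)
    z = dist_hi.rsample()
    dist_lo = pi_lo(s, z)
    a = dist_lo.rsample()
    kl_hi = torch.distributions.kl_divergence(dist_hi, skill_prior(s)).sum(-1)
    kl_lo = torch.distributions.kl_divergence(dist_lo, action_prior(s, z)).sum(-1)
    loss_hi = (-q_value_hi(s, z) + alpha_hi * kl_hi).mean()
    loss_lo = (-q_value_lo(s, z, a) + alpha_lo * kl_lo).mean()
    return loss_hi, loss_lo
```

The design choice sketched here follows the summary's description at a high level: both policies are optimized in parallel, while KL terms toward the offline-learned priors prevent them from drifting far from the demonstrated skill distribution.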
ISSN: 2377-3766
DOI: 10.1109/LRA.2024.3368231