O2OAT: Efficient Offline-to-Online Reinforcement Learning with Adaptive Transition Strategy

Bibliographic Details
Main Authors: Shi, Wei; Huang, Honglan; Liang, Xingxing; Zhang, Longfei; Yang, Fangjie; Cheng, Guangquan; Huang, Jincai; Liu, Zhong; Xu, Dan
Format: Conference Proceeding
Language: English
Description
Summary: Offline-to-online reinforcement learning (O2O RL) provides a promising paradigm for pre-training reinforcement learning (RL) policies on limited offline datasets and fine-tuning them online for real-world deployment. Prevailing methods isolate offline policy learning from online exploration, leading to a notable performance drop during the transition. To tackle this obstacle, we introduce a novel and effective algorithm, "Offline-to-Online reinforcement learning with Adaptive Transition strategy" (O2OAT). This approach incorporates an adaptive weighting mechanism that modulates the influence of the behavior cloning (BC) regularization term during policy updates. Experimental evaluations on the D4RL benchmark show that the proposed algorithm smoothly bridges the gap between the offline and online learning phases, achieving high sample efficiency during online adaptation and a substantial performance gain over current state-of-the-art methods.
ISSN: 2771-6902
DOI: 10.1109/BigDIA63733.2024.10809011
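
The summary above describes an actor update whose BC regularization weight adapts across the offline-to-online transition, but this record does not reproduce the paper's actual formulation. The sketch below is only a rough illustration of what such an adaptively weighted BC term can look like in a TD3+BC-style actor loss; the linear schedule, the function names (adaptive_bc_weight, actor_loss), the decay horizon, and the Q-scale normalization are all assumptions for illustration, not the authors' O2OAT method.

```python
# Illustrative sketch only (not the O2OAT reference implementation):
# an actor loss that maximizes Q while penalizing deviation from dataset
# actions, with a BC weight that is annealed as online experience grows.
import torch
import torch.nn as nn


def adaptive_bc_weight(online_steps, decay_steps=50_000, w_start=1.0, w_end=0.0):
    # Hypothetical schedule: decay the BC weight linearly over online fine-tuning.
    frac = min(online_steps / decay_steps, 1.0)
    return w_start + frac * (w_end - w_start)


def actor_loss(actor, critic, obs, dataset_actions, online_steps):
    # Q-maximization term plus an adaptively weighted behavior-cloning penalty.
    pi_actions = actor(obs)
    q = critic(obs, pi_actions)
    # Normalize the Q term so the BC penalty stays on a comparable scale
    # across tasks (the normalization used in TD3+BC, assumed here).
    lam = 1.0 / (q.abs().mean().detach() + 1e-8)
    bc = ((pi_actions - dataset_actions) ** 2).mean()
    return -(lam * q).mean() + adaptive_bc_weight(online_steps) * bc


if __name__ == "__main__":
    # Toy usage with random networks and data, just to show the call pattern.
    obs_dim, act_dim = 17, 6
    actor = nn.Sequential(
        nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, act_dim), nn.Tanh()
    )

    class Critic(nn.Module):
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(obs_dim + act_dim, 64), nn.ReLU(), nn.Linear(64, 1)
            )

        def forward(self, o, a):
            return self.net(torch.cat([o, a], dim=-1))

    obs = torch.randn(32, obs_dim)
    dataset_actions = torch.randn(32, act_dim).clamp(-1, 1)
    loss = actor_loss(actor, Critic(), obs, dataset_actions, online_steps=10_000)
    print(loss.item())
```

Under this assumed schedule, the BC term dominates early (preserving the pre-trained, dataset-consistent behavior) and fades as online interaction accumulates, which is one plausible way to realize the adaptive transition the abstract describes.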