O2OAT: Efficient Offline-to-Online Reinforcement Learning with Adaptive Transition Strategy
Main Authors:
Format: Conference Proceeding
Language: English
Subjects:
Online Access: Request full text
Summary: Offline-to-online reinforcement learning (O2O RL) provides a promising paradigm for pre-training reinforcement learning (RL) policies on limited offline datasets and fine-tuning them online for real-world deployment. Currently, prevailing methods isolate offline policy learning from online exploration, leading to a notable performance drop during the transition. To address this obstacle, we introduce a novel and effective algorithm, "Offline-to-Online reinforcement learning with Adaptive Transition strategy" (O2OAT). This approach incorporates an innovative adaptive weighting mechanism that modulates the influence of the behavior cloning (BC) regularization term during policy updates. Experimental evaluations on the D4RL benchmark showcase the ability of the proposed algorithm to bridge the gap between the offline and online learning phases. This seamless transition is accompanied by pronounced sample efficiency during online adaptation, resulting in a substantial performance boost over current state-of-the-art methods.
ISSN: 2771-6902
DOI: 10.1109/BigDIA63733.2024.10809011
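The summary above describes an adaptive weighting mechanism that modulates a BC regularization term during policy updates. The record does not give the paper's exact mechanism, so the sketch below is only a minimal, illustrative interpretation: it assumes a TD3+BC-style policy objective in which the BC coefficient is annealed as online fine-tuning proceeds. The names (`AdaptiveBCWeight`, `policy_loss`) and the linear decay schedule are assumptions for illustration, not O2OAT's actual method.

```python
# Minimal sketch (not the paper's implementation): a TD3+BC-style policy update
# in which the behavior-cloning (BC) regularization weight is adapted over the
# course of online fine-tuning. Names and schedule are illustrative assumptions.
import torch
import torch.nn as nn


class AdaptiveBCWeight:
    """Anneals the BC coefficient from `start` to `end` over `decay_steps`
    online environment steps (one plausible form of an adaptive transition)."""

    def __init__(self, start: float = 2.5, end: float = 0.0, decay_steps: int = 50_000):
        self.start, self.end, self.decay_steps = start, end, decay_steps

    def __call__(self, online_step: int) -> float:
        frac = min(online_step / self.decay_steps, 1.0)
        return self.start + frac * (self.end - self.start)


def policy_loss(actor: nn.Module, critic: nn.Module,
                obs: torch.Tensor, dataset_actions: torch.Tensor,
                bc_weight: float) -> torch.Tensor:
    """TD3+BC-style objective: maximize Q while staying close to data actions,
    with the BC term scaled by the (adaptive) bc_weight."""
    pi = actor(obs)
    q = critic(obs, pi)
    # Normalize the Q term so its scale is comparable to the BC term (as in TD3+BC).
    lmbda = 1.0 / (q.abs().mean().detach() + 1e-8)
    return -(lmbda * q).mean() + bc_weight * ((pi - dataset_actions) ** 2).mean()


if __name__ == "__main__":
    obs_dim, act_dim = 11, 3
    actor = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                          nn.Linear(64, act_dim), nn.Tanh())

    class Critic(nn.Module):
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.ReLU(),
                                     nn.Linear(64, 1))

        def forward(self, s, a):
            return self.net(torch.cat([s, a], dim=-1))

    critic = Critic()
    schedule = AdaptiveBCWeight()
    obs = torch.randn(8, obs_dim)
    acts = torch.rand(8, act_dim) * 2 - 1
    for step in (0, 25_000, 50_000):
        loss = policy_loss(actor, critic, obs, acts, schedule(step))
        print(step, schedule(step), float(loss))
```

Annealing the BC weight toward zero is one way to trade conservative, data-anchored updates early in fine-tuning for freer exploitation of the learned Q-function later; other adaptive signals (e.g., Q-value statistics or TD error) could drive the same coefficient.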