Learning by reusing previous advice: a memory-based teacher–student framework
Published in: Autonomous Agents and Multi-Agent Systems, 2023-06, Vol. 37, No. 1, Article 14
Main Authors:
Format: Article
Language: English
Summary: Reinforcement Learning (RL) has been widely used to solve sequential decision-making problems. However, it often suffers from slow learning speed in complex scenarios. Teacher–student frameworks address this issue by enabling agents to ask for and give advice so that a student agent can leverage the knowledge of a teacher agent to facilitate its learning. In this paper, we consider the effect of reusing previous advice, and propose a novel memory-based teacher–student framework such that student agents can memorize and reuse the previous advice from teacher agents. In particular, we propose two methods to decide whether previous advice should be reused: *Q-Change per Step*, which reuses the advice if it leads to an increase in Q-values, and *Decay Reusing Probability*, which reuses the advice with a decaying probability. The experiments on diverse RL tasks (Mario, Predator–Prey and Half Field Offense) confirm that our proposed framework significantly outperforms the existing frameworks in which previous advice is not reused.
ISSN: 1387-2532, 1573-7454
DOI: 10.1007/s10458-022-09595-1
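The abstract only names the two reuse-decision rules, so the following Python sketch illustrates how a student agent might cache a teacher's advice and apply each rule. It is a minimal sketch based solely on the abstract: the class name `AdviceMemory`, the per-state Q-value bookkeeping, and the `decay` parameter are illustrative assumptions, not the paper's actual implementation.

```python
import random

class AdviceMemory:
    """Illustrative student-side cache of a teacher's advised actions."""

    def __init__(self, method="q_change", initial_reuse_prob=1.0, decay=0.99):
        assert method in ("q_change", "decay_prob")
        self.method = method
        self.advice = {}      # state -> action previously advised by the teacher
        self.last_q = {}      # state -> Q-value seen when advice was last reused
        self.reuse_prob = initial_reuse_prob
        self.decay = decay

    def store(self, state, action):
        """Memorize the teacher's advice for a state."""
        self.advice[state] = action

    def reuse(self, state, current_q):
        """Return the remembered advice if the active rule says to reuse it."""
        if state not in self.advice:
            return None
        if self.method == "q_change":
            # Q-Change per Step: keep reusing while the Q-value of the advised
            # action is increasing (one plausible reading of the abstract).
            previous_q = self.last_q.get(state, float("-inf"))
            self.last_q[state] = current_q
            if current_q > previous_q:
                return self.advice[state]
            return None
        # Decay Reusing Probability: reuse with a probability that shrinks
        # after every reuse, so the student gradually relies on its own policy.
        if random.random() < self.reuse_prob:
            self.reuse_prob *= self.decay
            return self.advice[state]
        return None
```

In a training loop, the student would call `store()` whenever the teacher answers a request for advice, and consult `reuse()` before falling back to its own (e.g. epsilon-greedy) action selection.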