SAMBA: Safe Model-Based & Active Reinforcement Learning

Bibliographic Details
Published in:	arXiv.org 2020-06
Main Authors:	Cowen-Rivers, Alexander I; Palenicek, Daniel; Moens, Vincent; Abdullah, Mohammed; Sootla, Aivar; Wang, Jun; Ammar, Haitham
Format:	Article
Language:	English
Subjects:	Algorithms; Gaussian process; Information theory; Learning; Multiple objective analysis; Statistical methods

Description
Summary:	In this paper, we propose SAMBA, a novel framework for safe reinforcement learning that combines aspects from probabilistic modelling, information theory, and statistics. Our method builds upon PILCO to enable active exploration using novel (semi-)metrics for out-of-sample Gaussian process evaluation, optimised through a multi-objective problem that supports conditional-value-at-risk constraints. We evaluate our algorithm on a variety of safe dynamical system benchmarks involving both low- and high-dimensional state representations. Our results show orders-of-magnitude reductions in samples and violations compared to state-of-the-art methods. Lastly, we provide intuition as to the effectiveness of the framework through a detailed analysis of our active metrics and safety constraints.
ISSN:	2331-8422
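
Note on conditional value-at-risk (CVaR): the summary mentions CVaR constraints without defining them. As a rough illustration only, the sketch below computes the standard sample-based CVaR estimate (the mean of the worst (1 - alpha) fraction of observed costs) and checks it against a threshold. It is not SAMBA's formulation, which this record does not give. The function names `empirical_cvar` and `satisfies_cvar_constraint`, the level `alpha`, and the `threshold` value are all hypothetical and chosen for this example.

```python
import math


def empirical_cvar(costs, alpha=0.9):
    """Return (VaR, CVaR) estimated from a list of sampled costs.

    Assumption: CVaR is taken as the mean of the worst (1 - alpha)
    fraction of samples, the usual empirical tail-average estimator.
    """
    if not costs:
        raise ValueError("costs must be non-empty")
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1)")
    ordered = sorted(costs)
    n = len(ordered)
    # Number of samples in the upper tail; rounding guards against
    # floating-point noise such as (1 - 0.7) * 10 = 3.0000000000000004.
    k = max(1, math.ceil(round((1.0 - alpha) * n, 9)))
    tail = ordered[-k:]
    var = tail[0]                 # smallest cost inside the tail
    cvar = sum(tail) / len(tail)  # average cost over the tail
    return var, cvar


def satisfies_cvar_constraint(costs, alpha, threshold):
    """Hypothetical safety check: is the tail-average cost below threshold?"""
    _, cvar = empirical_cvar(costs, alpha)
    return cvar <= threshold


# Illustrative data only: ten made-up trajectory costs.
sample_costs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
var_80, cvar_80 = empirical_cvar(sample_costs, alpha=0.8)
is_safe = satisfies_cvar_constraint(sample_costs, alpha=0.8, threshold=9.0)
```

Because CVaR averages over the tail rather than reading off a single quantile, it is never smaller than the corresponding VaR, so a constraint on it penalises rare but severe violations more strongly than a constraint on expected cost.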