Learning Humanoid Robot Running Motions with Symmetry Incentive through Proximal Policy Optimization
Published in: Journal of Intelligent & Robotic Systems, 2021-07, Vol. 102 (3), Article 54
Main Authors: , ,
Format: Article
Language: English
Summary: This article contributes a methodology based on deep reinforcement learning to develop running skills in a humanoid robot with no prior knowledge. Specifically, the algorithm used for learning is Proximal Policy Optimization (PPO). The chosen application domain is the RoboCup 3D Soccer Simulation (Soccer 3D), a competition in which teams of 11 autonomous agents each compete in simulated soccer matches. In our approach, the state vector used as the neural network’s input consists of raw sensor measurements, or quantities that could be obtained through sensor fusion, while the actions are the joint positions sent to the joint controllers. Our running behavior outperforms the state of the art in sprint speed by approximately 50%. We present results regarding the training procedure and also evaluate the controllers in terms of speed, reliability, and human similarity. Since the fastest running policies display asymmetric motions, we also investigate a technique to encourage symmetry in the sagittal plane. Finally, we discuss the key factors that led us to surpass previous results in the literature and share ideas for future research.
ISSN: 0921-0296, 1573-0409
DOI: 10.1007/s10846-021-01355-9
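For reference, PPO, which the summary names as the learning algorithm, optimizes the clipped surrogate objective of Schulman et al. (2017):

$$L^{\text{CLIP}}(\theta) = \hat{\mathbb{E}}_t\!\left[\min\!\big(r_t(\theta)\,\hat{A}_t,\ \operatorname{clip}(r_t(\theta),\,1-\epsilon,\,1+\epsilon)\,\hat{A}_t\big)\right], \qquad r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\text{old}}}(a_t \mid s_t)}$$

where $\hat{A}_t$ is an advantage estimate and $\epsilon$ is the clipping range. The record does not spell out how the symmetry incentive is computed; the sketch below shows one plausible form, a sagittal-plane mirror penalty on joint trajectories, and is not necessarily the authors' exact formulation. The joint index maps, sign conventions, and function name are illustrative assumptions.

```python
import numpy as np

# Illustrative assumption: pair each left-side joint with its right-side
# counterpart, flipping sign where the positive direction mirrors across
# the sagittal plane. These indices are placeholders, not a real robot's
# joint layout.
LEFT_IDX = np.array([0, 1, 2])           # e.g., left hip pitch, knee, ankle pitch
RIGHT_IDX = np.array([3, 4, 5])          # corresponding right-side joints
MIRROR_SIGN = np.array([1.0, 1.0, 1.0])  # set entries to -1.0 where mirroring negates

def sagittal_symmetry_penalty(joint_traj: np.ndarray, half_period: int) -> float:
    """Mean squared gap between the left-side trajectory and the
    sign-corrected right-side trajectory shifted by half a gait period.
    `joint_traj` has shape (T, n_joints); a symmetric gait drives this to 0."""
    left = joint_traj[:, LEFT_IDX]
    right = np.roll(joint_traj[:, RIGHT_IDX] * MIRROR_SIGN, half_period, axis=0)
    return float(np.mean((left - right) ** 2))

# A symmetry incentive could then enter the reward as, e.g.,
# reward -= w_sym * sagittal_symmetry_penalty(recent_traj, half_period)
```

Weighted into the locomotion reward, such a term favors gaits whose left and right halves are time-shifted mirrors of each other, which is the qualitative goal the summary attributes to the symmetry incentive.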