Robots Learn Social Skills: End-to-End Learning of Co-Speech Gesture Generation for Humanoid Robots
Co-speech gestures enhance interaction experiences between humans as well as between humans and robots. Most existing robots use rule-based speech-gesture association, but this requires human labor and prior knowledge of experts to be implemented. We present a learning-based co-speech gesture genera...
Saved in:
Main Authors: | , , , , , |
---|---|
Format: | Conference Proceeding |
Language: | English |
Subjects: | |
Citations: | Items that cite this one |
Online Access: | Request full text |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Summary: | Co-speech gestures enhance interaction experiences between humans as well as between humans and robots. Most existing robots use rule-based speech-gesture association, but this requires human labor and prior knowledge of experts to be implemented. We present a learning-based co-speech gesture generation that is learned from 52 h of TED talks. The proposed end-to-end neural network model consists of an encoder for speech text understanding and a decoder to generate a sequence of gestures. The model successfully produces various gestures including iconic, metaphoric, deictic, and beat gestures. In a subjective evaluation, participants reported that the gestures were human-like and matched the speech content. We also demonstrate a co-speech gesture with a NAO robot working in real time. |
---|---|
ISSN: | 2577-087X |
DOI: | 10.1109/ICRA.2019.8793720 |