Assessing the effect of visual servoing on the performance of linear microphone arrays in moving human-robot interaction scenarios

Bibliographic Details
Published in: Computer Speech & Language, 2021-01, Vol. 65, Article 101136
Main Authors: Díaz, Alejandro; Mahu, Rodrigo; Novoa, Jose; Wuth, Jorge; Datta, Jayanta; Yoma, Nestor Becerra
Format: Article
Language: English
Description
Summary:
•The effect of visual servoing on the performance of a linear microphone array for distant ASR is assessed in a mobile, dynamic and non-stationary robotic testbed that can be representative of real HRI scenarios.
•A state-of-the-art mobile robotic testbed had to be set up with target speech and noise sources.
•This paper focuses on an effect that is rarely addressed in the literature: the dependence of the beamforming directivity gain on look direction.
•The average reduction in WER achieved when the robot head was steered toward the target speech source was as high as 28.2%.

Social robotics is becoming a reality, and voice-based human-robot interaction (HRI) is essential for a successful human-robot collaborative symbiosis. The main objective of this paper is to assess the effect of visual servoing on the performance of a linear microphone array for distant automatic speech recognition (ASR) in a mobile, dynamic and non-stationary robotic testbed that can be representative of real HRI scenarios. Visual servoing and image target tracking are different tasks, and this paper focuses on an effect that is rarely addressed in the literature: the dependence of the beamforming directivity on look direction. The datasets required for this study did not exist and had to be generated, and a state-of-the-art mobile robotic testbed was set up with target speech and noise sources. A linear microphone array was chosen as a case study, and its response was measured. Standard beamforming methods were evaluated with respect to visual servoing: delay-and-sum combined with image tracking, weighted delay-and-sum, and minimum variance distortionless response (MVDR) combined with image tracking. The results show that the performance of beamforming methods is dramatically degraded in moving and non-stationary conditions. In this context, visual servoing in HRI can significantly improve the ASR accuracy obtained with a linear microphone array.
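One simple way to see why look direction matters for a linear array is to compare the width of a delay-and-sum main lobe when the beam is steered to broadside versus toward the array axis. The Python sketch below is illustrative only and is not the authors' measurement procedure (the paper measured the response of a real array). It assumes free-field, far-field, narrowband propagation, ideal omnidirectional microphones, a speed of sound of 343 m/s, and a hypothetical 4-microphone array with 5 cm spacing evaluated at 2 kHz. All function names are hypothetical.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at roughly 20 °C


def steering_vectors(num_mics, spacing, freq, angles_deg):
    """Far-field, narrowband steering vectors of a uniform linear array.

    Angles are measured from the array axis: 90 deg is broadside, 0/180 deg is endfire.
    Returns an array of shape (n_angles, num_mics).
    """
    positions = np.arange(num_mics) * spacing                   # mic positions along the axis (m)
    cos_theta = np.cos(np.deg2rad(np.atleast_1d(angles_deg)))
    delays = np.outer(cos_theta, positions) / SPEED_OF_SOUND   # delays relative to mic 0 (s)
    return np.exp(-2j * np.pi * freq * delays)


def delay_and_sum_pattern(num_mics, spacing, freq, look_deg, angles_deg):
    """Magnitude response in dB of a delay-and-sum beamformer steered to look_deg."""
    weights = steering_vectors(num_mics, spacing, freq, look_deg)[0] / num_mics
    response = steering_vectors(num_mics, spacing, freq, angles_deg) @ weights.conj()
    return 20.0 * np.log10(np.abs(response) + 1e-12)  # 0 dB at the look direction


def main_lobe_width(num_mics, spacing, freq, look_deg, resolution=0.01):
    """-3 dB main-lobe width in degrees, found by walking outward from look_deg."""
    angles = np.linspace(0.0, 180.0, int(round(180.0 / resolution)) + 1)
    above = delay_and_sum_pattern(num_mics, spacing, freq, look_deg, angles) >= -3.0
    lo = hi = int(np.argmin(np.abs(angles - look_deg)))
    while lo > 0 and above[lo - 1]:                # extend toward 0 deg
        lo -= 1
    while hi < len(angles) - 1 and above[hi + 1]:  # extend toward 180 deg
        hi += 1
    return angles[hi] - angles[lo]


if __name__ == "__main__":
    # Hypothetical array: 4 mics, 5 cm spacing, evaluated at 2 kHz.
    for look in (90.0, 75.0, 60.0):
        width = main_lobe_width(4, 0.05, 2000.0, look)
        print(f"look direction {look:5.1f} deg -> main lobe {width:5.1f} deg wide")
```

With these assumed parameters the main lobe should be roughly 46°, 48° and 57° wide for look directions of 90°, 75° and 60°. The beam broadens as the target moves away from broadside, so a nearby interferer is rejected less well. Steering the robot head so that the speaker stays near broadside is presumably one way visual servoing can exploit this, although the paper's own measurements are the authoritative account.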
The average reduction in word error rate (WER) achieved when the robot head was steered toward the target speech source was as high as 28.2%. Finally, the methodology adopted here is applicable to any microphone array, linear or not.
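The following minimal Python sketch shows how WER figures of this kind are usually computed. WER is the word-level Levenshtein distance (substitutions + deletions + insertions) divided by the number of reference words. The abstract does not say whether the 28.2% reduction is relative or absolute, so the sketch reports both. The function names and the transcripts in the usage example are hypothetical and are not data from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (S + D + I) / N, using a word-level Levenshtein distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j]: edit distance between the reference words seen so far and the first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(prev[j] + 1,             # deletion of reference word r
                          curr[j - 1] + 1,         # insertion of hypothesis word h
                          prev[j - 1] + (r != h))  # substitution, or match at no cost
        prev = curr
    return prev[-1] / len(ref)


def wer_reduction(baseline: float, improved: float) -> tuple[float, float]:
    """Return (absolute reduction in percentage points, relative reduction in %)."""
    if baseline <= 0:
        raise ValueError("baseline WER must be positive")
    absolute = (baseline - improved) * 100.0
    relative = (baseline - improved) / baseline * 100.0
    return absolute, relative


if __name__ == "__main__":
    reference = "turn left at the next door"
    baseline = word_error_rate(reference, "turn let at next floor")      # 2 S + 1 D -> 3/6
    steered = word_error_rate(reference, "turn left at the next floor")  # 1 S -> 1/6
    absolute, relative = wer_reduction(baseline, steered)
    print(f"WER {baseline:.1%} -> {steered:.1%}: "
          f"{absolute:.1f} points absolute, {relative:.1f}% relative")
```

The same pair of WER values can yield very different headline numbers. In this toy case the absolute drop is 33.3 points and the relative drop is 66.7%. Any reported reduction therefore needs to state which definition it uses.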
ISSN: 0885-2308, 1095-8363
DOI: 10.1016/j.csl.2020.101136