Loading…

Flight, Camera, Action! Using Natural Language and Mixed Reality to Control a Drone

With increasing autonomy, robots like drones are increasingly accessible to untrained users. Most users control drones using a low-level interface, such as a radio-controlled (RC) controller. For a wider adoption of these technologies by the public, a much higher-level interface, such as natural lan...

Full description

Saved in:
Bibliographic Details
Main Authors: Huang, Baichuan, Bayazit, Deniz, Ullman, Daniel, Gopalan, Nakul, Tellex, Stefanie
Format: Conference Proceeding
Language:English
Subjects:
Citations: Items that cite this one
Online Access:Request full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:With increasing autonomy, robots like drones are increasingly accessible to untrained users. Most users control drones using a low-level interface, such as a radio-controlled (RC) controller. For a wider adoption of these technologies by the public, a much higher-level interface, such as natural language or mixed reality (MR), allows the automation of the control of the agent in a goal-oriented setting. We present an interface that uses natural language grounding within an MR environment to solve high-level task and navigational instructions given to an autonomous drone. To the best of our knowledge, this is the first work to perform fully autonomous language grounding in an MR setting for a robot. Given a map, our interface first grounds natural language commands to reward specifications within a Markov Decision Process (MDP) framework. Then, it passes the reward specification to an MDP solver. Finally, the drone performs the desired operations in the real world while planning and localizing itself. Our approach uses MR to provide a set of known virtual landmarks, enabling the drone to understand commands referring to objects without being equipped with object detectors for multiple novel objects or a predefined environment model. We conducted an exploratory user study to assess users' experience of our MR interface with and without natural language, as compared to a web interface. We found that users were able to command the drone more quickly via both MR interfaces as compared to the web interface, with roughly equal system usability scores across all three interfaces.
ISSN:2577-087X
DOI:10.1109/ICRA.2019.8794200