Multiagent Online Source Seeking Using Bandit Algorithm

Bibliographic Details
Published in: IEEE Transactions on Automatic Control, 2023-05, Vol. 68 (5), pp. 3147-3154
Main Authors: Du, Bin; Qian, Kun; Claudel, Christian; Sun, Dengfeng
Format: Article
Language: English
Description
Summary: This article presents a learning-based algorithm for solving the online source-seeking problem with a multiagent system in an unknown dynamical environment. The algorithm builds on a notion termed the dummy confidence upper bound (D-UCB) and integrates estimation of the unknown environment with task planning for the multiple agents, which enables the agents to track the extremum spots of the dynamical environment online. Unlike the standard upper confidence bound (UCB) algorithm for multiarmed bandits, D-UCB significantly reduces the computational complexity of the task-planning subproblems, which makes the algorithm computationally efficient in the distributed setting. Performance is guaranteed theoretically by a sublinear upper bound on the cumulative regret. Numerical results on a real-world pollution monitoring and tracking problem demonstrate the effectiveness of the algorithm.
ISSN: 0018-9286, 1558-2523
DOI: 10.1109/TAC.2022.3232190
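
Illustrative note: the Summary contrasts D-UCB with the standard upper confidence bound approach for multiarmed bandits and states a sublinear bound on cumulative regret. The sketch below is not the article's D-UCB algorithm, which this record does not specify. It is the textbook UCB1 rule on a stationary Bernoulli bandit, included only to show what "optimism under uncertainty" and "cumulative regret" mean in the simplest setting. The function name `ucb1`, the arm means, and the horizon are assumptions chosen for illustration.

```python
import math
import random


def ucb1(means, horizon, seed=0):
    """Textbook UCB1 on Bernoulli arms (illustrative; not the article's D-UCB).

    means   -- hypothetical success probability of each arm
    horizon -- number of rounds to play
    Returns (history, counts): cumulative pseudo-regret after each round,
    and the number of times each arm was pulled.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k      # pulls per arm
    sums = [0.0] * k      # total observed reward per arm
    best = max(means)
    regret = 0.0
    history = []

    for t in range(1, horizon + 1):
        if t <= k:
            # Initialization: play every arm once.
            arm = t - 1
        else:
            # Pick the arm with the largest empirical mean plus exploration bonus.
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        # Pseudo-regret: expected loss relative to always playing the best arm.
        regret += best - means[arm]
        history.append(regret)

    return history, counts


if __name__ == "__main__":
    history, counts = ucb1([0.2, 0.5, 0.8], horizon=5000)
    print("pulls per arm:", counts)
    print("average regret per round:", history[-1] / len(history))
```

In this stationary, single-agent setting the average regret per round shrinks as the horizon grows, which is what a sublinear cumulative regret bound expresses. The article's setting differs in that the environment is dynamical and several agents plan jointly.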