Multiagent Online Source Seeking Using Bandit Algorithm
Published in: | IEEE transactions on automatic control 2023-05, Vol.68 (5), p.3147-3154 |
---|---|
Format: | Article |
Language: | English |
Summary: | This article presents a learning-based algorithm for solving the online source-seeking problem with a multiagent system in an unknown dynamical environment. Our algorithm, building on a notion termed the dummy confidence upper bound (D-UCB), integrates estimation of the unknown environment with task planning for the multiple agents, and as a result enables the agents to track the extremum spots of the dynamical environment in an online manner. Unlike the standard confidence upper bound algorithm in the context of multiarmed bandits, the notion of D-UCB significantly reduces the computational complexity of solving the task-planning subproblems, and thus renders our algorithm computationally efficient in the distributed setting. The performance of our algorithm is theoretically guaranteed by a sublinear upper bound on the cumulative regret. Numerical results on a real-world pollution monitoring and tracking problem are also provided to demonstrate the effectiveness of the algorithm. |
---|---|
ISSN: | 0018-9286 1558-2523 |
DOI: | 10.1109/TAC.2022.3232190 |
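
For orientation, the summary contrasts D-UCB with the standard confidence-upper-bound approach for multiarmed bandits and measures performance by cumulative regret. The record does not describe D-UCB itself, so the following is only a minimal sketch of the textbook single-agent UCB1 rule, not the article's method. The function name `ucb1`, the Bernoulli reward model, the arm means, and the exploration constant are all assumptions chosen for illustration.

```python
import math
import random


def ucb1(true_means, horizon, seed=0):
    """Run textbook UCB1 on Bernoulli arms (illustrative assumption, not D-UCB).

    Returns the per-arm pull counts and the cumulative pseudo-regret.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms   # number of pulls per arm
    sums = [0.0] * n_arms   # total observed reward per arm
    best = max(true_means)
    regret = 0.0

    for t in range(1, horizon + 1):
        if t <= n_arms:
            # Pull every arm once so each empirical mean is defined.
            arm = t - 1
        else:
            # Index = empirical mean + exploration bonus that shrinks with pulls.
            arm = max(
                range(n_arms),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        # Pseudo-regret: gap between the best mean and the chosen arm's mean.
        regret += best - true_means[arm]
    return counts, regret


if __name__ == "__main__":
    pulls, total_regret = ucb1([0.2, 0.5, 0.8], horizon=5000)
    print("pulls per arm:", pulls)
    print("cumulative pseudo-regret:", round(total_regret, 1))
```

In this sketch the regret grows slowly because suboptimal arms are pulled less and less often. A sublinear regret bound of the kind the summary mentions formalizes the same behavior for the multiagent, time-varying setting.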