Exploiting Partial Observability for Optimal Deception

Bibliographic Details
Published in:	IEEE Transactions on Automatic Control, 2023-07, Vol. 68 (7), p. 4443-4450
Main Authors:	Karabag, Mustafa O., Ornik, Melkior, Topcu, Ufuk
Format:	Article
Language:	English
Subjects:	Algorithms; Computational complexity; Computational geometry; Convex functions; Convexity; Deception; Deception under partial observations; Hidden Markov models; Hypothesis testing; Markov decision processes (MDPs); Markov processes; Mixtures; Observability; Optimization; Path planning; Policies; Polynomials; Synthesis; Task analysis; Virtual private networks

Description
Summary:	Deception is a useful tool in situations where an agent operates in the presence of its adversaries. We consider a setting where a supervisor provides a reference policy to an agent, expects the agent to operate in an environment by following the reference policy, and partially observes the agent's behavior. The agent instead follows a different deceptive policy to achieve a different task. We model the environment with a Markov decision process and study the synthesis of optimal deceptive policies under partial observability. We formalize the notion of deception as a hypothesis testing problem and show that the synthesis of optimal deceptive policies is nondeterministic polynomial-time hard (NP-hard). As an approximation, we consider the class of mixture policies, which provides a convex optimization formulation of the deception problem. We give an algorithm that converges to the optimal mixture policy. We also consider a special class of Markov decision processes where the transition and observation functions are deterministic. For this case, we give a randomized algorithm for path planning that generates a path for the agent in polynomial time and achieves the optimal value for the considered objective function.
ISSN:	0018-9286, 1558-2523
DOI:	10.1109/TAC.2022.3209959