A Cognitive Jamming Decision-Making Method Based on Heuristic Improved A2C Algorithm

Bibliographic Details
Published in: IEEE Transactions on Vehicular Technology, 2024-10, p. 1-14
Main Authors: Zhang, Chudi; Yang, Biao; Wang, Lei; Ji, Wenshuai; Wang, Lulu; Xu, Shiyou
Format: Article
Language: English
Description
Summary: Cognitive electronic warfare (CEW) has received increasing attention, and it is widely recognized that it will play a significant role. Cognitive jamming decision-making, one of the critical technologies of CEW, dramatically impacts the global battlefield situation. In this paper, we introduce the A2C algorithm into cognitive jamming decision-making and propose a heuristic improved A2C algorithm. The Actor-Critic algorithm is a fusion of DQN and Policy Gradient. Compared with DQN, it offers iterative policy updates and high efficiency in complex spaces; compared with Policy Gradient, it converges quickly. However, it suffers from high variance, which makes convergence difficult. To address these issues, we first establish a cognitive jamming decision-making model. We then develop an improved A2C algorithm by introducing a baseline and dueling networks. The baseline reduces the variance of the critic network, while the dueling networks decrease the variance further, enhancing the convergence of the A2C algorithm. Additionally, the improved A2C algorithm does not rely on prior information and enhances the adaptive capability of the jammer when interacting with the target radar. We conducted numerical simulations based on the designed cognitive jamming decision-making model. The results show that, compared with four algorithms (DQN, Policy Gradient, Actor-Critic, and A2C), the improved A2C algorithm increases convergence speed by 50%, 57.8%, 34.5%, and 13.64%, respectively, which verifies its excellent performance. Finally, we introduce a heuristic reward function and propose the heuristic improved A2C algorithm. Compared with the improved A2C algorithm, its convergence speed is improved by a further 31.58%. The simulation results show that the algorithm can greatly improve our advantage in electronic countermeasures.
ISSN: 0018-9545, 1939-9359
DOI: 10.1109/TVT.2024.3470832
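
Note on the summary: the two variance-reduction ideas it names, a baseline and dueling networks, can be illustrated with their textbook forms. The sketch below is not taken from the paper, whose network design and reward function are not given in this record. It assumes the usual one-step advantage estimate A = r + gamma * V(s') - V(s), with the critic's state value as the baseline, and the usual dueling aggregation Q(s, a) = V(s) + A(s, a) - mean over a of A(s, a). The function names `td_advantage` and `dueling_q_values` are hypothetical, chosen only for this illustration.

```python
from statistics import mean


def td_advantage(reward, value_s, value_next, gamma=0.99, done=False):
    """One-step advantage estimate, using V(s) as the baseline.

    Assumed textbook form: A = r + gamma * V(s') - V(s).
    On a terminal step there is no next state, so the bootstrap term is dropped.
    """
    bootstrap = 0.0 if done else gamma * value_next
    return reward + bootstrap - value_s


def dueling_q_values(state_value, action_advantages):
    """Dueling aggregation of a state value and per-action advantages.

    Assumed textbook form: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a).
    Subtracting the mean keeps the value/advantage split identifiable.
    """
    centre = mean(action_advantages)
    return [state_value + a - centre for a in action_advantages]


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to exercise the two functions.
    print(td_advantage(reward=1.0, value_s=0.5, value_next=0.8, gamma=0.9))
    print(dueling_q_values(2.0, [1.0, 0.0, -1.0]))
```

In an actor-critic update, the advantage would scale the policy-gradient term for the chosen jamming action, so actions that do better than the critic's estimate are reinforced and the rest are suppressed. How the paper combines this with its heuristic reward is not stated in the summary.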