Deep reinforcement learning-based dynamic multi-beam power allocation for GEO-LEO co-existing satellites

Bibliographic Details
Published in:	Acta astronautica 2024-10, Vol.223, p.197-209
Main Authors:	Xu, Jing; Fan, Simeng; Zhao, Zhongtian; Li, Fan; Zhang, Yizhai
Format:	Article
Language:	English
Subjects:	Deep reinforcement learning; Dynamic beam power allocation; Fractional optimization; GEO-LEO co-existing satellite system; Proximal policy optimization

Description
Summary:	This paper first formulates a novel long-term beam power allocation (BPA) problem to tackle harmful co-linear interference in systems where geostationary earth orbit (GEO) and low earth orbit (LEO) satellites co-exist. The BPA problem aims to optimize the long-term weighted sum rate of the LEO system while keeping the interference that GEO users receive from the LEO satellite system below a predefined threshold. To solve it in real time, a deep reinforcement learning (DRL) framework based on the proximal policy optimization (PPO) algorithm, named drlBPA, is proposed. In addition, the most relevant existing baseline, the fractional optimization (FO)-based BPA scheme, is extended in two ways: a greedy strategy improves it so that it fully exploits the time resource, and a deep neural network approximation scheme reduces the computational complexity that stems from its iterative solving procedure. Simulation results demonstrate that (i) the trained DRL model of the proposed drlBPA scheme has good convergence and generalization performance, and (ii) compared with the three FO-based benchmarks, the drlBPA scheme not only achieves the highest LEO system throughput within a significantly reduced computation time, but also yields the best system stability.
• Maximizes the LEO system throughput while ensuring the GEO system service quality.
• Builds a PPO algorithm based deep reinforcement learning framework.
• Constructs two benchmark schemes: improved FO (IFO) and the DNN-accelerated IFO.
• Achieves the highest LEO system throughput with the lowest computational complexity.
• Yields the best system stability compared with the benchmarks.
ISSN:	0094-5765
DOI:	10.1016/j.actaastro.2024.07.004
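
The summary names three ingredients: a weighted sum rate objective for the LEO beams, an interference cap at the GEO user, and the PPO algorithm. The sketch below illustrates each in its textbook form and is not the paper's model. The function names, the Shannon-rate expression that ignores inter-beam interference, the linear interference sum, and the clipping parameter of 0.2 are all assumptions made for illustration; the record does not give the paper's channel model, state and action design, or hyperparameters.

```python
import math


def weighted_sum_rate(powers, gains, weights, noise_power):
    """Weighted sum of per-beam Shannon rates in bit/s/Hz.

    Assumption: each beam sees only thermal noise (no inter-beam
    interference), which is a simplification chosen for this sketch.
    """
    return sum(
        w * math.log2(1.0 + p * g / noise_power)
        for p, g, w in zip(powers, gains, weights)
    )


def geo_interference(powers, cross_gains):
    """Aggregate interference power at a GEO user.

    Assumption: interference adds linearly as beam power times the
    (hypothetical) LEO-beam-to-GEO-user channel gain.
    """
    return sum(p * g for p, g in zip(powers, cross_gains))


def is_feasible(powers, cross_gains, threshold):
    """True when the GEO interference stays at or below the threshold."""
    return geo_interference(powers, cross_gains) <= threshold


def ppo_clipped_objective(ratios, advantages, clip_eps=0.2):
    """Mean of the standard PPO clipped surrogate.

    Each term is min(r * A, clip(r, 1 - eps, 1 + eps) * A), where r is the
    new-to-old policy probability ratio and A is the advantage estimate.
    """
    terms = []
    for r, a in zip(ratios, advantages):
        clipped = max(1.0 - clip_eps, min(r, 1.0 + clip_eps))
        terms.append(min(r * a, clipped * a))
    return sum(terms) / len(terms)
```

In a DRL formulation of this kind, the reward of an allocation would typically combine `weighted_sum_rate` with a penalty when `is_feasible` returns false, and `ppo_clipped_objective` is the quantity the policy update maximizes; how drlBPA actually shapes its reward is not stated in this record.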