Acta Aeronautica et Astronautica Sinica, 2024, Vol. 45, Issue 22: 330195. doi: 10.7527/S1000-6893.2024.30195

Greedy-PPO intelligent spectrum sharing decision for complex electromagnetic interference environments

Kaijie YIN1, Jia SHI1, Guodong DUAN2, Lixin LI3, Jiangbo SI1

  1. School of Telecommunications Engineering, Xidian University, Xi’an 710071, China
  2. Southwest China Research Institute of Electronic Equipment, Chengdu 610036, China
  3. School of Electronics and Information, Northwestern Polytechnical University, Xi’an 710129, China
  • Received: 2024-01-19 Revised: 2024-02-05 Accepted: 2024-02-29 Published: 2024-11-25 Online: 2024-03-11
  • Contact: Jia SHI, E-mail: jiashi@xidian.edu.cn
  • Supported by:
    Key Laboratory Fund for Electromagnetic Space Operations and Applications (JJ2021-001)

Abstract:

To address intense spectrum-use conflicts among multi-functional electromagnetic devices in complex electromagnetic environments, this paper studies reinforcement-learning-based intelligent spectrum sharing, with attention to the challenge of coupled decisions over a hybrid continuous-discrete action space. First, the complex electromagnetic interference environment is modeled in fine detail, taking into account factors such as the frequency-use rules of both the friendly side and the jamming side. On this basis, a method is designed for evaluating the spectrum sharing effectiveness of integrated radar-communication equipment under multi-task requirements. Second, a Greedy Proximal Policy Optimization (Greedy-PPO) intelligent spectrum sharing decision algorithm is proposed. It decouples the discrete-continuous action space: the PPO method configures the transmission power, and, given that power configuration, the Greedy method solves the discrete spectrum allocation problem, yielding an approximately optimal joint spectrum sharing strategy. Finally, simulation experiments show that Greedy-PPO improves the overall effectiveness metric by 48% and 15% relative to the greedy algorithm and the DDQN algorithm, respectively, and achieves good spectrum utilization.

Key words: spectrum sharing, reinforcement learning, rule algorithm, decision management, hybrid action space
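
The decoupling idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the power policy is a fixed placeholder standing in for a trained PPO actor, the utility is a toy Shannon-style rate rather than the paper's effectiveness metric, and the greedy ordering (highest power first) and all names (`placeholder_power_policy`, `rate`, `greedy_channel_assignment`, `jam_power`) are assumptions made for illustration.

```python
import math


def placeholder_power_policy(num_devices, p_max):
    """Stand-in for a trained PPO actor (assumption): one transmit power per device.
    A real actor would map the observed spectrum state to these values."""
    return [p_max * (i + 1) / num_devices for i in range(num_devices)]


def rate(power, interference, noise=1.0):
    """Toy Shannon-style utility (hypothetical, not the paper's metric)."""
    return math.log2(1.0 + power / (noise + interference))


def greedy_channel_assignment(powers, jam_power):
    """Discrete step: give each device a distinct channel, highest power first,
    choosing the free channel with the best toy utility (ties go to the lower index)."""
    free = set(range(len(jam_power)))
    assignment = {}
    for dev in sorted(range(len(powers)), key=lambda d: -powers[d]):
        if not free:
            break  # more devices than channels: remaining devices get no channel
        best = max(free, key=lambda ch: (rate(powers[dev], jam_power[ch]), -ch))
        assignment[dev] = best
        free.remove(best)
    return assignment


# Continuous step first (powers), then the discrete step conditioned on it.
powers = placeholder_power_policy(num_devices=3, p_max=2.0)
jam_power = [5.0, 0.1, 1.0, 0.0]  # assumed interference power on each channel
assignment = greedy_channel_assignment(powers, jam_power)
total = sum(rate(powers[d], jam_power[ch]) for d, ch in assignment.items())
print(assignment, round(total, 3))
```

In this sketch the continuous decision is fixed before the discrete one is made, which is the sense in which the hybrid action space is decoupled; in a learning setup, the resulting total utility would presumably serve as the reward signal for the power policy.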
