Greedy-PPO intelligent spectrum sharing decision for complex electromagnetic interference environments

doi:10.7527/S1000-6893.2024.30195

Electronics, Electrical Engineering and Control


Kaijie YIN¹, Jia SHI¹ (corresponding author), Guodong DUAN², Lixin LI³, Jiangbo SI¹

1. School of Telecommunications Engineering, Xidian University, Xi'an 710071, China
2. Southwest China Research Institute of Electronic Equipment, Chengdu 610036, China
3. School of Electronics and Information, Northwestern Polytechnical University, Xi'an 710129, China

Received: 2024-01-19  Revised: 2024-02-05  Accepted: 2024-02-29  Published: 2024-11-25  Released online: 2024-03-11
Corresponding author: Jia SHI  E-mail: jiashi@xidian.edu.cn
Supported by: Key Laboratory Fund for Electromagnetic Space Operations and Applications (JJ2021-001)

Abstract

To address the severe spectrum-use conflicts among multi-functional electromagnetic devices in complex electromagnetic environments, and the challenge of coupled decision-making over hybrid continuous and discrete actions, this paper studies an intelligent spectrum sharing technique based on reinforcement learning. First, accounting for factors such as the frequency-use rules of both the friendly side and the jamming side, a fine-grained model of the complex electromagnetic interference environment is built. On this basis, a method for evaluating the spectrum sharing effectiveness of integrated radar-communication devices under multi-task requirements is designed. Second, a Greedy Proximal Policy Optimization (Greedy-PPO) intelligent spectrum sharing decision algorithm is proposed. It decouples the discrete-continuous action space, uses the PPO method to configure the transmit power, and then applies the Greedy method to the discrete spectrum allocation problem, yielding an approximately optimal joint spectrum sharing strategy. Finally, simulation experiments verify that, compared with the greedy algorithm and the DDQN algorithm, Greedy-PPO can improve the overall effectiveness metric by 48% and 15%, respectively, and achieves excellent spectrum utilization.

Key words: spectrum sharing, reinforcement learning, rule algorithm, decision management, hybrid action space
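The decoupling described in the abstract can be illustrated with a small sketch of the discrete half of the decision. This is not the authors' implementation: the transmit powers are simply passed in as a fixed vector standing in for the PPO output, and the link-gain matrix `gain`, the per-channel jamming vector `interference`, and the serving order (highest power first) are all hypothetical choices made for illustration.

```python
import numpy as np

def greedy_channel_assignment(tx_power, gain, interference, noise):
    """Assign each device the channel with the highest SINR, one device at a time.

    tx_power:     (N,) transmit power per device (stand-in for the PPO output)
    gain:         (N, F) hypothetical link gain of device n on channel f
    interference: (F,) jamming power currently observed on each channel
    noise:        scalar background noise power
    """
    n_dev, _ = gain.shape
    load = interference.astype(float).copy()  # interference seen per channel so far
    choice = np.empty(n_dev, dtype=int)
    # Serve devices in order of decreasing power (an arbitrary, assumed rule).
    for n in np.argsort(-tx_power):
        sinr = tx_power[n] * gain[n] / (load + noise)
        choice[n] = int(np.argmax(sinr))
        # A device that occupies a channel adds interference for later devices.
        load[choice[n]] += tx_power[n] * gain[n, choice[n]]
    return choice

powers = np.array([10.0, 5.0])
gains = np.ones((2, 3))
jamming = np.array([100.0, 0.0, 2.0])
print(greedy_channel_assignment(powers, gains, jamming, noise=1.0))
```

Because each device is assigned once and never revisited, the result is only a heuristic: a later assignment can degrade the SINR of a device served earlier, which is consistent with the abstract describing the joint strategy as approximately optimal.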

CLC number: V243

Kaijie YIN, Jia SHI, Guodong DUAN, Lixin LI, Jiangbo SI. Greedy-PPO intelligent spectrum sharing decision for complex electromagnetic interference environments[J]. Acta Aeronautica et Astronautica Sinica, 2024, 45(22): 330195.

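Figures and tables (14)

Table 1

Symbols and variables

Symbol	Description
$P^{\mathrm{rec},f}_{x_{k,t},\,y_{k,t}}$	Power intensity detected in frequency band $f$ at coordinates $(x_{k,t}, y_{k,t})$
$p_{\mathrm{jam}}^{\mathrm{fixed}}$	Rated jamming power of a jamming device
$\mathrm{SINR}_{t,n,f}$	SINR of device $n$ in frequency band $f$ in time slot $t$
$P_{t,n,c}^{\mathrm{com}}$	Communication power of integrated radar-communication device $n$ received at command center $c$ in time slot $t$
$P_{t,n,c}^{\mathrm{rad}}$	Radar power of integrated radar-communication device $n$ received at command center $c$ in time slot $t$
$I$	Superimposed interference from all jamming devices
$N_t$	Time-varying noise of the electromagnetic environment
$\alpha$	Boolean communication switch of the integrated radar-communication device
$\beta$	Boolean radar switch of the integrated radar-communication device
$E_{t,n}^{\mathrm{com}}$	Communication effectiveness metric of device $n$ in time slot $t$
$\mathrm{SINR}_{\mathrm{threshold}}$	SINR threshold
$R_{\max}$	Maximum radar detection range
$R_{\mathrm{threshold}}$	Radar detection range threshold
$E_{t,n}^{\mathrm{rad}}$	Radar effectiveness metric of device $n$ in time slot $t$
$\gamma$	Discount factor
$\lambda$	Advantage function parameter
$P_{\min}^{\mathrm{com}}$	Minimum communication transmit power
$P_{\max}^{\mathrm{com}}$	Maximum communication transmit power
$P_{\min}^{\mathrm{rad}}$	Minimum radar transmit power
$P_{\max}^{\mathrm{rad}}$	Maximum radar transmit power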
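To show how the symbols in Table 1 relate to one another, the following minimal sketch uses the textbook definition of SINR (received power divided by summed interference plus noise). The page does not reproduce the paper's own formulas, so this form is an assumption. The function names are hypothetical, and the default threshold of 1 is the SINR threshold listed in Table 2.

```python
def sinr(p_rx, jam_powers, noise):
    """Textbook SINR: received power over (summed interference + noise).

    p_rx:       received signal power (plays the role of P^com or P^rad)
    jam_powers: received power from each jamming device; their sum is I
    noise:      environment noise power N_t in the same units
    """
    interference = sum(jam_powers)  # I: superimposed interference
    return p_rx / (interference + noise)

def link_ok(p_rx, jam_powers, noise, sinr_threshold=1.0):
    """True when the SINR meets the threshold (assumed pass/fail criterion)."""
    return sinr(p_rx, jam_powers, noise) >= sinr_threshold

print(sinr(4.0, [1.0, 0.5], 0.5))
print(link_ok(1.0, [1.0], 1.0))
```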

Figures 1 to 4

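Table 2

Simulation initialization parameters

Parameter	Value
Center frequencies of available bands / GHz	$[0.6, 0.9, 1.2, 1.5, 1.8, 2.3]$
Minimum radar transmit power $P_{\min}^{\mathrm{rad}}$ / kW	23
Maximum radar transmit power $P_{\max}^{\mathrm{rad}}$ / kW	25
Minimum communication transmit power / W	5
Maximum communication transmit power / W	80
Number of integrated radar devices	2/5/10/15/20
Grid size	20×20
Number of time slots	20
Random levels of environmental noise intensity / W	$[10^{-11}, 5 \times 10^{-11}, 10^{-10}, 2 \times 10^{-10}]$
Environmental noise rotation frequency / time slots	1
Jamming channel rotation frequency of jamming devices / time slots	1
Experience pool size	1024
Discount factor $\gamma$	0.9
Advantage function parameter $\lambda$	0.9
Weighting coefficient $\eta$	0.5
Clipping hyperparameter $\varepsilon$	0.2
Actor network learning rate	0.001
Critic network learning rate	0.001
$R_{\mathrm{threshold}}$ / km	50
$\mathrm{SINR}_{\mathrm{threshold}}$	1
Initial state of integrated radar devices	Off
Number of jamming devices	4
Rated jamming power of red-side devices $p_{\mathrm{jam}}^{\mathrm{fixed}}$ / W	200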
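The three PPO-related values in Table 2 (discount factor 0.9, advantage function parameter 0.9, clipping hyperparameter 0.2) enter a standard PPO update roughly as sketched below. This assumes the advantage function parameter is the λ of generalized advantage estimation, which the page does not state explicitly. The function names are hypothetical, and the code shows only the generic PPO quantities, not the authors' training loop.

```python
import numpy as np

GAMMA, LAM, EPS = 0.9, 0.9, 0.2  # values listed in Table 2

def gae(rewards, values, gamma=GAMMA, lam=LAM):
    """Generalized advantage estimates for one episode.

    rewards: length-T sequence of per-slot rewards
    values:  length-(T + 1) sequence of V(s_0) ... V(s_T); the last entry
             is the bootstrap value (0 for a terminal state)
    """
    T = len(rewards)
    adv = np.zeros(T)
    running = 0.0
    for t in reversed(range(T)):
        delta = rewards[t] + gamma * values[t + 1] - values[t]  # TD error
        running = delta + gamma * lam * running
        adv[t] = running
    return adv

def clipped_surrogate(ratio, adv, eps=EPS):
    """PPO clipped objective (to be maximized), averaged over samples."""
    ratio = np.asarray(ratio, dtype=float)
    adv = np.asarray(adv, dtype=float)
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * adv
    return float(np.mean(np.minimum(ratio * adv, clipped)))

print(gae([1.0, 1.0], [0.0, 0.0, 0.0]))
print(clipped_surrogate([1.5], [1.0]))
```

With ε = 0.2, a probability ratio of 1.5 on a positive advantage is capped at 1.2, so a single update cannot move the power-control policy far from the policy that collected the data.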

Figures 5 to 12



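Supervised by: China Association for Science and Technology. Sponsored by: Chinese Society of Aeronautics and Astronautics; Beihang University.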
