Greedy-PPO intelligent spectrum sharing decision for complex electromagnetic interference environments

doi:10.7527/S1000-6893.2024.30195

Abstract

To address the intense frequency conflicts among multi-functional electromagnetic equipment in complex electromagnetic environments, this paper studies a reinforcement-learning-based intelligent spectrum sharing technique for decision problems in which discrete and continuous actions are coupled. First, a detailed model of the complex electromagnetic interference environment is developed, accounting for factors such as the frequency-use rules of the friendly side and the jamming side. On this basis, a spectrum sharing effectiveness evaluation index is designed for integrated radar-communication equipment under multitask requirements. Second, a Greedy Proximal Policy Optimization (Greedy-PPO) intelligent spectrum sharing decision algorithm is proposed, which decouples the hybrid discrete-continuous action space: the PPO method optimizes the allocation of transmit power, and the greedy method then solves the discrete spectrum allocation problem, yielding an approximately optimal joint spectrum sharing strategy. Finally, simulation experiments show that the Greedy-PPO algorithm improves overall performance by 48% and 15% compared with the greedy algorithm and the DDQN algorithm, respectively, demonstrating its excellent spectrum utilization.
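The decoupling described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `power_policy` stand-in for a trained PPO actor, and the rule that co-channel devices add to each other's interference are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence, Tuple


def greedy_channels(powers: Sequence[float],
                    jam_power: Sequence[float]) -> List[int]:
    """Greedy discrete step: each device takes the least-interfered channel.

    Assumption: a device's own transmit power is added to the load of the
    channel it picks, so later devices tend to avoid crowded channels.
    """
    load = list(jam_power)  # per-channel interference seen so far
    choice = []
    for p in powers:
        f = min(range(len(load)), key=lambda k: load[k])
        choice.append(f)
        load[f] += p
    return choice


def decide(state: Sequence[float],
           power_policy: Callable[[Sequence[float]], List[float]],
           p_min: float, p_max: float,
           jam_power: Sequence[float]) -> List[Tuple[int, float]]:
    """Joint decision: continuous powers first, then greedy channels."""
    # Continuous step: the (hypothetical) policy proposes transmit powers,
    # which are clipped to the allowed range [p_min, p_max].
    powers = [min(max(p, p_min), p_max) for p in power_policy(state)]
    # Discrete step: channels are chosen greedily given those powers.
    channels = greedy_channels(powers, jam_power)
    return list(zip(channels, powers))


if __name__ == "__main__":
    dummy_policy = lambda s: [100.0, 2.0]  # stand-in for a PPO actor
    print(decide([0.0], dummy_policy, 5.0, 80.0, [5.0, 1.0, 3.0]))
```

Splitting the decision this way keeps the learned part purely continuous, so the policy network never has to represent the combinatorial channel choice.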

Key words: spectrum sharing, reinforcement learning, rule algorithm, decision management, hybrid action space

CLC Number: V243

Kaijie YIN, Jia SHI, Guodong DUAN, Lixin LI, Jiangbo SI. Greedy-PPO intelligent spectrum sharing decision for complex electromagnetic interference environments[J]. Acta Aeronautica et Astronautica Sinica, 2024, 45(22): 330195.

Table 1

Symbols and variables description

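Symbol	Description
$P_{x_{k,t},\,y_{k,t}}^{\mathrm{rec},f}$	Power intensity detected in band $f$ at coordinates $(x_{k,t}, y_{k,t})$
$p_{\mathrm{jam}}^{\mathrm{fixed}}$	Rated jamming power of a jamming device
$\mathrm{SINR}_{t,n,f}$	SINR of device $n$ in band $f$ in time slot $t$
$P_{t,n,c}^{\mathrm{com}}$	Communication power of integrated radar-communication device $n$ received at command center $c$ in time slot $t$
$P_{t,n,c}^{\mathrm{rad}}$	Radar power of integrated radar-communication device $n$ received at command center $c$ in time slot $t$
$I$	Superimposed interference from all jamming devices
$N_t$	Time-varying noise of the electromagnetic environment
$\alpha$	Boolean communication switch of an integrated radar-communication device
$\beta$	Boolean radar switch of an integrated radar-communication device
$E_{t,n}^{\mathrm{com}}$	Communication effectiveness evaluation index of device $n$ in time slot $t$
$\mathrm{SINR}_{\mathrm{threshold}}$	SINR threshold
$R_{\max}$	Maximum radar detection range
$R_{\mathrm{threshold}}$	Radar detection range threshold
$E_{t,n}^{\mathrm{rad}}$	Radar effectiveness evaluation index of device $n$ in time slot $t$
$\gamma$	Discount factor
$\lambda$	Advantage function parameter
$P_{\min}^{\mathrm{com}}$	Minimum communication transmit power
$P_{\max}^{\mathrm{com}}$	Maximum communication transmit power
$P_{\min}^{\mathrm{rad}}$	Minimum radar transmit power
$P_{\max}^{\mathrm{rad}}$	Maximum radar transmit power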
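As a rough illustration of how the received power, the superimposed interference $I$, and the noise $N_t$ relate to the SINR, the sketch below uses the textbook definition (signal power divided by interference plus noise). The paper's exact expressions are not reproduced on this page, so the formula, the function names, and the simple threshold test are assumptions.

```python
from typing import Sequence


def sinr(p_signal: float, jam_powers: Sequence[float], noise: float) -> float:
    """Textbook SINR: received signal power over interference plus noise.

    jam_powers holds the power received from each jamming device in the
    same band; their sum plays the role of the superimposed interference I.
    """
    return p_signal / (sum(jam_powers) + noise)


def meets_threshold(p_signal: float, jam_powers: Sequence[float],
                    noise: float, sinr_threshold: float = 1.0) -> bool:
    """Check the link against an SINR threshold (default 1, an assumption)."""
    return sinr(p_signal, jam_powers, noise) >= sinr_threshold


if __name__ == "__main__":
    print(sinr(8.0, [1.0, 2.0], 1.0))
    print(meets_threshold(8.0, [1.0, 2.0], 1.0))
```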

Table 2

Configuration of simulation initialization parameters

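Parameter	Value
Center frequencies of available bands / GHz	$[0.6, 0.9, 1.2, 1.5, 1.8, 2.3]$
Minimum radar transmit power $P_{\min}^{\mathrm{rad}}$ / kW	23
Maximum radar transmit power $P_{\max}^{\mathrm{rad}}$ / kW	25
Minimum communication transmit power / W	5
Maximum communication transmit power / W	80
Number of integrated radar-communication devices	2/5/10/15/20
Grid size	20×20
Number of time slots	20
Random environmental noise power levels / W	$[10^{-11}, 5 \times 10^{-11}, 10^{-10}, 2 \times 10^{-10}]$
Environmental noise rotation frequency / time slot	1
Jamming channel rotation frequency of jamming devices / time slot	1
Experience buffer size	1024
Discount factor $\gamma$	0.9
Advantage function parameter $\lambda$	0.9
Weighting coefficient $\eta$	0.5
Clipping hyperparameter $\varepsilon$	0.2
Actor network learning rate	0.001
Critic network learning rate	0.001
$R_{\mathrm{threshold}}$ / km	50
$\mathrm{SINR}_{\mathrm{threshold}}$	1
Initial state of integrated radar-communication devices	Off
Number of jamming devices	4
Rated jamming power of red-side devices $p_{\mathrm{jam}}^{\mathrm{fixed}}$ / W	200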
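To show where the discount factor, the advantage function parameter, and the clipping hyperparameter enter a PPO update, the following Python sketch implements standard generalized advantage estimation and the standard clipped surrogate term with the tabulated values as defaults. It follows the usual PPO formulation, not code from the paper, and the function names are hypothetical.

```python
from typing import List, Sequence


def gae(rewards: Sequence[float], values: Sequence[float],
        gamma: float = 0.9, lam: float = 0.9) -> List[float]:
    """Generalized advantage estimation over one trajectory.

    values must have len(rewards) + 1 entries; the last entry is the
    bootstrap value of the state reached after the final reward.
    """
    adv = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        # One-step TD error at time t.
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        # Exponentially weighted sum of future TD errors.
        running = delta + gamma * lam * running
        adv[t] = running
    return adv


def clipped_surrogate(ratio: float, advantage: float,
                      eps: float = 0.2) -> float:
    """PPO clipped objective for a single sample (to be maximized)."""
    clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
    return min(ratio * advantage, clipped * advantage)


if __name__ == "__main__":
    print(gae([1.0, 1.0], [0.0, 0.0, 0.0]))
    print(clipped_surrogate(1.5, 2.0))
```

With a clipping value of 0.2, a probability ratio of 1.5 on a positive advantage is treated as 1.2, which limits how far a single update can move the power-allocation policy.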


