Greedy-PPO intelligent spectrum sharing decision for complex electromagnetic interference environments
Received date: 2024-01-19
Revised date: 2024-02-05
Accepted date: 2024-02-29
Online published: 2024-03-11
Supported by
Key Laboratory Fund for Electromagnetic Space Operations and Applications (JJ2021-001)
YIN K J, SHI J, DUAN G D, LI L X, SI J B. Greedy-PPO intelligent spectrum sharing decision for complex electromagnetic interference environments[J]. Acta Aeronautica et Astronautica Sinica, 2024, 45(22): 330195. DOI: 10.7527/S1000-6893.2024.30195
To address the severe spectrum conflicts among multi-functional electromagnetic devices in complex electromagnetic environments, and the challenge of coupled decision-making over hybrid continuous and discrete actions, this paper studies intelligent spectrum sharing based on reinforcement learning. First, the complex electromagnetic interference environment is modeled in detail, taking into account factors such as the spectrum usage rules of both the friendly side and the jamming side. On this basis, a spectrum sharing effectiveness evaluation method is designed for integrated radar-communication devices under multi-task requirements. Second, a Greedy Proximal Policy Optimization (Greedy-PPO) intelligent spectrum sharing decision algorithm is proposed. The algorithm decouples the discrete-continuous action space: the PPO method optimizes the transmit power configuration, and, given that configuration, the Greedy method solves the discrete spectrum allocation problem, yielding an approximately optimal joint spectrum sharing strategy. Finally, simulation experiments show that Greedy-PPO improves the overall effectiveness index by 48% and 15% compared with the greedy algorithm and the DDQN algorithm, respectively, and achieves good spectrum utilization.
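The decoupling described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the rate model, the device ordering rule, the co-channel interference approximation, and all names (`rate`, `greedy_channel_allocation`, `ppo_clip_objective`, `gains`, `jam`) are assumptions introduced here for illustration. The sketch treats the continuous transmit powers as already chosen (in the paper they come from the PPO policy), assigns channels greedily for those powers, and shows the standard per-sample PPO clipped surrogate that the power policy would maximize.

```python
import math


def rate(power, gain, interference, noise=1e-3):
    """Shannon spectral efficiency (bit/s/Hz) of one device on one channel."""
    return math.log2(1.0 + power * gain / (interference + noise))


def greedy_channel_allocation(powers, gains, jam):
    """Greedily assign one channel per device for fixed transmit powers.

    powers[i]   : transmit power of device i (assumed output of the PPO actor)
    gains[i][c] : channel gain of device i on channel c (hypothetical values)
    jam[c]      : jamming power observed on channel c (hypothetical values)
    """
    n_channels = len(jam)
    load = [0.0] * n_channels            # power already placed on each channel
    assignment = [None] * len(powers)
    # Assumption: serve devices in order of decreasing transmit power.
    order = sorted(range(len(powers)), key=lambda i: -powers[i])
    for i in order:
        # Pick the channel with the best rate given jamming plus current load.
        # Simplification: co-channel load is added as raw interference power.
        best = max(
            range(n_channels),
            key=lambda c: rate(powers[i], gains[i][c], jam[c] + load[c]),
        )
        assignment[i] = best
        load[best] += powers[i]
    return assignment


def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Per-sample PPO clipped surrogate: min(r*A, clip(r, 1-eps, 1+eps)*A)."""
    clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
    return min(ratio * advantage, clipped * advantage)


if __name__ == "__main__":
    powers = [1.0, 0.5]                  # two devices, fixed powers
    gains = [[1.0, 1.0], [1.0, 1.0]]     # flat gains on two channels
    jam = [0.0, 0.5]                     # channel 1 is jammed
    print(greedy_channel_allocation(powers, gains, jam))
    print(ppo_clip_objective(1.5, 1.0))
```

Splitting the problem this way keeps the learned policy purely continuous (powers only), while the discrete channel choice is resolved by a cheap deterministic search for each power configuration. The sketch does not reproduce the paper's environment model or effectiveness index.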