Adversarial reinforcement learning-based UAV escape path planning method

doi:10.7527/S1000-6893.2024.31637

Electronics, Electrical Engineering and Control

Xiangsong HUANG¹,², Mengyu WANG¹, Dapeng PAN¹,² (corresponding author)

1. College of Information and Communication Engineering, Harbin Engineering University, Harbin 150001, China
2. Key Laboratory of Advanced Marine Communication and Information Technology, Ministry of Industry and Information Technology, Harbin Engineering University, Harbin 150001, China

Received: 2024-12-09 Revised: 2025-01-10 Accepted: 2025-03-18 Online: 2025-04-15 Published: 2025-04-07
Contact: Dapeng PAN E-mail: pandapeng@hrbeu.edu.cn
Supported by: National Natural Science Foundation of China (62001136)

Abstract:

With the rapid development of unmanned aerial vehicle (UAV) technology, countering malicious pursuit by other UAVs has become an important issue in UAV security. This work uses an adversarial reinforcement learning framework to improve a UAV's adaptability and survivability in hostile environments. Specifically, it addresses erroneous information that interferes with decision-making during evasion. Building on the adversarial interaction between the pursuers and the evader, the policy of the transport UAV is optimized to counter the pursuers' behavior. To overcome the sparse-reward problem of conventional reinforcement learning methods, a progressive reward mechanism based on the artificial potential field method is proposed, which enables the UAV to adapt more effectively to the pursuit environment. The results show that, compared with the Proximal Policy Optimization (PPO) algorithm, the proposed algorithm increases the UAV's escape success rate by 54.47% and reduces transport time by 34.35%, significantly improving transport efficiency. These findings provide a new technical solution for UAV security and explore the potential of adversarial reinforcement learning in malicious-pursuit scenarios.

Key words: adversarial training, reinforcement learning, escape path planning, escape decision making, reward function
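The abstract states that erroneous information can interfere with the evader's decisions, but this page does not describe how the adversary is built. The following is only a minimal sketch of one simple stand-in, a random-search adversary that picks the bounded perturbation of an observation that looks worst under a value estimate. The names `worst_case_observation`, `value_fn`, `epsilon`, and `num_candidates` are hypothetical and are not taken from the paper.

```python
import random


def worst_case_observation(obs, value_fn, epsilon, num_candidates=16, rng=None):
    """Return the perturbed observation with the lowest estimated value.

    Assumption: the adversary may shift every observation component by at
    most `epsilon`. Candidates are sampled at random; this is a stand-in
    for a learned adversary, not the paper's method.
    """
    rng = rng or random.Random()
    best = list(obs)
    best_value = value_fn(best)  # the unperturbed observation is a candidate too
    for _ in range(num_candidates):
        candidate = [x + rng.uniform(-epsilon, epsilon) for x in obs]
        value = value_fn(candidate)
        if value < best_value:
            best, best_value = candidate, value
    return best


if __name__ == "__main__":
    # Toy value estimate: larger components are "better" for the evader.
    toy_value = lambda o: sum(o)
    print(worst_case_observation([1.0, 2.0, 3.0], toy_value, epsilon=0.1,
                                 rng=random.Random(0)))
```

Training the evader on observations produced this way would expose it to pessimistic sensing errors; a learned adversary policy would replace the random search in a full adversarial setup.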

CLC number: V279

Xiangsong HUANG, Mengyu WANG, Dapeng PAN. Adversarial reinforcement learning-based UAV escape path planning method[J]. Acta Aeronautica et Astronautica Sinica, 2025, 46(17): 331637.

Figures/Tables (17)

Table 2. Reward function parameters

Parameter	Value
$\eta$	1
$\mu$	1
$\rho_0$	1
$\omega_1$	200
$\omega_2$	100
$\omega_3$	0.3
$\omega_4$	0.1
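The reward parameters above are listed without their formulas. As a hedged illustration only, the sketch below shows a textbook artificial-potential-field shaping reward, assuming $\eta$ is an attractive gain toward the goal, $\mu$ is a repulsive gain away from pursuers, and $\rho_0$ is the repulsion influence distance. These roles, the units, and all function names are assumptions; the weights $\omega_1$ to $\omega_4$ are left out because their meaning is not given here.

```python
import math

# Values from the table; their roles below are assumed, not confirmed.
ETA, MU, RHO_0 = 1.0, 1.0, 1.0


def attractive_potential(pos, goal, eta=ETA):
    """Quadratic well centred on the goal."""
    return 0.5 * eta * math.dist(pos, goal) ** 2


def repulsive_potential(pos, pursuers, mu=MU, rho_0=RHO_0):
    """Sum of repulsive terms from pursuers closer than rho_0."""
    total = 0.0
    for pursuer in pursuers:
        rho = max(math.dist(pos, pursuer), 1e-6)  # avoid division by zero
        if rho <= rho_0:
            total += 0.5 * mu * (1.0 / rho - 1.0 / rho_0) ** 2
    return total


def potential(pos, goal, pursuers):
    return attractive_potential(pos, goal) + repulsive_potential(pos, pursuers)


def shaping_reward(prev_pos, pos, goal, pursuers):
    """Dense per-step reward: positive when the total potential decreases."""
    return potential(prev_pos, goal, pursuers) - potential(pos, goal, pursuers)


if __name__ == "__main__":
    # One step toward the goal with no pursuer nearby.
    print(shaping_reward((0.0, 0.0), (1.0, 0.0), (10.0, 0.0), []))
```

Rewarding the per-step decrease in potential gives feedback at every time step, which is one common way to ease a sparse terminal reward.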

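Table 3. Transport UAV performance

Parameter	Meaning	Value
$m$	Number of transport UAVs	1
$v_{e\min}$ / (m·s$^{-1}$)	Minimum transport speed	50
$v_{e\max}$ / (m·s$^{-1}$)	Maximum transport speed	100
$\varphi_{e\max}$ / (°)	Maximum yaw angle of the transport UAV	45
$R_e$ / km	Perception range	10
$c_e$ / (m·s$^{-2}$)	Maximum acceleration of the transport UAV	7
$\varpi_e$ / ((°)·s$^{-1}$)	Maximum angular velocity of the transport UAV	7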
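To show how limits like these typically enter a simulation, here is a minimal sketch of one planar kinematic update that clamps acceleration, turn rate, and speed to the transport UAV values. The point-mass model, the function and class names, and the use of the 0.5 s time step are assumptions for illustration; the maximum yaw angle is not modelled.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class UavLimits:
    v_min: float = 50.0          # m/s
    v_max: float = 100.0         # m/s
    a_max: float = 7.0           # m/s^2
    omega_max_deg: float = 7.0   # deg/s


def clamp(x, lo, hi):
    return max(lo, min(hi, x))


def step_kinematics(x, y, heading_deg, speed, accel_cmd, turn_rate_cmd_deg,
                    limits=UavLimits(), dt=0.5):
    """Advance a 2D point-mass UAV by one time step under the limits."""
    accel = clamp(accel_cmd, -limits.a_max, limits.a_max)
    turn = clamp(turn_rate_cmd_deg, -limits.omega_max_deg, limits.omega_max_deg)
    speed = clamp(speed + accel * dt, limits.v_min, limits.v_max)
    heading_deg = (heading_deg + turn * dt) % 360.0
    heading = math.radians(heading_deg)
    x += speed * math.cos(heading) * dt
    y += speed * math.sin(heading) * dt
    return x, y, heading_deg, speed


if __name__ == "__main__":
    # Commands beyond the limits are clamped to 7 m/s^2 and 7 deg/s.
    print(step_kinematics(0.0, 0.0, 0.0, 95.0, accel_cmd=20.0,
                          turn_rate_cmd_deg=30.0))
```

The same function could serve the pursuers by passing a `UavLimits` instance with their own speed and turn-rate bounds.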

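Table 4. Pursuit UAV performance

Parameter	Meaning	Value
$n$	Number of pursuit UAVs	3
$v_{p\min}$ / (m·s$^{-1}$)	Minimum pursuit speed	50
$v_{p\max}$ / (m·s$^{-1}$)	Maximum pursuit speed	110
$\varphi_{p\max}$ / (°)	Maximum yaw angle of the pursuit UAV	25
$d_{p\max}$ / km	Attack range	100
$c_p$ / (m·s$^{-2}$)	Maximum acceleration of the pursuit UAV	7
$\varpi_p$ / ((°)·s$^{-1}$)	Maximum angular velocity of the pursuit UAV	7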


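Parameter	Meaning	Value
$\gamma$	Reward discount factor	0.99
$\alpha$	Neural network learning rate	0.003
$\varepsilon$	Clipping parameter	0.2
lr_c	Value network parameter	0.003
$\lambda$	GAE parameter	0.95
$N_{\text{max\_episode\_steps}}$	Maximum time steps per episode	1000
$T$ / s	Time step	0.5
$N_{\text{batch\_size}}$	Batch size (mini-batch)	2048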
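The discount factor 0.99, the GAE value 0.95, and the clipping parameter 0.2 are the usual inputs to generalized advantage estimation and the PPO clipped surrogate objective. The sketch below shows those two standard computations in plain Python; it is a generic illustration with hypothetical function names, not the authors' training code.

```python
def compute_gae(rewards, values, last_value, dones, gamma=0.99, lam=0.95):
    """Generalized advantage estimation over one rollout.

    values[t] is the value estimate at step t; last_value is the estimate
    for the state that follows the final step.
    """
    advantages = [0.0] * len(rewards)
    gae = 0.0
    next_value = last_value
    for t in reversed(range(len(rewards))):
        not_done = 0.0 if dones[t] else 1.0
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        gae = delta + gamma * lam * not_done * gae
        advantages[t] = gae
        next_value = values[t]
    return advantages


def ppo_clip_objective(ratios, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective (to be maximised)."""
    terms = []
    for ratio, adv in zip(ratios, advantages):
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        terms.append(min(ratio * adv, clipped * adv))
    return sum(terms) / len(terms)


if __name__ == "__main__":
    adv = compute_gae([1.0, 1.0, 1.0], [0.0, 0.0, 0.0], 0.0,
                      [False, False, False])
    print(adv)
    print(ppo_clip_objective([1.5, 0.5], [1.0, -1.0]))
```

In practice the ratios are the new-to-old policy probability ratios for the sampled actions, and the objective is maximised over mini-batches.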

Algorithm	Smoothness	Maximum reward	Running time / h
DQN	4.06	2.57	1.38
PPO	3.78	3.02	0.57
R-DQN	4.32	3.67	2.99
R-PPO	4.53	4.47	3.22
ADR-PPO	4.64	4.49	3.42

Scenario	Algorithm	Success rate / %	Mean flight time for successful transport / s
No interference	DQN	51.7	355.98
	PPO	69.5	292.54
	R-PPO	81.0	176.87
	ADR-PPO	81.2	173.25
With interference	DQN	36.9	392.81
	PPO	51.4	328.44
	R-PPO	66.7	178.01
	ADR-PPO	79.4	175.62
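As a quick arithmetic check, the 54.47% success-rate gain quoted in the abstract is reproduced by the relative increase of ADR-PPO over PPO in the interference scenario of this table. The helper name below is hypothetical, and reading the abstract's figure as a relative rather than absolute change is an assumption that this calculation supports.

```python
def relative_change_percent(baseline, new):
    """Relative change of `new` with respect to `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0


# Success rates (%) with interference, taken from the table.
ppo_success = 51.4
adr_ppo_success = 79.4

gain = relative_change_percent(ppo_success, adr_ppo_success)
print(f"{gain:.2f}%")  # 54.47%
```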
