Acta Aeronautica et Astronautica Sinica > 2025, Vol. 46 Issue (17): 331637-331637   doi: 10.7527/S1000-6893.2024.31637

Adversarial reinforcement learning-based UAV escape path planning method

Xiangsong HUANG (黄湘松)1,2, Mengyu WANG (王梦宇)1, Dapeng PAN (潘大鹏)1,2

  1. College of Information and Communication Engineering, Harbin Engineering University, Harbin 150001, China
  2. Key Laboratory of Advanced Marine Communication and Information Technology, Ministry of Industry and Information Technology, Harbin Engineering University, Harbin 150001, China
  • Received: 2024-12-09 Revised: 2025-01-10 Accepted: 2025-03-18 Online: 2025-04-15 Published: 2025-04-07
  • Contact: Dapeng PAN E-mail: pandapeng@hrbeu.edu.cn
  • Supported by:
    National Natural Science Foundation of China (62001136)

Abstract:

With the rapid development of unmanned aerial vehicle (UAV) technology, countering malicious pursuit by other UAVs has become an important problem in UAV security. This work uses an adversarial reinforcement learning framework to improve a UAV's adaptability and survivability in hostile environments. The framework addresses the erroneous information that a UAV receives during escape and that interferes with its decisions. Building on the adversarial interaction between pursuers and the evader, it optimizes the transport UAV's policy to counter the pursuers' behavior. To mitigate the sparse reward problem of conventional reinforcement learning, a stepwise reward mechanism that incorporates the artificial potential field method is proposed, allowing the UAV to adapt to the pursuit environment more effectively. The results show that, compared with the Proximal Policy Optimization (PPO) algorithm, the proposed algorithm raises the UAV's escape success rate by 54.47% and reduces transport time by 34.35%, which significantly improves transport efficiency. These results offer a new technical approach to UAV security and explore the potential of adversarial reinforcement learning in malicious pursuit scenarios.

Key words: adversarial training, reinforcement learning, escape path planning, escape decision making, reward function
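
The abstract does not give the form of the stepwise reward, so the following is only a minimal sketch of how an artificial potential field can turn a sparse escape reward into a dense per-step signal. It assumes a textbook potential (quadratic attraction to the goal, repulsion inside an influence radius around each pursuer). The function names `potential` and `step_reward` and the parameters `k_att`, `k_rep`, and `d0` are hypothetical and are not taken from the paper.

```python
import math


def potential(pos, goal, pursuers, k_att=1.0, k_rep=1.0, d0=5.0):
    """Textbook artificial potential at `pos` (assumed form, not the paper's).

    Attractive term grows with squared distance to the goal; each pursuer
    closer than the influence radius d0 adds a repulsive term.
    """
    u_att = 0.5 * k_att * math.dist(pos, goal) ** 2
    u_rep = 0.0
    for p in pursuers:
        d = max(math.dist(pos, p), 1e-6)  # avoid division by zero
        if d < d0:
            u_rep += 0.5 * k_rep * (1.0 / d - 1.0 / d0) ** 2
    return u_att + u_rep


def step_reward(prev_pos, pos, goal, pursuers, **gains):
    """Dense per-step reward: the decrease in potential over one move.

    Positive when the move lowers the potential (closer to the goal,
    farther from pursuers), negative otherwise.
    """
    return potential(prev_pos, goal, pursuers, **gains) - potential(
        pos, goal, pursuers, **gains
    )
```

Under these assumptions, a move toward the goal that stays outside every pursuer's influence radius yields a positive reward at each step, while a move that approaches a pursuer yields a negative one. The agent therefore receives feedback before the episode ends in escape or capture.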

CLC number: