Acta Aeronautica et Astronautica Sinica > 2022, Vol. 43 Issue (5): 325311-325311   doi: 10.7527/S1000-6893.2021.25311

Cooperative pursuit strategy for multi-UAVs based on DE-MADDPG algorithm

FU Xiaowei, WANG Hui, XU Zhe   

  1. School of Electronics and Information, Northwestern Polytechnical University, Xi'an 710129, China
  • Received: 2021-01-22 Revised: 2021-03-06 Published: 2021-03-26
  • Corresponding author: FU Xiaowei E-mail: fxw@nwpu.edu.cn
  • Supported by:
    Aeronautical Science Foundation of China (202023053001)

Abstract: To address the pursuit-evasion game in which multiple UAVs confront a fast target, we study a cooperative pursuit strategy for multi-UAVs. The strategy is trained with the Decomposed Multi-Agent Deep Deterministic Policy Gradient (DE-MADDPG) algorithm, for which we design two reward functions: a global reward and a local reward. After training, the multi-UAVs can effectively carry out the cooperative pursuit mission. Simulations with several escape control strategies for the fast target show that the pursuing UAVs can exploit their numerical advantage and cooperate to round up the fast target. A comparison further shows that the proposed method converges faster than the basic MADDPG algorithm.

Key words: multi-UAVs, cooperative pursuit, DE-MADDPG, multi-agent deep reinforcement learning, confrontation strategy
