Acta Aeronautica et Astronautica Sinica (航空学报) > 2023, Vol. 44 Issue (7): 327083-327083   doi: 10.7527/S1000-6893.2022.27083

基于PER-MATD3的多无人机攻防对抗机动决策

符小卫1, 徐哲2, 朱金冬1,3, 王楠1

    1.西北工业大学 电子信息学院,西安 710129
    2.西安应用光学研究所,西安 710065
    3.航空工业沈阳飞机设计研究所 体系部,沈阳 110035
  • 收稿日期:2022-02-28 修回日期:2022-03-23 接受日期:2022-05-11 出版日期:2023-04-15 发布日期:2022-05-19
  • 通讯作者: 符小卫 E-mail:fxw@nwpu.edu.cn
  • 基金资助:
    航空科学基金(2020Z023053001)

Maneuvering decision-making of multi-UAV attack-defence confrontation based on PER-MATD3

Xiaowei FU1, Zhe XU2, Jindong ZHU1,3, Nan WANG1

    1.School of Electronics and Information,Northwestern Polytechnical University,Xi’an 710129,China
    2.Xi’an Institute of Applied Optics,Xi’an 710065,China
    3.AVIC Shenyang Aircraft Design Research Institute,Shenyang 110035,China
  • Received:2022-02-28 Revised:2022-03-23 Accepted:2022-05-11 Published:2023-04-15 Online:2022-05-19
  • Contact: Xiaowei FU E-mail:fxw@nwpu.edu.cn
  • Supported by:
    Aeronautical Science Foundation of China(2020Z023053001)

摘要:

以障碍物随机分布的复杂环境下多无人机攻防对抗机动决策为研究背景,构建了攻防双方运动模型及雷达探测模型,将双延迟深度确定性策略梯度(TD3)算法扩展到多智能体领域中以解决多智能体深度确定性策略梯度(MADDPG)算法存在值函数高估的问题;在此基础上,为了提升算法学习效率,结合优先经验回放机制提出了优先经验回放多智能体双延迟深度确定性策略算法(PER-MATD3)。通过仿真实验表明本文所设计的方法在多无人机攻防对抗机动决策问题中具有较好的对抗效果,并通过对比验证了(PER-MATD3)算法相较其他算法在收敛速度和稳定性方面的优势。

关键词: 多无人机, 多智能体强化学习, PER-MATD3, 攻防对抗, 机动决策

Abstract:

This paper studies maneuvering decision-making for multi-UAV attack-defence confrontation in a complex environment with randomly distributed obstacles. Motion models and a radar detection model are constructed for both the attacking and defending sides. The Twin Delayed Deep Deterministic policy gradient (TD3) algorithm is extended to the multi-agent setting to address the value-function overestimation of the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm. On this basis, to improve learning efficiency, a Prioritized Experience Replay Multi-Agent Twin Delayed Deep Deterministic policy gradient (PER-MATD3) algorithm is proposed by incorporating the prioritized experience replay mechanism. Simulation experiments show that the proposed method achieves good confrontation performance in multi-UAV attack-defence maneuvering decision-making, and comparisons verify the advantages of the PER-MATD3 algorithm over other algorithms in convergence speed and stability.

Key words: multi-UAV, multi-agent reinforcement learning, PER-MATD3, attack-defence confrontation, maneuvering decision-making
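
The two ingredients named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it only shows the generic clipped double-Q target that TD3 uses to curb value overestimation, and generic proportional prioritized experience replay. The function names and the values of gamma, alpha, beta and eps are assumptions chosen for illustration. The sketch works on scalar critic outputs and omits the centralized critics over joint observations and actions, target policy smoothing, and delayed actor updates that a full multi-agent TD3 would need.

```python
import random


def clipped_double_q_target(reward, done, q1_next, q2_next, gamma=0.99):
    """TD3-style target: bootstrap from the smaller of two target-critic values.

    Taking the minimum counteracts the overestimation that a single
    critic (as in DDPG/MADDPG) tends to accumulate.
    """
    return reward + gamma * (1.0 - done) * min(q1_next, q2_next)


def per_probabilities(td_errors, alpha=0.6, eps=1e-6):
    """Proportional prioritization: P(i) = p_i^alpha / sum_k p_k^alpha,
    with priority p_i = |td_error_i| + eps (alpha and eps are assumed values)."""
    priorities = [(abs(d) + eps) ** alpha for d in td_errors]
    total = sum(priorities)
    return [p / total for p in priorities]


def importance_weights(probs, beta=0.4):
    """Importance-sampling weights w_i = (N * P(i))^(-beta), scaled so max(w) = 1.

    They correct the bias introduced by non-uniform sampling.
    """
    n = len(probs)
    weights = [(n * p) ** (-beta) for p in probs]
    max_w = max(weights)
    return [w / max_w for w in weights]


def sample_indices(probs, batch_size, rng=random):
    """Draw transition indices (with replacement) according to their priorities."""
    return rng.choices(range(len(probs)), weights=probs, k=batch_size)


# Hypothetical usage with made-up numbers.
target = clipped_double_q_target(reward=1.0, done=0.0, q1_next=5.0, q2_next=4.0, gamma=0.9)
probs = per_probabilities([0.1, 2.0, 0.5])
weights = importance_weights(probs)
batch = sample_indices(probs, batch_size=4, rng=random.Random(0))
```

In this sketch, transitions with larger temporal-difference error are sampled more often, while the importance weights shrink their contribution to the loss so that the update stays approximately unbiased.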

CLC Number: