Acta Aeronautica et Astronautica Sinica, 2020, Vol. 41, Issue 12: 324152. doi: 10.7527/S1000-6893.2020.24152

Cooperative attack-defense game of multiple UAVs with asymmetric maneuverability

CHEN Can1,2, MO Li1,2, ZHENG Duo1,2, CHENG Ziheng1,2, LIN Defu1,2   

  1. School of Aerospace Engineering, Beijing Institute of Technology, Beijing 100081, China;
    2. Beijing Key Laboratory of UAV Autonomous Control, Beijing Institute of Technology, Beijing 100081, China
  • Received: 2020-04-29; Revised: 2020-05-22; Published: 2020-06-24
  • Corresponding author: ZHENG Duo, E-mail: zhengduohello@126.com
  • Supported by:
    National Natural Science Foundation of China (61903350); Key Program of National Natural Science Foundation of China (U1613225)

Abstract: The attack-defense game is an important combat scenario for future military Unmanned Aerial Vehicles (UAVs). This paper studies the attack-defense game between groups of UAVs with different maneuverability and establishes a multi-UAV cooperative attack-defense evolution model. Based on multi-agent reinforcement learning theory, an autonomous decision-making method for the multi-UAV cooperative attack-defense game is developed, and an algorithm structure with a centralized critic and distributed actors is proposed based on the Actor-Critic framework, guaranteeing stable convergence of the algorithm while improving execution efficiency. During training, the critic module of each UAV uses global information to evaluate decision quality and guide policy learning, whereas during execution the actor module relies only on local perception information to make autonomous decisions, improving the effectiveness of the multi-UAV attack-defense game. Simulation results show that the proposed multi-UAV reinforcement learning method has a strong self-evolution property, endowing the UAVs with a degree of intelligence, namely a stable autonomous learning capability. Through continuous training, the UAVs autonomously learn cooperative attack and defense policies that improve decision-making effectiveness.
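
The centralized-critic, distributed-actor structure described above separates what each UAV needs during training from what it needs in flight. The Python/PyTorch sketch below is not the authors' implementation; the network sizes, observation and action dimensions, and the choice of framework are illustrative assumptions. It only shows the split: each actor maps its own local observation to a maneuver command, while a single critic consumes all observations and actions to score joint behaviour during training.

# Minimal sketch (assumed sizes and framework) of centralized critic / distributed actors.
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Per-UAV policy: local observation -> action (decentralized execution)."""
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim), nn.Tanh(),  # bounded maneuver command
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

class CentralizedCritic(nn.Module):
    """Shared critic: all observations and actions -> joint action value (training only)."""
    def __init__(self, obs_dim: int, act_dim: int, n_agents: int, hidden: int = 128):
        super().__init__()
        joint_dim = n_agents * (obs_dim + act_dim)
        self.net = nn.Sequential(
            nn.Linear(joint_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, all_obs: torch.Tensor, all_acts: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([all_obs, all_acts], dim=-1))

if __name__ == "__main__":
    n_agents, obs_dim, act_dim = 3, 8, 2  # assumed sizes for illustration

    actors = [Actor(obs_dim, act_dim) for _ in range(n_agents)]
    critic = CentralizedCritic(obs_dim, act_dim, n_agents)

    # Execution: each UAV acts from its own local observation only.
    local_obs = [torch.randn(1, obs_dim) for _ in range(n_agents)]
    actions = [actor(o) for actor, o in zip(actors, local_obs)]

    # Training: the critic evaluates the joint behaviour from global information.
    q_value = critic(torch.cat(local_obs, dim=-1), torch.cat(actions, dim=-1))
    print(q_value.shape)  # torch.Size([1, 1])

At execution time the critic is discarded, so each UAV needs only its onboard perception to act, which is what makes distributed execution feasible.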

Key words: multi-UAV coordination, attack-defense games, reinforcement learning, centralized critic, distributed actors
