
Maneuvering decision-making of multi-UAV attack-defence confrontation based on PER-MATD3

  • Xiaowei FU,
  • Zhe XU,
  • Jindong ZHU,
  • Nan WANG
  • 1. School of Electronics and Information, Northwestern Polytechnical University, Xi’an 710129, China
    2. Xi’an Institute of Applied Optics, Xi’an 710065, China
    3. AVIC Shenyang Aircraft Design Research Institute, Shenyang 110035, China
E-mail: fxw@nwpu.edu.cn

Received date: 2022-02-28

  Revised date: 2022-03-23

  Accepted date: 2022-05-11

  Online published: 2022-05-19

Supported by

Aeronautical Science Foundation of China (2020Z023053001)

Abstract

This paper studies maneuvering decision-making for multi-UAV attack-defence confrontation in a complex environment with randomly distributed obstacles. Motion models and radar detection models are constructed for both the attacking and defending sides. The Twin Delayed Deep Deterministic policy gradient (TD3) algorithm is extended to the multi-agent setting to alleviate the value-function overestimation of the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm. To further improve learning efficiency, a Prioritized Experience Replay Multi-Agent Twin Delayed Deep Deterministic policy gradient (PER-MATD3) algorithm is proposed by introducing a prioritized experience replay mechanism. Simulation experiments show that the proposed method achieves a good confrontation effect in multi-UAV attack-defence maneuvering decision-making, and comparisons verify the advantages of PER-MATD3 over other algorithms in convergence speed and stability.
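The abstract combines two ingredients: the clipped double-Q target of TD3, which curbs value overestimation by taking the minimum of two critic estimates, and prioritized experience replay, which samples transitions in proportion to their TD error to speed up learning. The sketch below is not the authors' implementation; class and parameter names such as `PERBuffer`, `alpha`, and `beta` are illustrative assumptions showing how these two components are commonly realized.

```python
import numpy as np

class PERBuffer:
    """Proportional prioritized replay: P(i) = p_i^alpha / sum_k p_k^alpha."""
    def __init__(self, capacity, alpha=0.6, beta=0.4, eps=1e-6):
        self.capacity, self.alpha, self.beta, self.eps = capacity, alpha, beta, eps
        self.data, self.priorities, self.pos = [], np.zeros(capacity), 0

    def add(self, transition):
        # New transitions receive the current maximum priority so they are replayed at least once.
        max_p = self.priorities.max() if self.data else 1.0
        if len(self.data) < self.capacity:
            self.data.append(transition)
        else:
            self.data[self.pos] = transition
        self.priorities[self.pos] = max_p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size):
        p = self.priorities[:len(self.data)] ** self.alpha
        probs = p / p.sum()
        idx = np.random.choice(len(self.data), batch_size, p=probs)
        # Importance-sampling weights correct the bias from non-uniform sampling.
        weights = (len(self.data) * probs[idx]) ** (-self.beta)
        weights /= weights.max()
        return [self.data[i] for i in idx], idx, weights

    def update_priorities(self, idx, td_errors):
        # Priorities are refreshed with the latest absolute TD errors after each critic update.
        self.priorities[idx] = np.abs(td_errors) + self.eps


def clipped_double_q_target(reward, done, q1_next, q2_next, gamma=0.99):
    """TD3-style target: y = r + gamma * min(Q1', Q2') for non-terminal transitions."""
    return reward + gamma * (1.0 - done) * np.minimum(q1_next, q2_next)
```

In a MATD3-style training loop, each agent's centralized critics would evaluate `q1_next` and `q2_next` for a sampled batch, the importance-sampling weights would scale the critic loss, and `update_priorities` would be called with the new absolute TD errors after every critic update.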

Cite this article

Xiaowei FU, Zhe XU, Jindong ZHU, Nan WANG. Maneuvering decision-making of multi-UAV attack-defence confrontation based on PER-MATD3[J]. ACTA AERONAUTICA ET ASTRONAUTICA SINICA, 2023, 44(7): 327083-327083. DOI: 10.7527/S1000-6893.2022.27083
