Articles

Intelligent maneuvering decision-making in two-UCAV cooperative air combat based on improved MADDPG with hybrid hyper network

  • Wentao LI ,
  • Feng FANG ,
  • Zhenya WANG ,
  • Yichao ZHU ,
  • Dongliang PENG
Expand
  • 1.School of Automation,Hangzhou Dianzi University,Hangzhou  310018,China
    2.China Academy of Aerospace Science and Innovation,Beijing  100076,China
E-mail: fangf@hdu.edu.cn

Received date: 2023-08-18

  Revised date: 2023-09-26

  Accepted date: 2023-10-24

  Online published: 2023-11-01

Supported by

the Fundamental Research Funds for the Provincial Universities of Zhejiang(GK209907299001-021)

Abstract

In the case of two Unmanned Combat Aerial Vehicles (UCAVs) cooperative air combat with local observation, there are the problems such as hard-to-design collaborative rewards, low collaboration efficiency and poor decision-making effect. To solve these problems, an intelligent maneuver decision-making method is proposed based on the improved the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) with hybrid hyper network. A Centralized Training with Decentralized Execution (CTDE) architecture is adopted to meet the training requirements of global coordinated maneuvering decision in the situation of single agent with local observation. A reward function is designed for the UCAV agent by considering both the local reward of fast guidance for obtaining attack advantage and the global reward for winning air combat. Then, a hybrid hyper network is introduced to mix the estimated Q values of each agent monotonically and nonlinearly to obtain the global policy value function. By using the global policy value function, the decentralized Actor network update parameters to solve the problem of credit assignment in multi-agent deep reinforcement learning. Simulation results show that compared with the traditional MADDPG method, the proposed method can produce the optimal global cooperative maneuver commands for achieving better coordination performance, and can obtain a higher winning rate with the same agent opponent.

Cite this article

Wentao LI , Feng FANG , Zhenya WANG , Yichao ZHU , Dongliang PENG . Intelligent maneuvering decision-making in two-UCAV cooperative air combat based on improved MADDPG with hybrid hyper network[J]. ACTA AERONAUTICAET ASTRONAUTICA SINICA, 2024 , 45(17) : 529460 -529460 . DOI: 10.7527/S1000-6893.2023.29460

References

1 范晋祥, 陈晶华. 未来空战新概念及其实现挑战[J]. 航空兵器202027(2): 15-24.
  FAN J X, CHEN J H. New concepts of future air warfare and the challenges for its realization[J]. Aero Weaponry202027(2): 15-24 (in Chinese).
2 孙智孝, 杨晟琦, 朴海音, 等. 未来智能空战发展综述[J]. 航空学报202142(8): 525799.
  SUN Z X, YANG S Q, PIAO H Y, et al. A survey of air combat artificial intelligence[J]. Acta Aeronautica et Astronautica Sinica202142(8): 525799 (in Chinese).
3 徐光达, 吕超, 王光辉, 等. 基于双矩阵对策的UCAV空战自主机动决策研究[J]. 舰船电子工程201737(11): 24-28, 39.
  XU G D, LV C, WANG G H, et al. Research on UCAV autonomous air combat maneuvering decision-making based on Bi-matrix game[J]. Ship Electronic Engineering201737(11): 24-28, 39 (in Chinese).
4 邵将, 徐扬, 罗德林. 无人机多机协同对抗决策研究[J]. 信息与控制201847(3): 347-354.
  SHAO J, XU Y, LUO D L. Cooperative combat decision-making research for multi UAVs[J]. Information and Control201847(3): 347-354 (in Chinese).
5 黄长强, 赵克新, 韩邦杰, 等. 一种近似动态规划的无人机机动决策方法[J]. 电子与信息学报201840(10): 2447-2452.
  HUANG C Q, ZHAO K X, HAN B J, et al. Maneuvering decision-making method of UAV based on approximate dynamic programming[J]. Journal of Electronics & Information Technology201840(10): 2447-2452 (in Chinese).
6 谢建峰, 杨啟明, 戴树岭, 等. 基于强化遗传算法的无人机空战机动决策研究[J]. 西北工业大学学报202038(6): 1330-1338.
  XIE J F, YANG Q M, DAI S L, et al. Air combat maneuver decision based on reinforcement genetic algorithm[J]. Journal of Northwestern Polytechnical University202038(6): 1330-1338 (in Chinese).
7 李伟东, 黄振柱, 何精武, 等. 改进行为克隆与DDPG的无人驾驶决策模型[J/OL]. 计算机工程与应用, [2023-07-06] (2023-08-18). .
  LI W D, HUANG Z Z, HE J W, et al. Improved behavioral cloning and DDPG’s driverless decision model[J/OL]. Computer Engineering and Applications, [2023-07-06] (2023-08-18). (in Chinese).
8 BOTVINICK M, WANG J X, DABNEY W, et al. Deep reinforcement learning and its neuroscientific implications[J]. Neuron2020107(4): 603-616.
9 SILVER D, HUANG A, MADDISON C J, et al. Mastering the game of Go with deep neural networks and tree search[J]. Nature2016529: 484-489.
10 THERESA H. DARPA’s AlphaDogfight tests AI pilot’s combat chops[EB/OL]. (2020-08) [2023-08-18]. , 2020.
11 马文, 李辉, 王壮, 等. 基于深度随机博弈的近距空战机动决策[J]. 系统工程与电子技术202143(2): 443-451.
  MA W, LI H, WANG Z, et al. Close air combat maneuver decision based on deep stochastic game[J]. Systems Engineering and Electronics202143(2): 443-451 (in Chinese).
12 周攀, 黄江涛, 章胜, 等. 基于深度强化学习的智能空战决策与仿真[J]. 航空学报202344(4): 126731.
  ZHOU P, HUANG J T, ZHANG S, et al. Intelligent air combat decision making and simulation based on deep reinforcement learning[J]. Acta Aeronautica et Astronautica Sinica202344(4): 126731 (in Chinese).
13 左家亮, 杨任农, 张滢, 等. 基于启发式强化学习的空战机动智能决策[J]. 航空学报201738(10): 321168.
  ZUO J L, YANG R N, ZHANG Y, et al. Intelligent decision-making in air combat maneuvering based on heuristic reinforcement learning[J]. Acta Aeronautica et Astronautica Sinica201738(10): 321168 (in Chinese).
14 KONG W R, ZHOU D Y, YANG Z, et al. UAV autonomous aerial combat maneuver strategy generation with observation error based on state-adversarial deep deterministic policy gradient and inverse reinforcement learning[J]. Electronics20209(7): 1121.
15 高敬鹏, 王国轩, 高路. 基于异步合作更新的LSTM-MADDPG多智能体协同决策算法[J]. 吉林大学学报(工学版)202454(3): 797-806.
  GAO J P, WANG G X, GAO L. LSTM-MADDPG multi-agent cooperative decision algorithm based on asynchronous collaborative update[J]. Journal of Jilin University (Engineering and Technology Edition)202454(3): 797-806 (in Chinese).
16 孔维仁, 周德云, 赵艺阳, 等. 基于深度强化学习与自学习的多无人机近距空战机动策略生成算法[J]. 控制理论与应用202239(2): 352-362.
  KONG W R, ZHOU D Y, ZHAO Y Y, et al. Maneuvering strategy generation algorithm for multi-UAV in close-range air combat based on deep reinforcement learning and self-play[J]. Control Theory & Applications202239(2): 352-362 (in Chinese).
17 邓可, 彭宣淇, 周德云. 基于矩阵对策与遗传算法的无人机空战决策[J]. 火力与指挥控制201944(12): 61-66, 71.
  DENG K, PENG X Q, ZHOU D Y. Study on air combat decision method of UAV based on matrix game and genetic algorithm[J]. Fire Control & Command Control201944(12): 61-66, 71 (in Chinese).
18 LOWE R, WU Y, TAMAR A, et al. Multi-agent actor-critic for mixed cooperative-competitive environments[C]∥Proceedings of the 31st International Conference on Neural Information Processing Systems. New York: ACM, 2017: 6382-6393.
19 施伟, 冯旸赫, 程光权, 等. 基于深度强化学习的多机协同空战方法研究[J]. 自动化学报202147(7): 1610-1623.
  SHI W, FENG Y H, CHENG G Q, et al. Research on multi-aircraft cooperative air combat method based on deep reinforcement learning[J]. Acta Automatica Sinica202147(7): 1610-1623 (in Chinese).
20 RASHID T, SAMVELYAN M, DE WITT C S, et al. Monotonic value function factorisation for deep multi-agent reinforcement learning[J]. The Journal of Machine Learning Research202021(1): 7234-7284.
Outlines

/