
Hierarchical decision algorithm for air combat with hybrid action based on deep reinforcement learning

  • Zuolong LI
  • Jihong ZHU
  • Minchi KUANG
  • Jie ZHANG
  • Jie REN
  • 1. Department of Precision Instrument, Tsinghua University, Beijing 100084, China
    2. AVIC Chengdu Flight Design and Research Institute, Chengdu 610091, China

Received date: 2024-01-02

Revised date: 2024-01-11

Accepted date: 2024-04-22

Online published: 2024-04-25

Abstract

Intelligent air combat is an active research topic among the world's major military powers. To solve the maneuver decision problem in Beyond-Visual-Range (BVR) air combat, we propose a hierarchical decision algorithm based on deep reinforcement learning. The algorithm uses a maneuver set suited to BVR air combat to control the trajectory and attitude of the aircraft. To enlarge the action space of the model and strengthen its decision-making ability, we organize the action space hierarchically and model it as a multi-discrete space. To address the sparse-reward problem in air combat, we design a set of reward functions that account for position advantage, weapon launching, and weapon threat, and that guide the agent toward the optimal policy. We also build a complete digital-twin simulation environment for air combat, together with an expert system. The decision algorithm is trained in the simulation environment and evaluated in engagements against the expert system. The experimental results indicate that the proposed algorithm can make autonomous and flexible decisions in BVR air combat based on the current situation, and that it holds some advantage over the expert system.
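
As a rough illustration of what a multi-discrete action space means, the sketch below samples one option from each of several sub-action heads and sums their log-probabilities, as a policy-gradient method would need. It is not the authors' implementation: the head names, the option lists, and the assumption that the heads are sampled independently are all hypothetical, and the uniform logits merely stand in for the output of a policy network.

```python
import math
import random

# Hypothetical sub-action heads; names and sizes are illustrative assumptions,
# not the maneuver set used in the article.
ACTION_HEADS = {
    "maneuver": ["level_flight", "climb", "dive", "left_turn", "right_turn"],
    "speed": ["decelerate", "hold", "accelerate"],
    "weapon": ["hold_fire", "launch"],
}


def softmax(logits):
    """Convert raw scores into a probability distribution."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_multi_discrete(logits_by_head, rng):
    """Sample one option per head; return the action and its joint log-probability.

    Assumes the heads are independent given the state, so the joint
    log-probability is the sum of the per-head log-probabilities.
    """
    action = {}
    log_prob = 0.0
    for head, logits in logits_by_head.items():
        probs = softmax(logits)
        idx = rng.choices(range(len(probs)), weights=probs, k=1)[0]
        action[head] = ACTION_HEADS[head][idx]
        log_prob += math.log(probs[idx])
    return action, log_prob


rng = random.Random(0)
# Stand-in for policy network outputs: uniform logits for every head.
logits = {head: [0.0] * len(options) for head, options in ACTION_HEADS.items()}
action, log_prob = sample_multi_discrete(logits, rng)
print(action, log_prob)
```

With these illustrative sizes the policy emits 5 + 3 + 2 = 10 logits yet covers 5 × 3 × 2 = 30 joint actions, which is one reason a factored space scales better than a single flat discrete set.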

Cite this article

Zuolong LI, Jihong ZHU, Minchi KUANG, Jie ZHANG, Jie REN. Hierarchical decision algorithm for air combat with hybrid action based on deep reinforcement learning[J]. Acta Aeronautica et Astronautica Sinica, 2024, 45(17): 530053. DOI: 10.7527/S1000-6893.2024.30053
