Paper

Hierarchical decision algorithm for air combat with hybrid action based on deep reinforcement learning

  • Zuolong LI,
  • Jihong ZHU,
  • Minchi KUANG,
  • Jie ZHANG,
  • Jie REN
  • 1. Department of Precision Instrument, Tsinghua University, Beijing 100084, China
    2. AVIC Chengdu Flight Design and Research Institute, Chengdu 610091, China
E-mail: jhzhu@tsinghua.edu.cn

Received date: 2024-01-02

  Revised date: 2024-01-11

  Accepted date: 2024-04-22

  Online published: 2024-04-25

Cite this article

Zuolong LI, Jihong ZHU, Minchi KUANG, Jie ZHANG, Jie REN. Hierarchical decision algorithm for air combat with hybrid action based on deep reinforcement learning[J]. Acta Aeronautica et Astronautica Sinica, 2024, 45(17): 530053-530053. DOI: 10.7527/S1000-6893.2024.30053

Abstract

Intelligent air combat is an active research topic among the world's major military powers. To address maneuver decision-making in Beyond-Visual-Range (BVR) air combat, we propose a hierarchical decision algorithm based on deep reinforcement learning. The algorithm uses a maneuver set suited to BVR air combat to control the trajectory and attitude of the aircraft. To enlarge the model's action space and improve its decision-making ability, we organize the air combat action space hierarchically and model it as a multi-discrete action space. To address the sparse-reward problem in air combat, we design a reward function that jointly accounts for position advantage, weapon launch, and weapon threat, guiding the agent toward the optimal policy. We also build a complete digital-twin air combat simulation environment and an air combat expert system. The decision algorithm is trained in the simulation environment and evaluated in engagements against the expert system. The experimental results indicate that the proposed algorithm can make autonomous and flexible maneuver decisions in BVR air combat according to the battlefield situation, and that it holds some advantage over the expert system.
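The two modeling ideas named in the abstract, a multi-discrete action space and a dense reward built from position advantage, weapon launch, and weapon threat, can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the head names, the maneuver labels, the value ranges, the reward weights, and the treatment of the heads as independent categorical distributions are hypothetical and are not taken from the paper.

```python
import math
import random
from dataclasses import dataclass

# Hypothetical action heads; the abstract does not list the actual maneuver set.
ACTION_HEADS = {
    "maneuver": ["level_flight", "climb", "dive", "turn_left", "turn_right"],
    "speed": ["decelerate", "hold", "accelerate"],
    "weapon": ["hold_fire", "launch"],
}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_multi_discrete(logits_per_head, rng):
    """Sample one index per head; the joint log-probability is the sum over heads."""
    action, log_prob = {}, 0.0
    for head, logits in logits_per_head.items():
        probs = softmax(logits)
        index = rng.choices(range(len(probs)), weights=probs, k=1)[0]
        action[head] = index
        log_prob += math.log(probs[index])
    return action, log_prob


@dataclass
class Situation:
    position_advantage: float  # assumed to lie in [-1, 1]
    launched_weapon: bool      # True if a weapon was launched this step
    weapon_threat: float       # assumed to lie in [0, 1]


def shaped_reward(s, w_pos=0.1, w_launch=0.5, w_threat=0.3):
    """Dense reward over the three factors from the abstract (illustrative weights and signs)."""
    return (w_pos * s.position_advantage
            + w_launch * float(s.launched_weapon)
            - w_threat * s.weapon_threat)


# Uniform logits stand in for the output of a policy network.
rng = random.Random(0)
logits = {head: [0.0] * len(options) for head, options in ACTION_HEADS.items()}
action, log_prob = sample_multi_discrete(logits, rng)
reward = shaped_reward(Situation(position_advantage=0.4, launched_weapon=True, weapon_threat=0.2))
```

Factoring the action into several small heads keeps each output layer compact while the joint action space grows as the product of the head sizes (here 5 × 3 × 2 = 30 combinations), which is one common motivation for a multi-discrete formulation.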
