Electronics, Electrical Engineering and Control


Air combat intelligent decision-making method based on self-play and deep reinforcement learning

  • Shengzhe SHAN ,
  • Weiwei ZHANG
  • 1. School of Aeronautics, Northwestern Polytechnical University, Xi’an 710072, China
    2. 93995 Unit of the Chinese People’s Liberation Army, Xi’an 710306, China

Received date: 2023-03-21

  Revised date: 2023-06-12

  Accepted date: 2023-08-29

  Online published: 2023-09-01

Supported by

Science and Technology Foundation of National Defense Key Laboratory (6142219190302)


Cite this article

SHAN S Z, ZHANG W W. Air combat intelligent decision-making method based on self-play and deep reinforcement learning[J]. Acta Aeronautica et Astronautica Sinica, 2024, 45(4): 328723. DOI: 10.7527/S1000-6893.2023.28723

Abstract

Air combat is a key element of the three-dimensional nature of modern warfare, and intelligent air combat has become a research hotspot and focus in the military field both domestically and internationally. Deep reinforcement learning is an important technological route to air combat intelligence. To address the difficulty of constructing high-level opponents with single-agent training methods, a self-play based air combat agent training method is proposed, and a visualization research platform is built to develop a decision-making agent for close-range air combat. Pilots’ domain knowledge is embedded in the design of the agent’s observations, actions, and rewards, and the agent is trained to convergence through self-play. Simulation experiments show that the agent’s air combat tactics improve progressively during self-play training: the final agent achieves a win rate of over 70% against a decision model obtained by single-agent training, and strategies similar to human “single/double loop” tactics emerge.
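The self-play scheme the abstract describes, training an agent against frozen copies of its former self so that it always faces an opponent of comparable strength, can be illustrated with a minimal toy sketch. The real system trains a deep reinforcement learning policy in a flight-dynamics environment; in the sketch below, a hypothetical three-action cyclic game and a one-sample REINFORCE update stand in for the air-combat environment and the deep RL algorithm, and all names are illustrative rather than the authors' implementation.

```python
import copy
import math
import random


class Policy:
    """Toy softmax policy over 3 discrete actions (a hypothetical
    stand-in for the agent's maneuver action space)."""

    def __init__(self):
        self.weights = [0.0, 0.0, 0.0]

    def probs(self):
        exps = [math.exp(w) for w in self.weights]
        total = sum(exps)
        return [e / total for e in exps]

    def act(self, rng):
        r, acc = rng.random(), 0.0
        for a, pa in enumerate(self.probs()):
            acc += pa
            if r < acc:
                return a
        return 2


def payoff(a, b):
    """Zero-sum engagement outcome with cyclic dominance:
    +1 win, -1 loss, 0 draw (rock-paper-scissors structure)."""
    if a == b:
        return 0
    return 1 if (a - b) % 3 == 1 else -1


def reinforce(policy, action, reward, lr=0.1):
    """One-sample policy-gradient step: grad log softmax = 1[i=a] - p_i."""
    p = policy.probs()
    for i in range(3):
        policy.weights[i] += lr * reward * ((1.0 if i == action else 0.0) - p[i])


def self_play_train(episodes=20000, snapshot_every=1000, seed=0):
    """Train a learner against a pool of frozen past versions of itself,
    periodically adding the current policy to the opponent pool."""
    rng = random.Random(seed)
    learner = Policy()
    pool = [copy.deepcopy(learner)]          # frozen past selves
    for ep in range(1, episodes + 1):
        opponent = rng.choice(pool)          # sample a historical opponent
        a, b = learner.act(rng), opponent.act(rng)
        reinforce(learner, a, payoff(a, b))  # update learner only
        if ep % snapshot_every == 0:
            pool.append(copy.deepcopy(learner))
    return learner


agent = self_play_train()
print([round(p, 3) for p in agent.probs()])  # learned action distribution
```

The opponent pool is what distinguishes self-play from training against a fixed adversary: the learner keeps facing progressively stronger versions of itself, which is how the paper's agent escapes the ceiling imposed by a hand-built single-agent opponent.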
