ACTA AERONAUTICA ET ASTRONAUTICA SINICA
Air combat intelligent decision-making method based on self-play and deep reinforcement learning
Received date: 2023-03-21
Revised date: 2023-06-12
Accepted date: 2023-08-29
Online published: 2023-09-01
Supported by: Science and Technology Foundation of National Defense Key Laboratory (6142219190302)
Air combat is a key element of three-dimensional warfare, and intelligent air combat has become a research focus in the military field both domestically and internationally. Deep reinforcement learning is an important technical route to air combat intelligence. To address the difficulty of constructing high-level opponents in single-agent training, a self-play-based training method for air combat agents is proposed, and a visualization research platform is built to develop a decision-making agent for close-range air combat. Pilots' domain knowledge is embedded in the design of the agent's observations, actions, and rewards, and the agent is trained to convergence. Simulation experiments show that the agent's air combat tactics improve gradually over the course of self-play training: the agent achieves a win rate of over 70% against an agent trained with the single-agent method, and strategies similar to the human "single/double loop" tactics emerge.
Key words: air combat; artificial intelligence; deep reinforcement learning; self-play; agent
Shengzhe SHAN, Weiwei ZHANG. Air combat intelligent decision-making method based on self-play and deep reinforcement learning[J]. ACTA AERONAUTICA ET ASTRONAUTICA SINICA, 2024, 45(4): 328723-328723. DOI: 10.7527/S1000-6893.2023.28723