Electronics and Electrical Engineering and Control

Decision-making in close-range air combat based on reinforcement learning and population game

  • Baolai WANG,
  • Xianzhong GAO,
  • Tao XIE,
  • Zhongxi HOU
  • 1. College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, China
  • 2. College of Aerospace Science and Engineering, National University of Defense Technology, Changsha 410073, China

Received date: 2023-08-15

Revised date: 2023-09-06

Accepted date: 2023-10-31

Online published: 2023-11-01

Supported by

National Natural Science Foundation of China (61903369); Natural Science Foundation of Hunan Province (2018JJ3587)

Abstract

With the development of artificial intelligence and Unmanned Aerial Vehicle (UAV) technology, intelligent decision-making in close-range air combat has attracted extensive attention worldwide. To address the problems of overfitting and strategy cycling that arise when traditional reinforcement learning is applied to close-range air combat decision-making, a training paradigm for air combat decision models is proposed based on population games. A population of UAV agents is constructed, and each agent is assigned different reward weight coefficients, giving the agents diversified risk preferences. Training agents with different risk preferences against one another effectively avoids overfitting and strategy cycling. During training, each UAV agent in the population adaptively optimizes its reward weight coefficients according to the outcomes of confrontations with different opponent strategies. In numerical simulation experiments, Agent 5 and Agent 3 from population-game training defeat the intelligent decision models obtained by adversarial training against an expert system and by self-play training, with success rates of 88% and 85%, respectively, verifying the effectiveness of the algorithm. Further experiments demonstrate the necessity of dynamically adjusting the weight coefficients in the population-game training paradigm and verify the generality of the proposed paradigm on heterogeneous models.
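
The abstract above describes the training paradigm only at a high level. The following is a minimal Python sketch of the core loop, assuming a generic self-play setup; the names `Agent`, `evaluate_match`, and `adapt_weights`, the three reward terms, and the perturbation rule are illustrative assumptions, not the paper's actual implementation.

```python
import random

# Illustrative sketch (not the paper's code): a population of agents whose
# reward weight coefficients encode different risk preferences, trained by
# round-robin confrontation with adaptive weight adjustment.

class Agent:
    def __init__(self, weights):
        self.weights = weights    # reward weight coefficients (risk preference)
        self.win_rates = {}       # most recent win rate against each opponent

    def shaped_reward(self, terms):
        # Weighted sum of raw reward terms from the combat simulator,
        # e.g. (angle advantage, closing range, survival bonus) -- hypothetical.
        return sum(w * t for w, t in zip(self.weights, terms))

def evaluate_match(agent, opponent):
    # Stand-in for a full air-combat rollout between two policies;
    # returns the empirical win rate of `agent` over `opponent`.
    return random.random()

def adapt_weights(agent, noise=0.05):
    # Hypothetical adaptation rule: an agent that loses on average perturbs
    # its weight coefficients to explore a new risk preference, in the
    # spirit of population-based training.
    mean_win = sum(agent.win_rates.values()) / len(agent.win_rates)
    if mean_win < 0.5:
        agent.weights = [max(0.0, w + random.uniform(-noise, noise))
                         for w in agent.weights]

population = [Agent([random.random() for _ in range(3)]) for _ in range(5)]
for generation in range(10):
    for a in population:                      # round-robin confrontation
        for b in population:
            if a is not b:
                a.win_rates[id(b)] = evaluate_match(a, b)
    for a in population:                      # adaptive weight adjustment
        adapt_weights(a)
```

The design choice mirrored here is that population diversity comes from the reward weights rather than from the policy architecture, so every agent can share the same learner while exploring a different risk preference.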

Cite this article

Baolai WANG, Xianzhong GAO, Tao XIE, Zhongxi HOU. Decision-making in close-range air combat based on reinforcement learning and population game[J]. ACTA AERONAUTICA ET ASTRONAUTICA SINICA, 2024, 45(12): 329446-329446. DOI: 10.7527/S1000-6893.2023.29446
