ACTA AERONAUTICA ET ASTRONAUTICA SINICA
Decision-making in close-range air combat based on reinforcement learning and population game
Received date: 2023-08-15
Revised date: 2023-09-06
Accepted date: 2023-10-31
Online published: 2023-11-01
Supported by: National Natural Science Foundation of China (61903369); Natural Science Foundation of Hunan Province (2018JJ3587)
With the development of artificial intelligence and Unmanned Aerial Vehicle (UAV) technology, intelligent decision-making in close-range air combat has attracted extensive attention worldwide. To address the overfitting and strategy cycling that arise when traditional reinforcement learning is used for intelligent close-range air combat decision-making, a training paradigm for air combat decision models based on population games is proposed. By constructing a population of multiple UAV agents and assigning each agent a different reward weight coefficient, diversified risk preferences are realized across the population. Training agents with different risk preferences against one another effectively avoids both overfitting and strategy cycling. During training, each UAV agent in the population adaptively optimizes its reward weight coefficient according to the results of confrontations with different opponent strategies. In numerical simulation experiments, Agent 5 and Agent 3 from population game training defeat the intelligent decision models obtained by expert-system adversarial training and by self-play training with win rates of 88% and 85%, respectively, verifying the effectiveness of the algorithm. Further experiments demonstrate the necessity of dynamically adjusting the weight coefficients in the population game training paradigm and verify the generality of the proposed paradigm on heterogeneous models.
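Viewed procedurally, the paradigm amounts to a loop of round-robin confrontation within the population followed by per-agent adaptation of the reward weight coefficient. The sketch below illustrates that loop under stated assumptions only: the `Agent` class, the toy `simulate_combat` outcome model, and the win-rate-driven `adapt_weights` rule are hypothetical stand-ins for the paper's RL policies and flight-dynamics simulation, not the authors' implementation.

```python
import random

# Hypothetical sketch of population-game training with per-agent reward
# weight coefficients; every name and update rule here is an illustrative
# assumption, not the paper's actual method.

POPULATION_SIZE = 8      # number of UAV agents in the population
ROUNDS = 100             # confrontation/adaptation cycles
LEARNING_STEP = 0.05     # step size for adapting the weight coefficient

class Agent:
    """A UAV agent whose reward mixes an offensive and a defensive term."""
    def __init__(self, agent_id: int, w_aggressive: float):
        self.agent_id = agent_id
        # Risk preference: weight on the offensive reward term;
        # (1 - w_aggressive) implicitly weights the defensive term.
        self.w_aggressive = w_aggressive
        self.wins = 0

def simulate_combat(a: Agent, b: Agent) -> Agent:
    """Placeholder for one close-range engagement; returns the winner.
    A real system would roll out the RL policies in a flight simulator."""
    # Toy outcome model: the more aggressive agent wins more often, with noise.
    p_a = 0.5 + 0.3 * (a.w_aggressive - b.w_aggressive)
    return a if random.random() < p_a else b

def adapt_weights(agent: Agent, win_rate: float) -> None:
    """Assumed adaptation rule: nudge the weight toward more aggression
    after a losing round and toward caution after a winning one."""
    direction = 1.0 if win_rate < 0.5 else -1.0
    agent.w_aggressive = min(1.0, max(0.0,
        agent.w_aggressive + direction * LEARNING_STEP))

# Diversified risk preferences: spread initial weights across the population.
population = [Agent(i, w_aggressive=i / (POPULATION_SIZE - 1))
              for i in range(POPULATION_SIZE)]

for _ in range(ROUNDS):
    for agent in population:
        agent.wins = 0
    # Round-robin confrontation between every pair of risk preferences.
    for i, a in enumerate(population):
        for b in population[i + 1:]:
            simulate_combat(a, b).wins += 1
    # Each agent adapts its coefficient from its results against the others.
    for agent in population:
        adapt_weights(agent, agent.wins / (POPULATION_SIZE - 1))

for agent in population:
    print(f"Agent {agent.agent_id}: w_aggressive = {agent.w_aggressive:.2f}")
```

In this toy loop the weight spread is what prevents the population from collapsing onto a single exploitable strategy, which is the intuition behind avoiding strategy cycling in the paradigm described above.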
Baolai WANG, Xianzhong GAO, Tao XIE, Zhongxi HOU. Decision-making in close-range air combat based on reinforcement learning and population game[J]. Acta Aeronautica et Astronautica Sinica, 2024, 45(12): 329446. DOI: 10.7527/S1000-6893.2023.29446