Decision-making in close-range air combat based on reinforcement learning and population game

doi:10.7527/S1000-6893.2023.29446

Abstract

With the development of artificial intelligence and Unmanned Aerial Vehicle (UAV) technologies, intelligent decision-making in close-range air combat has attracted extensive attention worldwide. To address the overfitting and strategy cycles that arise when traditional reinforcement learning is applied to intelligent decision-making in close-range air combat, a training paradigm for air combat decision models based on population game is proposed. A population of multiple UAV agents is constructed, and each agent is assigned different reward weight coefficients, which gives the agents diversified risk preferences. Training agents with different risk preferences against one another effectively avoids overfitting and strategy cycles. During training, each UAV agent in the population adaptively optimizes its reward weight coefficients according to the results of its confrontations with different opponent strategies. In the numerical simulation experiments, Agent 5 and Agent 3 from population game training defeat the intelligent decision models obtained by expert-system adversarial training and self-play training with win rates of 88% and 85%, respectively, which verifies the effectiveness of the algorithm. Further experiments demonstrate the necessity of dynamically adjusting the weight coefficients in the population game training paradigm and verify the generality of the proposed paradigm on heterogeneous models.
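The training loop described in the abstract can be pictured with a minimal sketch. Everything below is an assumption made for illustration: the `Agent` class, the three unnamed reward terms, the win-rate threshold, the random perturbation rule in `adapt_weights`, and the coin-flip `coin_flip_match` stand-in for an air combat simulation are all hypothetical. The abstract does not state the paper's actual weight update rule, and the SAC policy update is omitted entirely.

```python
import random
from dataclasses import dataclass


@dataclass
class Agent:
    name: str
    weights: list  # reward weight coefficients, normalized to sum to 1
    wins: int = 0
    games: int = 0

    def win_rate(self) -> float:
        return self.wins / self.games if self.games else 0.0


def normalize(weights):
    total = sum(weights)
    return [w / total for w in weights]


def make_population(n, n_terms, rng):
    """Create n agents, each with a different random reward weighting."""
    return [
        Agent(f"agent{i}", normalize([rng.uniform(0.1, 1.0) for _ in range(n_terms)]))
        for i in range(n)
    ]


def adapt_weights(agent, rng, threshold=0.5, step=0.2):
    """Hypothetical rule: perturb the weights when the win rate is below threshold."""
    if agent.win_rate() >= threshold:
        return
    perturbed = [w * (1.0 + rng.uniform(-step, step)) for w in agent.weights]
    agent.weights = normalize(perturbed)


def train_round(population, k, rng, play_match):
    """Each agent fights k opponents sampled from the population, then adapts."""
    for agent in population:
        agent.wins = agent.games = 0
        # Sampling with replacement; an agent may also draw itself (assumption).
        for opponent in rng.choices(population, k=k):
            agent.games += 1
            agent.wins += int(play_match(agent, opponent, rng))
        # A real implementation would update the agent's policy here (e.g. with SAC).
    for agent in population:
        adapt_weights(agent, rng)


def coin_flip_match(agent, opponent, rng):
    """Placeholder for a simulated engagement; returns True if `agent` wins."""
    return rng.random() < 0.5


rng = random.Random(0)
population = make_population(n=8, n_terms=3, rng=rng)
for _ in range(5):
    train_round(population, k=8, rng=rng, play_match=coin_flip_match)
```

The sketch separates opponent sampling from weight adaptation, so the perturbation rule can be swapped for any other scheme without touching the match loop.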

Key words: close-range air combat, intelligent decision-making, reinforcement learning, population game, SAC algorithm

CLC Number: V249.12

Baolai WANG, Xianzhong GAO, Tao XIE, Zhongxi HOU. Decision-making in close-range air combat based on reinforcement learning and population game[J]. Acta Aeronautica et Astronautica Sinica, 2024, 45(12): 329446.

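Table 1 Hyperparameter setting

Hyperparameter	Value	Hyperparameter	Value
Population size $N$	8	Discount factor $\gamma$	0.99
Number of sampled policies $K$	8	Batch size	256
Number of hidden layers	3	Replay buffer size	$1 \times 10^{6}$
Hidden layer size	128	Number of training episodes	$1 \times 10^{4}$
Optimizer	Adam	Simulation interval/ms	10
Learning rate	$3 \times 10^{-4}$	Decision interval/ms	20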
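For readers who want to reuse these settings, the values in Table 1 can be collected into a single configuration object. This is only a sketch: the class name `HyperParams`, the field names, and the derived property are hypothetical and do not come from the paper; only the numeric values are taken from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HyperParams:
    population_size: int = 8        # N
    sampled_policies: int = 8       # K
    hidden_layers: int = 3
    hidden_size: int = 128
    optimizer: str = "Adam"
    learning_rate: float = 3e-4
    discount_factor: float = 0.99   # gamma
    batch_size: int = 256
    replay_buffer_size: int = 1_000_000
    training_episodes: int = 10_000
    sim_interval_ms: int = 10
    decision_interval_ms: int = 20

    @property
    def sim_steps_per_decision(self) -> int:
        """Number of simulation steps that elapse between two decisions."""
        return self.decision_interval_ms // self.sim_interval_ms


params = HyperParams()
print(params.sim_steps_per_decision)
```

With a 10 ms simulation interval and a 20 ms decision interval, each decision is held for two simulation steps.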



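Red-side strategy	Win rate against expert-system adversarial training model/%	Win rate against self-play training model/%
Agent 0	44	25
Agent 1	38	20
Agent 2	47	19
Agent 3	20	85
Agent 4	56	31
Agent 5	88	22
Agent 6	57	72
Agent 7	41	33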
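The win rates in the table above can be queried with a few lines of code to see which agent performs best against each opponent. The snippet is an illustrative sketch: the dictionary layout and variable names are assumptions, and the numbers are copied directly from the table.

```python
# Win rates in percent: (vs. expert-system adversarial training, vs. self-play training)
win_rates = {
    "Agent 0": (44, 25),
    "Agent 1": (38, 20),
    "Agent 2": (47, 19),
    "Agent 3": (20, 85),
    "Agent 4": (56, 31),
    "Agent 5": (88, 22),
    "Agent 6": (57, 72),
    "Agent 7": (41, 33),
}

# Best agent against each opponent model.
best_vs_expert = max(win_rates, key=lambda name: win_rates[name][0])
best_vs_self_play = max(win_rates, key=lambda name: win_rates[name][1])

# Agents that win more than half of their engagements against both opponents.
above_half_vs_both = [
    name for name, (expert, self_play) in win_rates.items()
    if expert > 50 and self_play > 50
]

print(best_vs_expert, best_vs_self_play, above_half_vs_both)
```

The two maxima correspond to the 88% and 85% figures quoted in the abstract for Agent 5 and Agent 3, while Agent 6 is the only agent above 50% against both opponents.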
