Air combat intelligent decision-making method based on self-play and deep reinforcement learning

doi:10.7527/S1000-6893.2023.28723

Abstract

Air combat is an important component of three-dimensional warfare, and intelligent air combat has become a research hotspot in the military field both in China and abroad. Deep reinforcement learning is an important technical approach to achieving intelligent air combat. Single-agent training makes it difficult to construct high-level opponents. To address this, a self-play-based training method for air combat agents is proposed, and a visualization research platform is built to develop a decision-making agent for close-range air combat. Pilots' domain knowledge is embedded in the design of the agent's observations, actions, and rewards, and the agent is trained to convergence. Simulation experiments show that the agent's air combat tactics improve progressively through self-play training. The agent achieves a win rate of over 70% against the agent trained with the single-agent method, and strategies similar to the human “single/double loop” tactics emerge.

Key words: air combat, artificial intelligence, deep reinforcement learning, self-play, agent

CLC Number: V249

Shengzhe SHAN, Weiwei ZHANG. Air combat intelligent decision-making method based on self-play and deep reinforcement learning[J]. Acta Aeronautica et Astronautica Sinica, 2024, 45(4): 328723-328723.

Figures/Tables 21: Fig. 1 to Fig. 16; Table 1 to Table 5

Sub-mode	Horizontal range/(°)	Elevation range/(°)	Scan time/s
HS	-10 to 10	-15 to 5	3
VS	-5 to 5	-10 to 30	3
BS	-1 to 1	-1.5 to -0.5	0.01
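To make the sub-mode table concrete, the sketch below tests whether a target direction falls inside each sub-mode's scan window. It is a hypothetical illustration, not the paper's radar model. It assumes that the ranges are azimuth/elevation limits in degrees relative to the aircraft boresight (azimuth positive right, elevation positive up), inclusive at both ends. The scan time is carried along but not simulated. The names `ScanWindow`, `in_scan_window` and `modes_covering` are illustrative.

```python
from typing import NamedTuple


class ScanWindow(NamedTuple):
    """Angular limits of one radar sub-mode (degrees) and its scan time (s)."""
    az_min: float
    az_max: float
    el_min: float
    el_max: float
    scan_time_s: float


# Values transcribed from the sub-mode table; the angle convention
# (azimuth positive right, elevation positive up, relative to boresight)
# is an assumption.
SUB_MODES = {
    "HS": ScanWindow(-10.0, 10.0, -15.0, 5.0, 3.0),
    "VS": ScanWindow(-5.0, 5.0, -10.0, 30.0, 3.0),
    "BS": ScanWindow(-1.0, 1.0, -1.5, -0.5, 0.01),
}


def in_scan_window(mode: str, azimuth_deg: float, elevation_deg: float) -> bool:
    """True if a target at the given off-boresight angles lies inside the mode's window."""
    w = SUB_MODES[mode]
    return w.az_min <= azimuth_deg <= w.az_max and w.el_min <= elevation_deg <= w.el_max


def modes_covering(azimuth_deg: float, elevation_deg: float) -> list[str]:
    """Every sub-mode whose window contains the target direction."""
    return [m for m in SUB_MODES if in_scan_window(m, azimuth_deg, elevation_deg)]


print(modes_covering(0.0, 20.0))   # ['VS']
print(modes_covering(0.5, -1.0))   # ['HS', 'VS', 'BS']
```

The high-elevation target is covered only by VS, whose window extends to 30° up. A target just below the boresight is covered by all three sub-modes, including the narrow BS window.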

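Scan/launch mode	Uncage timing	Search range
Fixed aim, caged launch	Not uncaged	2° cone
Fixed aim, uncaged launch	Uncaged after acquisition	2° cone
Fixed scan, uncaged launch	Uncaged before acquisition	5° cone
Radar-slaved, uncaged launch	Uncaged after the radar acquires the target	Seeker slaved to the radar within a ±20° horizontal and vertical square region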
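The search-range column can be read as a geometric test on the target's line of sight. The sketch below assumes the following:

- body axes are x forward, y right, z down;
- the listed cone angle is a half-angle about the boresight (if the table means the full apex angle, halve the values);
- the radar-slaved mode accepts targets within ±20° in both azimuth and elevation.

Uncage timing is not modelled. The mode identifiers and function names are hypothetical.

```python
import math

# Hypothetical identifiers for the four scan/launch modes in the table above.
# Assumptions: body axes with x forward, y right, z down; the listed cone angle
# is treated as a half-angle about the boresight (x axis); uncage timing is not modelled.
CONE_HALF_ANGLE_DEG = {
    "fixed_aim_caged": 2.0,
    "fixed_aim_uncaged": 2.0,
    "fixed_scan_uncaged": 5.0,
}
SLAVED_HALF_WIDTH_DEG = 20.0  # radar-slaved mode: +/-20 deg square region


def off_boresight_deg(los):
    """Angle between the line-of-sight vector (x, y, z) and the x axis, in degrees."""
    x, y, z = los
    return math.degrees(math.atan2(math.hypot(y, z), x))


def in_search_region(mode, los):
    """True if the target direction lies inside the seeker search region of `mode`."""
    if mode == "radar_slaved":
        x, y, z = los
        azimuth = math.degrees(math.atan2(y, x))
        elevation = math.degrees(math.atan2(-z, math.hypot(x, y)))  # z points down
        return abs(azimuth) <= SLAVED_HALF_WIDTH_DEG and abs(elevation) <= SLAVED_HALF_WIDTH_DEG
    return off_boresight_deg(los) <= CONE_HALF_ANGLE_DEG[mode]


target = (1000.0, 60.0, 0.0)  # 1 km ahead, 60 m to the right
print(round(off_boresight_deg(target), 2))             # 3.43
print(in_search_region("fixed_aim_caged", target))     # False
print(in_search_region("fixed_scan_uncaged", target))  # True
print(in_search_region("radar_slaved", target))        # True
```

The cone modes use a single off-boresight angle. The radar-slaved mode uses separate azimuth and elevation limits, which is what a square region implies.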

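Category	Feature	Value bounds	Dimension
Own position	East-west coordinate/m	(0, 30 000)	1
	North-south coordinate/m	(0, 30 000)	1
	Altitude/m	(0, 12 000)	1
Flight state	Airspeed/(m·s⁻¹)	(0, 500)	2
	Indicated airspeed/(m·s⁻¹)	(0, 500)	2
	Mach number	(0, 1.6)	2
	Longitudinal load factor	(-1, 2)	2
	Normal load factor	(-4, 9)	2
	Lateral load factor	(-1, 1)	2
	Turn rate/((°)·s⁻¹)	(0, 50)*	2
	Attitude quaternion	(-1, 1)	8
Geometric situation	Relative position vector/m	(-8 000, 8 000)*	4
	Relative velocity vector/(m·s⁻¹)	(-500, 500)*	6
	Altitude difference/m	(-1 000, 1 000)*	1
	Gun aiming coefficient	(0, 1)	2
	Horizontal off-boresight angle/(°)	(-180, 180)	2
	Off-boresight angle/(°)	(-90, 90)	2
	Radar scan range/(°)	(-20, 20)	8
	Missile scan range/(°)	(-20, 20)	8
	Aspect angle/(°)	(0, 180)	2
	Antenna train angle/(°)	(0, 180)	2
	Missile maximum range/m	(0, 8 000)*	2
	Missile minimum range/m	(0, 3 000)*	2
	Relative distance (scalar)/m	(0, 8 000)*	1
Total			67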
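The observation table lists bounds and dimensions that add up to 67. The sketch below transcribes them and shows one common way such bounds are used: each value is clipped to its range and min-max scaled to [-1, 1]. The table does not state this normalization, so treat it as an assumption. The asterisks in the table may mark bounds with a special meaning that is not reproduced here. Feature keys and function names are illustrative.

```python
# (feature, lower bound, upper bound, dimension), transcribed from the table above
FEATURES = [
    ("east_west_m", 0, 30_000, 1),
    ("north_south_m", 0, 30_000, 1),
    ("altitude_m", 0, 12_000, 1),
    ("airspeed_mps", 0, 500, 2),
    ("indicated_airspeed_mps", 0, 500, 2),
    ("mach", 0, 1.6, 2),
    ("longitudinal_load", -1, 2, 2),
    ("normal_load", -4, 9, 2),
    ("lateral_load", -1, 1, 2),
    ("turn_rate_dps", 0, 50, 2),
    ("attitude_quaternion", -1, 1, 8),
    ("relative_position_m", -8_000, 8_000, 4),
    ("relative_velocity_mps", -500, 500, 6),
    ("altitude_difference_m", -1_000, 1_000, 1),
    ("gun_aiming_coeff", 0, 1, 2),
    ("horizontal_off_boresight_deg", -180, 180, 2),
    ("off_boresight_deg", -90, 90, 2),
    ("radar_scan_range_deg", -20, 20, 8),
    ("missile_scan_range_deg", -20, 20, 8),
    ("aspect_angle_deg", 0, 180, 2),
    ("antenna_train_angle_deg", 0, 180, 2),
    ("missile_max_range_m", 0, 8_000, 2),
    ("missile_min_range_m", 0, 3_000, 2),
    ("distance_m", 0, 8_000, 1),
]

OBS_DIM = sum(dim for _, _, _, dim in FEATURES)


def normalize(value, lo, hi):
    """Clip to [lo, hi] and scale linearly to [-1, 1] (assumed preprocessing)."""
    clipped = min(max(value, lo), hi)
    return 2.0 * (clipped - lo) / (hi - lo) - 1.0


def build_observation(raw):
    """raw maps feature name -> list of `dim` values; returns a flat list of length OBS_DIM."""
    obs = []
    for name, lo, hi, dim in FEATURES:
        values = raw[name]
        if len(values) != dim:
            raise ValueError(f"{name}: expected {dim} values, got {len(values)}")
        obs.extend(normalize(v, lo, hi) for v in values)
    return obs


print(OBS_DIM)                      # 67
print(normalize(6_000, 0, 12_000))  # 0.0
print(normalize(10_000, 0, 8_000))  # 1.0 (clipped at the upper bound)
```

Keeping the bounds in one table makes the 67-dimensional total a checkable invariant rather than a hand-counted number.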

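Reward type	Game category	Reward name	Weight (own side)	Weight (enemy)	Reward property
Outcome reward	Zero-sum	Missile kill	1	-1	Sparse
		Gun kill	1	-1
		Enemy crashes into ground	1	-1
		Out of bounds	-1	1
		Collision/mutual kill	0	0
Event reward	Zero-sum	Radar illumination	0.05	-0.05	Sparse
		Radar lock	0.2	-0.2
		Missile lock on target	0.3	-0.3
		Gun aiming	0.5	-0.5
		Launch condition met	0.55	-0.55
Process reward	Zero-sum	Angle advantage	0.005	-0.005	Dense
	Zero-sum	Energy advantage	0.008	-0.008
	Common interest	Distance reward	-0.0001	-0.0001
	Common interest	Altitude-difference reward	-0.0001	-0.0001
Boundary reward	Zero-sum	Control zone	0.1	-0.1	Continuous
	Non-game	Airspace coordinates	0.8	0.8
		Flight Mach number	0.8	0.8
		Indicated airspeed	0.8	0.8
		Inter-aircraft distance	0.8	0.8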
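The reward table assigns each term a pair of weights. In zero-sum terms the enemy's weight is the negation of the own side's. In common-interest and non-game terms both sides receive the same sign.

The sketch below encodes that split under three assumptions:

- a term credited to one aircraft gives it the "own" weight and its opponent the "enemy" weight;
- dense and continuous terms are scaled by a per-step magnitude whose definition the table does not give;
- sparse terms use a magnitude of 1.

The side labels `red`/`blue` and all identifiers are hypothetical.

```python
# (own-side weight, enemy weight, game category), transcribed from the reward table above
REWARD_TABLE = {
    "missile_kill": (1.0, -1.0, "zero_sum"),
    "gun_kill": (1.0, -1.0, "zero_sum"),
    "enemy_ground_crash": (1.0, -1.0, "zero_sum"),
    "out_of_bounds": (-1.0, 1.0, "zero_sum"),
    "collision_or_mutual_kill": (0.0, 0.0, "zero_sum"),
    "radar_illumination": (0.05, -0.05, "zero_sum"),
    "radar_lock": (0.2, -0.2, "zero_sum"),
    "missile_lock": (0.3, -0.3, "zero_sum"),
    "gun_aiming": (0.5, -0.5, "zero_sum"),
    "launch_condition_met": (0.55, -0.55, "zero_sum"),
    "angle_advantage": (0.005, -0.005, "zero_sum"),
    "energy_advantage": (0.008, -0.008, "zero_sum"),
    "distance": (-0.0001, -0.0001, "common_interest"),
    "altitude_difference": (-0.0001, -0.0001, "common_interest"),
    "control_zone": (0.1, -0.1, "zero_sum"),
    "airspace_coordinates": (0.8, 0.8, "non_game"),
    "mach_number": (0.8, 0.8, "non_game"),
    "indicated_airspeed": (0.8, 0.8, "non_game"),
    "inter_aircraft_distance": (0.8, 0.8, "non_game"),
}


def split_reward(term, credited_to, magnitude=1.0):
    """Return (red_reward, blue_reward) for a term credited to 'red' or 'blue'.

    `magnitude` is a hypothetical per-step quantity for dense/continuous terms
    (e.g. a normalized advantage); sparse outcome and event terms use 1.0.
    """
    own, enemy, _ = REWARD_TABLE[term]
    own, enemy = own * magnitude, enemy * magnitude
    return (own, enemy) if credited_to == "red" else (enemy, own)


print(split_reward("radar_lock", "blue"))   # (-0.2, 0.2)
print(split_reward("missile_kill", "red"))  # (1.0, -1.0)
```

Storing the game category next to the weights lets the zero-sum property be checked mechanically: for every `zero_sum` entry the two weights sum to zero.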

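Name	Hyperparameter set 1	Hyperparameter set 2	Hyperparameter set 3
Population size M	100	100	200
Game probability ε	0.5	0.8	0.5
Save interval u	2×10³	2×10³	4×10³
Opponent reset interval v	100	100	100
Training switch interval w	2×10⁵	2×10⁵	4×10⁵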
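The five self-play hyperparameters read like the snapshot-pool settings of common self-play trainers. Unity ML-Agents, for example, exposes `window`, `play_against_latest_model_ratio`, `save_steps`, `swap_steps` and `team_change`. The sketch below assumes that reading, which the table does not confirm:

- a policy snapshot is saved every u steps into a pool holding the last M snapshots;
- every v steps the opponent is reset, and with probability ε it is the latest policy rather than a sampled snapshot;
- every w steps the learning side switches.

`SelfPlayConfig` and `OpponentPool` are hypothetical names.

```python
import random
from collections import deque
from dataclasses import dataclass


@dataclass
class SelfPlayConfig:
    """Defaults follow hyperparameter set 1; the meanings are assumed, not from the paper."""
    population_m: int = 100            # M: number of past snapshots kept as opponents
    play_latest_eps: float = 0.5       # epsilon: probability of facing the latest policy
    save_interval_u: int = 2_000       # u: steps between saving a snapshot
    swap_interval_v: int = 100         # v: steps between opponent resets
    switch_interval_w: int = 200_000   # w: steps between switching the learning side


class OpponentPool:
    """Hypothetical snapshot pool; a policy can be any object (e.g. a copy of network weights)."""

    def __init__(self, cfg, rng=None):
        self.cfg = cfg
        self.snapshots = deque(maxlen=cfg.population_m)  # oldest snapshots fall out
        self.rng = rng if rng is not None else random.Random()
        self.learning_side = 0  # which aircraft (0 or 1) is currently being trained
        self.opponent = None

    def sample_opponent(self, latest_policy):
        """Latest policy with probability epsilon (or if the pool is empty), else a past snapshot."""
        if not self.snapshots or self.rng.random() < self.cfg.play_latest_eps:
            return latest_policy
        return self.rng.choice(self.snapshots)

    def step(self, t, latest_policy):
        """Advance to global step t (t >= 1) and return the opponent to play against."""
        if t % self.cfg.save_interval_u == 0:
            self.snapshots.append(latest_policy)
        if self.opponent is None or t % self.cfg.swap_interval_v == 0:
            self.opponent = self.sample_opponent(latest_policy)
        if t % self.cfg.switch_interval_w == 0:
            self.learning_side = 1 - self.learning_side
        return self.opponent


# Toy run with small intervals; policies are just version labels here.
cfg = SelfPlayConfig(population_m=3, save_interval_u=10, swap_interval_v=5, switch_interval_w=50)
pool = OpponentPool(cfg, rng=random.Random(0))
for t in range(1, 101):
    opponent = pool.step(t, latest_policy=f"v{t // 10}")
print(list(pool.snapshots), pool.learning_side)  # ['v8', 'v9', 'v10'] 0
```

In this reading, a larger M keeps older opponents in play, which discourages overfitting to the latest policy. A larger ε concentrates training on the current policy.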
