Intelligent air combat decision making and simulation based on deep reinforcement learning

doi:10.7527/S1000-6893.2022.26731

Abstract

Intelligent decision-making for air combat is a current research focus of the world's major military powers. To address Unmanned Aerial Vehicle (UAV) maneuvering decisions in the close-range air combat game, an autonomous decision-making model based on deep reinforcement learning is proposed. The model adopts and improves a reward function that jointly accounts for attack angle advantage, speed advantage, altitude advantage, and distance advantage. The improved reward function prevents the agent from being lured into the ground by the enemy aircraft and effectively guides the agent to converge to the optimal solution. To address the slow convergence caused by random sampling in reinforcement learning, a value-based prioritization method for experience pool samples is designed, which significantly accelerates convergence while still ensuring that the algorithm converges. The decision-making model is verified on a human-machine confrontation simulation platform, and the results show that it can gain the upper hand over both the expert system and human pilots in close-range air combat.

Key words: air combat, autonomous decision-making, deep reinforcement learning, TD3 algorithm, sparse rewards

CLC Number: V249.12

Pan ZHOU, Jiangtao HUANG, Sheng ZHANG, Gang LIU, Bowen SHU, Jigang TANG. Intelligent air combat decision making and simulation based on deep reinforcement learning[J]. Acta Aeronautica et Astronautica Sinica, 2023, 44(4): 126731.
