Hierarchical decision algorithm for air combat with hybrid action based on deep reinforcement learning

doi:10.7527/S1000-6893.2024.30053

Abstract

Intelligent air combat is an active research topic among the world's major military powers. To solve the maneuver decision problem in Beyond Visual Range (BVR) air combat, we propose a hierarchical decision algorithm based on deep reinforcement learning. The algorithm uses a maneuver set suited to BVR air combat to control the trajectory and attitude of the aircraft. To expand the action space of the model and improve its decision-making ability, we organize the action space hierarchically and model it as a multi-discrete space. To address the sparse-reward problem in air combat, we design a set of reward functions that account for position advantage, weapon launch, and weapon threat, and that guide the agent toward the optimal policy. We also build a complete digital-twin simulation environment for air combat, together with an expert system. The decision algorithm is trained in the simulation environment and evaluated in engagements against the expert system. The experimental results indicate that the proposed algorithm can make autonomous and flexible decisions in BVR air combat according to the current situation, and that it holds some advantage over the expert system.

Key words: air combat beyond visual range, intelligent decision, deep reinforcement learning, proximal policy optimization, maneuver, hierarchical decision

CLC Number: V249.4

Zuolong LI, Jihong ZHU, Minchi KUANG, Jie ZHANG, Jie REN. Hierarchical decision algorithm for air combat with hybrid action based on deep reinforcement learning[J]. Acta Aeronautica et Astronautica Sinica, 2024, 45(17): 530053-530053.

Figures/Tables 21

Fig.1

Table 1

Table 2

Maneuver parameters in air combat

Parameter	Value
Load factor	2, 3, 4, 5, 6, 7
Climb rate/$(\mathrm{m \cdot s^{-1}})$	0, 40, 80, 120, 160, 200
Mach number	0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4
Turn rate/$((°) \cdot \mathrm{s^{-1}})$	0, 5, 10, 15, 20, 25, 30

Table 3

Reward function in air combat

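Type	Name	Value
Event reward	Hitting the target	+100
	Draw	-10
	Being hit	-100
	Crashing into the ground	-100
	Scanning the enemy aircraft	+10
	Evading the enemy aircraft at close range	+50
	Passing the enemy aircraft at close range	+10
	Launching a missile	-6 to -2
State reward	Advantage	$R_a$
	Threat	$R_t$
	Stall	$R_s$
	Excessive sideslip angle	$R_\beta$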
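
The reward table can be read as a per-step sum of event rewards and state rewards. The following is a minimal sketch under that assumption, not the authors' implementation: the source does not say how $R_a$, $R_t$, $R_s$ and $R_\beta$ are computed or how the missile-launch penalty is chosen within its range, so they are passed in as plain numbers here. The names `EVENT_REWARDS` and `step_reward` are hypothetical.

```python
# Event rewards transcribed from the reward table; the key names are illustrative.
EVENT_REWARDS = {
    "hit_target": 100.0,
    "draw": -10.0,
    "being_hit": -100.0,
    "ground_crash": -100.0,
    "enemy_scanned": 10.0,
    "close_range_evasion": 50.0,
    "close_range_pass": 10.0,
}

# The table gives only a range for the missile-launch penalty.
MISSILE_LAUNCH_RANGE = (-6.0, -2.0)


def step_reward(events, launch_penalty=None, r_a=0.0, r_t=0.0, r_s=0.0, r_beta=0.0):
    """Sum event rewards and state rewards for one step.

    Assumption: the terms are simply added. The state rewards r_a, r_t,
    r_s and r_beta are supplied by the caller because their formulas are
    not given in the table.
    """
    total = sum(EVENT_REWARDS[name] for name in events)
    if launch_penalty is not None:
        low, high = MISSILE_LAUNCH_RANGE
        # Clamp a caller-chosen penalty into the tabulated range.
        total += min(max(launch_penalty, low), high)
    return total + r_a + r_t + r_s + r_beta


if __name__ == "__main__":
    print(step_reward(["enemy_scanned"], launch_penalty=-4.0, r_a=0.5))
```

Keeping the event values in a single dictionary makes it easy to switch individual terms off when comparing reward configurations.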

Fig.2

Fig.3

Fig.4

Fig.5

Fig.6

Table 4

Hyperparameters of the algorithm

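Hyperparameter	Value
Discount factor $\gamma$	0.99
GAE parameter $\lambda$	0.95
Clipping coefficient $\varepsilon$	0.1
Policy entropy coefficient $\beta_E$	0.01
Policy network learning rate $\alpha_a$	$5 \times 10^{-4}$
Value network learning rate $\alpha_c$	$2 \times 10^{-4}$
Batch_Size	120
Sequence length	16
Experience pool capacity (samples)	480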
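
To show where the discount factor, GAE parameter and clipping coefficient enter, here is a minimal pure-Python sketch of generalized advantage estimation and the PPO clipped surrogate, using the tabulated values $\gamma = 0.99$, $\lambda = 0.95$ and $\varepsilon = 0.1$. It follows the standard textbook forms of both formulas and assumes a trajectory segment that does not cross an episode boundary; the function names are hypothetical and the authors' training code may differ.

```python
GAMMA = 0.99     # discount factor from the hyperparameter table
LAMBDA = 0.95    # GAE parameter from the hyperparameter table
CLIP_EPS = 0.1   # clipping coefficient from the hyperparameter table


def gae(rewards, values, last_value=0.0, gamma=GAMMA, lam=LAMBDA):
    """Generalized advantage estimates for one trajectory segment.

    values[t] is the critic estimate at step t; last_value bootstraps the
    state after the final step. Assumes no episode ends inside the segment.
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    next_value = last_value
    for t in reversed(range(len(rewards))):
        # One-step temporal-difference error.
        delta = rewards[t] + gamma * next_value - values[t]
        # Exponentially weighted sum of future TD errors.
        running = delta + gamma * lam * running
        advantages[t] = running
        next_value = values[t]
    return advantages


def clipped_surrogate(ratio, advantage, eps=CLIP_EPS):
    """PPO clipped objective for one sample (to be maximized)."""
    clipped_ratio = min(max(ratio, 1.0 - eps), 1.0 + eps)
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    print(gae([1.0, 1.0], [0.0, 0.0]))
    print(clipped_surrogate(1.5, 2.0))
```

With $\varepsilon = 0.1$, a probability ratio outside $[0.9, 1.1]$ stops increasing the objective, which limits how far a single update can move the policy.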

Fig.7

Fig.8

Fig.9

Fig.10

Fig.11

Fig.12

Table 5

Fig.13

Fig.14

Fig.15

Fig.16

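Index	Maneuver	Parameters
0	Pull-up	Load factor, climb rate
1	Pursuit	Speed
2	Attack	Speed
3	Turn	Load factor, turn rate
4	Hard turn	Load factor, turn rate
5	Loop	Load factor, speed
6	Level flight	Speed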
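
One way to picture the hierarchical, multi-discrete action space is a maneuver head followed by one discrete head per parameter, where only the parameters used by the selected maneuver are kept. The sketch below is an illustration under stated assumptions rather than the authors' network design: the head layout, the names `MANEUVERS`, `PARAM_VALUES` and `decode_action`, and the mapping of the "speed" parameter onto the Mach-number grid are all assumptions. The numeric grids are those listed under "Maneuver parameters in air combat".

```python
# Maneuver library: index -> (name, parameters used). Names are illustrative.
MANEUVERS = {
    0: ("pull_up", ("load_factor", "climb_rate")),
    1: ("pursuit", ("mach",)),
    2: ("attack", ("mach",)),
    3: ("turn", ("load_factor", "turn_rate")),
    4: ("hard_turn", ("load_factor", "turn_rate")),
    5: ("loop", ("load_factor", "mach")),
    6: ("level_flight", ("mach",)),
}

# Discrete parameter grids from the maneuver-parameter table.
PARAM_VALUES = {
    "load_factor": (2, 3, 4, 5, 6, 7),
    "climb_rate": (0, 40, 80, 120, 160, 200),           # m/s
    "mach": (0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.4),   # assumed to be "speed"
    "turn_rate": (0, 5, 10, 15, 20, 25, 30),            # deg/s
}

PARAM_ORDER = ("load_factor", "climb_rate", "mach", "turn_rate")

# Multi-discrete shape: one maneuver head plus one head per parameter.
ACTION_DIMS = (len(MANEUVERS),) + tuple(len(PARAM_VALUES[p]) for p in PARAM_ORDER)


def decode_action(action):
    """Map a multi-discrete index vector to (maneuver name, parameter dict).

    Parameter heads that the chosen maneuver does not use are ignored.
    """
    maneuver_idx, *param_idx = action
    name, used = MANEUVERS[maneuver_idx]
    params = {p: PARAM_VALUES[p][param_idx[PARAM_ORDER.index(p)]] for p in used}
    return name, params


if __name__ == "__main__":
    print(ACTION_DIMS)
    print(decode_action((3, 2, 0, 5, 4)))
```

Compared with flattening every maneuver and parameter combination into a single discrete action, this layout keeps each output head small while still covering the same set of commands.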

Reward	Win rate/%	Loss rate/%	Draw rate/%
Full reward	62	30	8
Without advantage reward	45	47	8
Without threat reward	37	62	1
Without evasion reward	45	38	6
Without scan reward	57	39	4
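
A quick way to read this table is to compare each reduced reward configuration with the full reward. The short sketch below ranks the removed terms by the resulting drop in win rate, assuming the rows are ablation runs evaluated under the same conditions; the dictionary keys and the function name are hypothetical, and the numbers are copied from the table as given.

```python
# (win %, loss %, draw %) per reward configuration, copied from the table.
ABLATION = {
    "full": (62, 30, 8),
    "no_advantage": (45, 47, 8),
    "no_threat": (37, 62, 1),
    "no_evasion": (45, 38, 6),
    "no_scan": (57, 39, 4),
}


def win_rate_drops(table, baseline="full"):
    """Return (configuration, win-rate drop in percentage points), largest first."""
    base_win = table[baseline][0]
    drops = {name: base_win - row[0] for name, row in table.items() if name != baseline}
    return sorted(drops.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, drop in win_rate_drops(ABLATION):
        print(f"{name}: -{drop} points")
```

On these numbers, removing the threat reward costs the most (25 percentage points), and removing the scan reward costs the least (5 points).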