Hierarchical decision algorithm for air combat with hybrid action based on deep reinforcement learning

Zuolong LI; Jihong ZHU; Minchi KUANG; Jie ZHANG; Jie REN

doi:10.7527/S1000-6893.2024.30053

ACTA AERONAUTICA ET ASTRONAUTICA SINICA

2024, Vol. 45, Issue 17: 530053-530053




1. Department of Precision Instrument, Tsinghua University, Beijing 100084, China
2. AVIC Chengdu Flight Design and Research Institute, Chengdu 610091, China

E-mail: jhzhu@tsinghua.edu.cn

Received date: 2024-01-02

Revised date: 2024-01-11

Accepted date: 2024-04-22

Online published: 2024-04-25


Abstract

Intelligent air combat is an active research topic among the world's major military powers. To address the maneuver decision problem in Beyond Visual Range (BVR) air combat, we propose a hierarchical decision algorithm based on deep reinforcement learning. The algorithm uses a maneuver set suited to BVR air combat to control the trajectory and attitude of the aircraft. To expand the model's action space and improve its decision-making ability, we organize the action space hierarchically and model it as a multi-discrete space. To address the sparse-reward problem in air combat, we design a set of reward functions that account for position advantage, weapon launching, and weapon threat, and that can guide the agent to converge to the optimal policy. We also build a complete digital-twin simulation environment for air combat, together with an expert system. The algorithm is trained in the simulation environment and evaluated in engagements against the expert system. The experimental results indicate that the proposed algorithm can make autonomous and flexible decisions in BVR air combat based on the current situation, and that it holds some advantage over the expert system.

Key words: air combat beyond visual range; intelligent decision; deep reinforcement learning; proximal policy optimization; maneuver; hierarchical decision
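
The multi-discrete action space mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the branch names or sizes, so `BRANCH_SIZES`, `softmax`, and `sample_multi_discrete` below are hypothetical and chosen only for illustration; this is not the authors' implementation. The sketch shows the general idea of a factorized policy: one categorical choice per branch, with the joint log-probability equal to the sum of the per-branch log-probabilities, which is the quantity a policy-gradient method such as proximal policy optimization would use.

```python
import math
import random

# Hypothetical branches and sizes (assumptions, not taken from the paper).
BRANCH_SIZES = {"maneuver": 5, "target_speed": 3, "fire": 2}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_multi_discrete(branch_logits, rng):
    """Sample one index per branch from independent categoricals.

    Returns the chosen indices and the joint log-probability, which is
    the sum of the per-branch log-probabilities.
    """
    action = {}
    log_prob = 0.0
    for name, logits in branch_logits.items():
        probs = softmax(logits)
        idx = rng.choices(range(len(probs)), weights=probs, k=1)[0]
        action[name] = idx
        log_prob += math.log(probs[idx])
    return action, log_prob


# Uniform logits stand in for the output of a policy network.
rng = random.Random(0)
logits = {name: [0.0] * n for name, n in BRANCH_SIZES.items()}
action, log_prob = sample_multi_discrete(logits, rng)
print(action, log_prob)
```

With these assumed sizes, a policy network would emit 5 + 3 + 2 = 10 logits instead of the 5 × 3 × 2 = 30 needed for a flat joint action set, which is one common motivation for factorizing a large discrete action space.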

Cite this article

Zuolong LI, Jihong ZHU, Minchi KUANG, Jie ZHANG, Jie REN. Hierarchical decision algorithm for air combat with hybrid action based on deep reinforcement learning[J]. ACTA AERONAUTICA ET ASTRONAUTICA SINICA, 2024, 45(17): 530053-530053. DOI: 10.7527/S1000-6893.2024.30053

