Hierarchical decision algorithm for air combat with hybrid action based on deep reinforcement learning
Received date: 2024-01-02
Revised date: 2024-01-11
Accepted date: 2024-04-22
Online published: 2024-04-25
LI Z L, ZHU J H, KUANG M C, ZHANG J, REN J. Hierarchical decision algorithm for air combat with hybrid action based on deep reinforcement learning[J]. Acta Aeronautica et Astronautica Sinica, 2024, 45(17): 530053 (in Chinese). DOI: 10.7527/S1000-6893.2024.30053
Intelligent air combat is a research hotspot among the world's major military powers. To solve the maneuver decision problem in Beyond-Visual-Range (BVR) air combat, we propose a hierarchical decision algorithm based on deep reinforcement learning. The algorithm controls the trajectory and attitude of the aircraft through a maneuver set suited to BVR air combat. To enlarge the model's action space and strengthen its decision-making ability, the action space is structured hierarchically and modeled as a multi-discrete space. To address the sparse-reward problem in air combat, we design a reward function that combines position advantage, weapon launch, and weapon threat, guiding the agent toward the optimal policy. We also build a complete digital-twin air combat simulation environment together with an expert system; the decision algorithm is trained in this environment and evaluated through engagements against the expert system. Experimental results show that the algorithm is capable of autonomous decision-making in BVR air combat, makes flexible maneuver decisions according to the battlefield situation, and shows an advantage over the expert system in these engagements.
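To make the hierarchical, multi-discrete action space concrete, the following is a minimal Gym-style sketch in Python. It is not the authors' implementation: the choice of decision dimensions and their sizes are hypothetical placeholders.

from gymnasium import spaces  # Gym-style space API, assumed for illustration

# Hierarchical action space modeled as a multi-discrete space: each
# dimension is one layer of the decision (maneuver choice, its
# parameters, weapon use). All sizes below are illustrative.
action_space = spaces.MultiDiscrete([
    7,   # maneuver primitive from the BVR maneuver set
    12,  # commanded heading, discretized into 30-degree bins
    5,   # commanded altitude band
    3,   # throttle setting
    2,   # weapon trigger: hold / launch
])

A policy over such a space is typically factorized into one categorical head per dimension, with the joint action log-probability equal to the sum of the per-head log-probabilities; this keeps the network output small even though the flat product space (7 × 12 × 5 × 3 × 2 = 2 520 joint actions here) is large.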
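The shaped reward can be sketched in the same spirit; only the three factors (position advantage, weapon launch, weapon threat) come from the abstract, while the weights and signal names below are hypothetical.

def shaped_reward(position_advantage: float,
                  missile_launched: bool,
                  under_threat: bool,
                  terminal: float) -> float:
    """Dense shaping term added to the sparse win/loss outcome reward;
    all coefficients are placeholder values."""
    r = 0.1 * position_advantage   # angle/range advantage, assumed in [-1, 1]
    if missile_launched:
        r -= 0.05                  # small cost per launch, discourages wasted shots
    if under_threat:
        r -= 0.2                   # penalty while inside the enemy's weapon zone
    return r + terminal            # terminal is +1/-1 at episode end, else 0

Dense terms of this kind give the agent learning signal long before an engagement is decided, which is how reward shaping mitigates the sparse-reward problem.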