Acta Aeronautica et Astronautica Sinica, 2024, Vol. 45, Issue (17): 530053-530053. doi: 10.7527/S1000-6893.2024.30053
Zuolong LI, Jihong ZHU, Minchi KUANG, Jie ZHANG, Jie REN
Received: 2024-01-02
Revised: 2024-01-11
Accepted: 2024-04-22
Online: 2024-04-26
Published: 2024-04-25
Contact: Jihong ZHU, E-mail: jhzhu@tsinghua.edu.cn
Zuolong LI, Jihong ZHU, Minchi KUANG, Jie ZHANG, Jie REN. Hierarchical decision algorithm for air combat with hybrid action based on deep reinforcement learning[J]. Acta Aeronautica et Astronautica Sinica, 2024, 45(17): 530053-530053.