ACTA AERONAUTICA ET ASTRONAUTICA SINICA
Decision-making method for air combat maneuver based on explainable reinforcement learning
Received date: 2023-11-28
Revised date: 2024-01-10
Accepted date: 2024-04-07
Online published: 2024-04-12
Supported by: Collective Intelligence & Collaboration Laboratory (QXZ23013402)
Intelligent air combat is the future trend of air combat, and deep reinforcement learning is an important technical approach to intelligent decision-making in air combat. However, because deep reinforcement learning is a “black box” model, its strategies are difficult to explain, its intentions are difficult to understand, and its decisions are difficult to trust, which poses challenges for its application in intelligent air combat. To address these problems, an intelligent air combat maneuver decision-making method based on explainable reinforcement learning is proposed. Firstly, an interpretability model and a maneuvering intention recognition model are constructed based on a strategy-level explanation method and a dynamic Bayesian network. Secondly, by calculating the importance of the decision and the probability of the maneuvering intention, the Unmanned Aerial Vehicle (UAV) maneuver decision-making process is explained at the intention level. Finally, based on the intention explanation results, the reward function and training strategy of the deep reinforcement learning algorithm are modified, and the effectiveness of the proposed method is verified through simulation and comparative analysis. The proposed method yields air combat maneuver strategies that are effective, reliable, and credible.
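The abstract does not specify the structure or parameters of the dynamic Bayesian network, so the following is only a minimal sketch of how a maneuvering-intention probability could be updated over time. It assumes the simplest dynamic Bayesian network (a hidden Markov model) over a hypothetical set of three intentions; the labels, transition matrix, and observation likelihoods are all illustrative assumptions and are not taken from the paper.

```python
# Hypothetical intention labels (an assumption; the paper's intention set is not given here).
INTENTIONS = ["attack", "evade", "pursue"]


def filter_intention(prior, transition, likelihoods):
    """Forward filtering in a two-slice dynamic Bayesian network (an HMM).

    prior:       list of K floats, P(intent_0)
    transition:  K x K nested list, transition[i][j] = P(intent_t = j | intent_{t-1} = i)
    likelihoods: T x K nested list, likelihoods[t][k] = P(observation_t | intent_t = k)
    Returns a T x K nested list with P(intent_t | observations up to step t).
    """
    belief = list(prior)
    k = len(belief)
    history = []
    for lik in likelihoods:
        # Time update: propagate the previous belief through the transition model.
        predicted = [sum(belief[i] * transition[i][j] for i in range(k)) for j in range(k)]
        # Measurement update: weight by the observation likelihood, then normalize.
        unnormalized = [predicted[j] * lik[j] for j in range(k)]
        total = sum(unnormalized)
        if total == 0:
            raise ValueError("observation has zero likelihood under every intention")
        belief = [u / total for u in unnormalized]
        history.append(belief)
    return history


# Illustrative numbers only (assumptions, not results from the paper).
prior = [1 / 3, 1 / 3, 1 / 3]
transition = [
    [0.8, 0.1, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
]
likelihoods = [
    [0.7, 0.2, 0.1],  # observation at step 1 favors "attack"
    [0.6, 0.3, 0.1],  # observation at step 2 favors "attack" again
]

beliefs = filter_intention(prior, transition, likelihoods)
most_likely = INTENTIONS[max(range(len(INTENTIONS)), key=lambda j: beliefs[-1][j])]
print(most_likely, [round(p, 3) for p in beliefs[-1]])
```

In this toy setting the belief in "attack" grows as consistent observations accumulate. How the paper's model defines its observations and couples the intention probability with decision importance is not stated in the abstract.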
Shuheng YANG, Dong ZHANG, Wei XIONG, Zhi REN, Shuo TANG. Decision-making method for air combat maneuver based on explainable reinforcement learning[J]. ACTA AERONAUTICA ET ASTRONAUTICA SINICA, 2024, 45(18): 329922-329922. DOI: 10.7527/S1000-6893.2023.29922