A Decision-making Method for Air Combat Maneuver Based on Explainable Reinforcement Learning

  • YANG Shu-Heng ,
  • ZHANG Dong ,
  • XIONG Wei ,
  • REN Zhi ,
  • TANG Shuo
  • 1. Northwestern Polytechnical University
    2. Shaanxi Province Key Laboratory of Aerospace Vehicle Design, Northwestern Polytechnical University

Received date: 2023-11-28

  Revised date: 2024-04-08

  Online published: 2024-04-10

Supported by

Open Fund of the Collective Intelligence & Collaboration Laboratory


Abstract

Intelligent air combat is the trend of future air combat, and deep reinforcement learning is an important technical route to intelligent air combat decision-making. However, the "black-box" nature of deep reinforcement learning makes its strategies hard to explain, its intentions hard to understand, and its decisions hard to trust, which poses challenges for its application to intelligent air combat. To address these problems, an intelligent air combat maneuver decision-making method based on explainable reinforcement learning is proposed. First, an explainability model and a maneuver intention recognition model are constructed based on a policy-level explanation method and a dynamic Bayesian network. Second, intention-level explainability of the UAV maneuver decision-making process is achieved by computing decision importance and maneuver intention probabilities. Finally, the reward function and training strategy of the deep reinforcement learning algorithm are corrected based on the intention explanation results, and the effectiveness of the design method is verified through comparative simulation analysis. The method yields air combat maneuver strategies with excellent effectiveness, strong reliability, and high credibility.
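At its simplest, the dynamic Bayesian network used for maneuver intention recognition can be sketched as forward filtering over a hidden intention variable conditioned on observed maneuvers. The intention labels, transition matrix, and observation model below are illustrative assumptions for the sketch, not values from the paper.

```python
import numpy as np

# Hypothetical intention labels (the paper's own set may differ).
INTENTIONS = ["attack", "evade", "pursue"]

# Assumed DBN parameters: P(intent_t | intent_{t-1}) rows sum to 1.
TRANSITION = np.array([[0.8, 0.1, 0.1],
                       [0.1, 0.8, 0.1],
                       [0.1, 0.1, 0.8]])
# Assumed P(observation | intent); observations are discretized
# maneuvers, e.g. 0 = climb, 1 = dive, 2 = level turn.
OBSERVATION = np.array([[0.5, 0.2, 0.3],
                        [0.1, 0.7, 0.2],
                        [0.3, 0.2, 0.5]])

def filter_intentions(observations, prior=None):
    """Forward filtering: belief over intentions after each observed maneuver."""
    belief = np.full(len(INTENTIONS), 1.0 / len(INTENTIONS)) if prior is None else prior
    history = []
    for obs in observations:
        belief = TRANSITION.T @ belief   # predict: propagate through the transition model
        belief *= OBSERVATION[:, obs]    # update: weight by likelihood of the observed maneuver
        belief /= belief.sum()           # normalize back to a probability distribution
        history.append(belief.copy())
    return history

beliefs = filter_intentions([1, 1, 1])  # three consecutive dives
print(INTENTIONS[int(np.argmax(beliefs[-1]))])  # → evade
```

The belief after each step is exactly the "maneuver intention probability" the abstract refers to, and its trajectory over time is what an intention-level explanation can surface to a human operator.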

Cite this article

YANG Shu-Heng, ZHANG Dong, XIONG Wei, REN Zhi, TANG Shuo. A decision-making method for air combat maneuver based on explainable reinforcement learning[J]. Acta Aeronautica et Astronautica Sinica, 0: 0-0. DOI: 10.7527/S1000-6893.2024.29922

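The final step of the method, correcting the reward function using the intention-explanation results, can be sketched as a reward-shaping wrapper that penalizes decisions whose recognized intention diverges from the commanded tactical intent. The function name, threshold, and penalty weight below are illustrative assumptions, not the paper's formulation.

```python
def shaped_reward(base_reward, intention_belief, desired_intention,
                  weight=0.5, threshold=0.6):
    """Reward-shaping sketch: subtract a penalty when the recognized
    probability of the desired intention falls below a threshold.

    intention_belief: dict mapping intention label -> probability,
    e.g. the filtered DBN belief at the current decision step.
    """
    p = intention_belief.get(desired_intention, 0.0)
    penalty = weight * max(0.0, threshold - p)  # zero when the intent is clear
    return base_reward - penalty

# A decision clearly aligned with "attack" keeps its full reward...
print(shaped_reward(1.0, {"attack": 0.9, "evade": 0.1}, "attack"))  # → 1.0
# ...while an ambiguous decision is penalized (≈ 0.85 here).
print(shaped_reward(1.0, {"attack": 0.3, "evade": 0.7}, "attack"))
```

Coupling the penalty to the explained intention, rather than to raw state features, is what lets the training signal push the policy toward maneuvers whose purpose a human can recognize.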

References

[1] Sun Z X, Yang S Q, Piao H Y, et al. A survey of intelligent air combat development[J]. Acta Aeronautica et Astronautica Sinica, 2021, 42(8): 35-49.
[2] Getz W M, Pachter M. Two-target pursuit-evasion differential games in the plane[J]. Journal of Optimization Theory and Applications, 1981, 34(3): 383-403.
[3] Geng W X, Kong F, Ma D Q. Study on tactical decision of UAV medium-range air combat[C]//The 26th Chinese Control and Decision Conference (2014 CCDC), 2014: 135-139.
[4] Virtanen K, Raivio T, Hamalainen R P. Modeling pilot's sequential maneuvering decisions by a multistage influence diagram[J]. Journal of Guidance, Control, and Dynamics, 2004, 27(4): 665-677.
[5] Li B, Liang S, Tian L, et al. Intelligent aircraft maneuvering decision based on CNN[C]//Proceedings of the 3rd International Conference on Computer Science and Application Engineering, 2019: Article 138.
[6] Zhou P, Huang J T, Zhang S, et al. Intelligent air combat decision-making and simulation based on deep reinforcement learning[J]. Acta Aeronautica et Astronautica Sinica, 2023, 44(4): 99-112.
[7] Li W T, Fang F, Wang Z Y, et al. Autonomous maneuver decision-making for two-aircraft formation air combat based on MADDPG improved with hybrid hypernetworks[J]. Acta Aeronautica et Astronautica Sinica, DOI: 10.7527/S1000-6893.2023.29460.
[8] Li Z L, Li B, Bai S X, et al. Autonomous air combat decision-making of UAV based on AM-SAC[J]. Acta Armamentarii, 2023, 44(9): 2849-2858.
[9] Fu X W, Xu Z, Zhu J D, et al. Maneuver decision-making of multi-UAV attack-defense confrontation based on PER-MATD3[J]. Acta Aeronautica et Astronautica Sinica, 2023, 44(7): 196-209.
[10] Topin N, Milani S, Fang F, et al. Iterative bounding MDPs: Learning interpretable policies via non-interpretable methods[C]//Proceedings of the AAAI Conference on Artificial Intelligence, 2021: 9923-9931.
[11] Silva A, Gombolay M, Killian T, et al. Optimization methods for interpretable differentiable decision trees applied to reinforcement learning[C]//Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, 2020: 1855-1865.
[12] Landajuela M, Petersen B K, Kim S, et al. Discovering symbolic policies with deep reinforcement learning[C]//Proceedings of the 38th International Conference on Machine Learning, 2021: 5979-5989.
[13] Danesh M H, Koul A, Fern A, et al. Re-understanding finite-state representations of recurrent policy networks[C]//Proceedings of the 38th International Conference on Machine Learning, 2021: 2388-2397.
[14] Greydanus S, Koul A, Dodge J, et al. Visualizing and understanding Atari agents[C]//Proceedings of the 35th International Conference on Machine Learning, 2018: 1792-1801.
[15] Bastani O, Pu Y, Solar-Lezama A. Verifiable reinforcement learning via policy extraction[C]//Advances in Neural Information Processing Systems, 2018: 2499-2509.
[16] Tjoa E, Guan C. A survey on explainable artificial intelligence (XAI): Toward medical XAI[J]. IEEE Transactions on Neural Networks and Learning Systems, 2021, 32(11): 4793-4813.
[17] Topin N, Veloso M. Generation of policy-level explanations for reinforcement learning[DB/OL]. CoRR: abs/1905.12044, 2019.
[18] Gao Y Y, Yu M J, Han Q S, et al. Air combat maneuver decision-making based on improved symbiotic organisms search algorithm[J]. Journal of Beijing University of Aeronautics and Astronautics, 2019, 45(3): 429-436.
[19] Du H W, Cui M L, Han T, et al. Air combat maneuver decision-making based on multi-objective optimization and reinforcement learning[J]. Journal of Beijing University of Aeronautics and Astronautics, 2018, 44(11): 2247-2256.
[20] Li Y F, Shi J P, Zhang W G, et al. Air combat maneuver decision-making for unmanned combat aerial vehicles based on deep reinforcement learning[J]. Journal of Harbin Institute of Technology, 2021, 53(12): 33-41.
[21] Lillicrap T P, Hunt J J, Pritzel A, et al. Continuous control with deep reinforcement learning[DB/OL]. CoRR: abs/1509.02971, 2015.
[22] Guo W, Wu X, Khan U, et al. EDGE: Explaining deep reinforcement learning policies[C]//Advances in Neural Information Processing Systems, 2021: 12222-12236.