Acta Aeronautica et Astronautica Sinica, 2024, Vol. 45, Issue 18: 329922. doi: 10.7527/S1000-6893.2023.29922

Decision-making method for air combat maneuver based on explainable reinforcement learning

Shuheng YANG1,2, Dong ZHANG1,2, Wei XIONG1,2, Zhi REN1,2, Shuo TANG1,2

  1. School of Astronautics, Northwestern Polytechnical University, Xi’an 710072, China
  2. Shaanxi Key Laboratory of Aerospace Flight Vehicle Design, Northwestern Polytechnical University, Xi’an 710072, China
  • Received: 2023-11-28; Revised: 2024-01-10; Accepted: 2024-04-07; Online: 2024-04-12; Published: 2024-04-12
  • Contact: Dong ZHANG, E-mail: zhangdong@nwpu.edu.cn
  • Supported by:
    Open Fund of the Collective Intelligence & Collaboration Laboratory (QXZ23013402)

Abstract:

Intelligent air combat is the future trend of air combat, and deep reinforcement learning is an important technical approach to intelligent air combat decision-making. However, because deep reinforcement learning is a "black box" model, its strategies are hard to explain, its intentions are hard to understand, and its decisions are hard to trust, which hinders its application to intelligent air combat. To address these problems, an intelligent air combat maneuver decision-making method based on explainable reinforcement learning is proposed. First, an explainability model and a maneuver intention recognition model are constructed from a policy-level explanation method and a dynamic Bayesian network. Second, the decision importance and the maneuver intention probability are computed to explain the maneuver decision-making process of the Unmanned Aerial Vehicle (UAV) at the intention level. Finally, the reward function and training strategy of the deep reinforcement learning algorithm are revised according to the intention explanation results, and the effectiveness of the proposed method is verified through comparative simulation analysis. The proposed method yields air combat maneuver strategies that are effective, reliable, and credible.
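
The abstract names two quantities, a decision importance and a maneuver intention probability, without giving their formulas. The following minimal Python sketch illustrates one plausible reading and is not the authors' implementation. It assumes that importance is measured as the gap between the best and worst action values in a state (a common choice in policy-level explanation work), and that the dynamic Bayesian network reduces to a single discrete hidden intention variable updated by forward filtering. All function names, intention labels, and numbers are hypothetical.

```python
import numpy as np

def decision_importance(q_values):
    """Assumed importance measure: gap between the best and worst action values."""
    q = np.asarray(q_values, dtype=float)
    return float(q.max() - q.min())

def update_intent_belief(belief, transition, likelihood):
    """One forward-filtering step over a discrete hidden intention.

    belief[i]        : P(intent_{t-1} = i | observations so far)
    transition[i, j] : P(intent_t = j | intent_{t-1} = i)
    likelihood[j]    : P(observation_t | intent_t = j)
    """
    predicted = transition.T @ belief    # predict the next intention
    posterior = likelihood * predicted   # weight by the new observation
    return posterior / posterior.sum()   # normalize to a probability vector

# Hypothetical two-intention example: index 0 = "attack", index 1 = "evade".
belief = np.array([0.5, 0.5])
transition = np.array([[0.9, 0.1],
                       [0.2, 0.8]])
likelihood = np.array([0.7, 0.2])

posterior = update_intent_belief(belief, transition, likelihood)
importance = decision_importance([1.0, 0.2, -0.5])
```

Under these assumptions, a state with a large value gap is one where the choice of maneuver matters most, and the posterior vector gives the intention probabilities that an explanation would report for that step.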

Keywords: intelligent air combat, reinforcement learning, maneuver decision-making, explainability, air combat intention recognition

CLC number: