Acta Aeronautica et Astronautica Sinica (航空学报) > 2026, Vol. 47 Issue (7): 332547-332547   doi: 10.7527/S1000-6893.2025.32547

An interpretable method for manned/unmanned aerial vehicle collaboration in intelligent air combat

Wei XIONG1, Dong ZHANG1, Shuheng YANG1, Zhi REN1, Wenyi LIU2

  1. School of Astronautics, Northwestern Polytechnical University, Xi’an 710072, China
    2. Northwest Institute of Mechanical & Electrical Engineering, Xianyang 712099, China
  • Received: 2025-07-10  Revised: 2025-08-11  Accepted: 2025-11-10  Online: 2025-11-26  Published: 2025-11-25
  • Contact: Dong ZHANG
  • Supported by:
    National Natural Science Foundation of China (52472417)

Abstract:

Manned/Unmanned Aerial Vehicle (M/UAV) teaming represents a critical operational paradigm for future air combat, where deep reinforcement learning serves as a key enabling technology. However, the “black-box” nature of deep reinforcement learning renders the learned strategies difficult to interpret and trust, making interpretable deep reinforcement learning essential for achieving intelligent collaborative air combat. This paper proposes a deep reinforcement learning interpretation method based on a Bayesian-Shapley framework, which models and verifies the interpretability of the decision-making process and explains the decision basis of the UAV. The approach first constructs a decision intent analysis framework for cooperative missions using dynamic Bayesian networks, which locates critical decision nodes in trajectory segments. A Shapley value-based contribution assessment algorithm is then employed to provide a state-level quantitative analysis of the decision rationale at these key nodes. Finally, by reconstructing the state input space of the deep reinforcement learning model, the method significantly enhances interpretability and trustworthiness while maintaining the original policy performance, and the effectiveness of the explanations is validated through state-space ablation simulations.
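
As a rough illustration of the Shapley contribution assessment step described above, the Python sketch below estimates, via Monte-Carlo sampling of feature permutations, how much each state feature contributes to a policy's output at a single decision node. The policy function, feature names, baseline state, and numeric values are illustrative assumptions only, not the trained model or state space used in the paper.

    # Hypothetical sketch: Monte-Carlo estimation of Shapley contributions of
    # state features to a policy output at one decision node. All names and
    # values are placeholders, not the paper's model.
    import numpy as np

    rng = np.random.default_rng(0)

    def policy(state):
        """Stand-in for a trained DRL policy: maps a state vector to a scalar
        action score (e.g., the logit of the selected maneuver)."""
        w = np.array([0.8, -0.5, 1.2, 0.3, -0.9, 0.4])  # toy weights
        return float(np.tanh(state @ w))

    def shapley_contributions(policy, state, baseline, n_samples=2000):
        """Estimate each feature's Shapley value by sampling random feature
        permutations and accumulating the marginal change in the policy output
        when that feature is switched from its baseline to its actual value."""
        n = state.size
        phi = np.zeros(n)
        for _ in range(n_samples):
            order = rng.permutation(n)
            x = baseline.copy()
            prev = policy(x)
            for i in order:
                x[i] = state[i]            # reveal feature i
                cur = policy(x)
                phi[i] += cur - prev       # marginal contribution of feature i
                prev = cur
        return phi / n_samples

    # Illustrative state at a key decision node (feature names are assumptions).
    features = ["rel_distance", "rel_bearing", "closing_speed",
                "own_altitude", "threat_level", "teammate_distance"]
    state = np.array([0.6, -0.2, 0.9, 0.4, 0.7, 0.1])
    baseline = np.zeros_like(state)        # reference state (e.g., dataset mean)

    phi = shapley_contributions(policy, state, baseline)
    for name, v in sorted(zip(features, phi), key=lambda t: -abs(t[1])):
        print(f"{name:>18s}: {v:+.3f}")

Ranking features by the magnitude of these estimates gives the kind of state-level attribution that the abstract describes for key decision nodes; by the efficiency property, the contributions sum to the difference between the policy output at the actual state and at the baseline.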

Key words: human-machine collaboration, deep reinforcement learning, interpretability, intelligent air combat, intention recognition

CLC Number: