Acta Aeronautica et Astronautica Sinica (航空学报) > 2026, Vol. 47 Issue (7): 332547-332547   doi: 10.7527/S1000-6893.2025.32547

An interpretable method for manned/unmanned aerial vehicle collaboration in intelligent air combat

Wei XIONG1, Dong ZHANG1, Shuheng YANG1, Zhi REN1, Wenyi LIU2

  1. School of Astronautics, Northwestern Polytechnical University, Xi’an 710072, China
    2. Northwest Institute of Mechanical & Electrical Engineering, Xianyang 712099, China
  • Received: 2025-07-10  Revised: 2025-08-11  Accepted: 2025-11-10  Online: 2025-11-26  Published: 2025-11-25
  • Contact: Dong ZHANG
  • Supported by:
    National Natural Science Foundation of China (52472417)

Abstract:

Manned/Unmanned Aerial Vehicle (M/UAV) teaming represents a critical operational paradigm for future air combat, where deep reinforcement learning serves as a key enabling technology. However, the “black-box” nature of deep reinforcement learning renders the learned strategies difficult to interpret and trust, making interpretable deep reinforcement learning essential for achieving intelligent collaborative air combat. This paper proposes a deep reinforcement learning interpretation method based on a Bayesian-Shapley framework, which models and verifies the interpretability of the decision-making process and explains the decision basis of the UAV. The approach first constructs a decision intent analysis framework for cooperative missions using dynamic Bayesian networks, which locates critical decision nodes in trajectory segments. A Shapley value-based contribution assessment algorithm is then employed to provide a state-level quantitative analysis of the decision rationale at these key nodes. Finally, by reconstructing the state input space of the deep reinforcement learning model, the method significantly enhances interpretability and trustworthiness while maintaining the original policy performance, and the effectiveness of the explanations is validated through state-space ablation simulations.
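
As a rough illustration of the Shapley contribution assessment step described above, the Python sketch below estimates, via Monte-Carlo sampling of feature permutations, how much each state feature contributes to a policy's output at a single decision node. The policy function, feature names, baseline state, and numeric values are illustrative assumptions only, not the trained model or state space used in the paper.

    # Hypothetical sketch: Monte-Carlo estimation of Shapley contributions of
    # state features to a policy output at one decision node. All names and
    # values are placeholders, not the paper's model.
    import numpy as np

    rng = np.random.default_rng(0)

    def policy(state):
        """Stand-in for a trained DRL policy: maps a state vector to a scalar
        action score (e.g., the logit of the selected maneuver)."""
        w = np.array([0.8, -0.5, 1.2, 0.3, -0.9, 0.4])  # toy weights
        return float(np.tanh(state @ w))

    def shapley_contributions(policy, state, baseline, n_samples=2000):
        """Estimate each feature's Shapley value by sampling random feature
        permutations and accumulating the marginal change in the policy output
        when that feature is switched from its baseline to its actual value."""
        n = state.size
        phi = np.zeros(n)
        for _ in range(n_samples):
            order = rng.permutation(n)
            x = baseline.copy()
            prev = policy(x)
            for i in order:
                x[i] = state[i]            # reveal feature i
                cur = policy(x)
                phi[i] += cur - prev       # marginal contribution of feature i
                prev = cur
        return phi / n_samples

    # Illustrative state at a key decision node (feature names are assumptions).
    features = ["rel_distance", "rel_bearing", "closing_speed",
                "own_altitude", "threat_level", "teammate_distance"]
    state = np.array([0.6, -0.2, 0.9, 0.4, 0.7, 0.1])
    baseline = np.zeros_like(state)        # reference state (e.g., dataset mean)

    phi = shapley_contributions(policy, state, baseline)
    for name, v in sorted(zip(features, phi), key=lambda t: -abs(t[1])):
        print(f"{name:>18s}: {v:+.3f}")

Ranking features by the magnitude of these estimates gives the kind of state-level attribution that the abstract describes for key decision nodes; by the efficiency property, the contributions sum to the difference between the policy output at the actual state and at the baseline.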

Key words: human-machine collaboration, deep reinforcement learning, interpretability, intelligent air combat, intention recognition

CLC Number: