Manned/unmanned aerial vehicle collaborative interpretable method for intelligent air combat

doi:10.7527/S1000-6893.2025.32547

Abstract

Manned/unmanned aerial vehicle (M/UAV) teaming is a critical operational paradigm for future air combat, and deep reinforcement learning is a key enabling technology for it. However, the black-box nature of deep reinforcement learning makes the learned strategies difficult to interpret and trust, so interpretable deep reinforcement learning is essential for intelligent air combat collaboration. This paper proposes an interpretation method for deep reinforcement learning based on a Bayesian-Shapley framework. The method models and verifies the interpretability of the decision-making process and explains the basis of UAV decisions. First, a decision intent analysis framework for cooperative missions is built with dynamic Bayesian networks; it identifies the critical decision nodes in trajectory segments. Next, a Shapley value-based contribution assessment algorithm quantifies, at the level of individual state variables, the rationale behind the decision at each key node. Finally, the state input space of the deep reinforcement learning model is reconstructed, which improves model interpretability and trustworthiness while maintaining the original policy performance. State space ablation simulations validate the explanatory results.
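To make the Shapley value-based contribution assessment concrete, the following is a minimal sketch and not the paper's implementation. It estimates the contribution of each state variable by permutation sampling. The names `shapley_values` and `toy_value`, the all-zero baseline state, and the sample count are assumptions introduced here for illustration only.

```python
import random
from typing import Callable, List, Sequence


def shapley_values(
    value_fn: Callable[[Sequence[float]], float],
    state: Sequence[float],
    baseline: Sequence[float],
    n_samples: int = 200,
    seed: int = 0,
) -> List[float]:
    """Monte Carlo (permutation sampling) estimate of per-feature Shapley values.

    value_fn maps a state vector to a scalar score (hypothetically, the
    policy's value or action score at a key decision node).
    """
    rng = random.Random(seed)
    n = len(state)
    phi = [0.0] * n
    for _ in range(n_samples):
        order = list(range(n))
        rng.shuffle(order)
        current = list(baseline)      # start from the reference state
        previous = value_fn(current)
        for i in order:
            current[i] = state[i]     # reveal feature i
            updated = value_fn(current)
            phi[i] += updated - previous   # marginal contribution of feature i
            previous = updated
    return [p / n_samples for p in phi]


def toy_value(x: Sequence[float]) -> float:
    """Hypothetical stand-in for a learned value function (linear on purpose)."""
    return 2.0 * x[0] - 1.0 * x[1]


if __name__ == "__main__":
    contributions = shapley_values(toy_value, [1.0, 2.0, 3.0], [0.0, 0.0, 0.0])
    print(contributions)
```

For any value function, the estimated contributions sum to the difference between the score at the real state and the score at the baseline, which is the property that makes a state-level attribution easy to audit. For a linear function such as `toy_value`, each marginal contribution is the same in every permutation, so the estimate does not depend on the sample count.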

Keywords: human-machine collaboration, deep reinforcement learning, interpretability, intelligent air combat, intention identification

CLC Number: V271.4; TP181

Wei XIONG, Dong ZHANG, Shuheng YANG, Zhi REN, Wenyi LIU. Manned/unmanned aerial vehicle collaborative interpretable method for intelligent air combat[J]. Acta Aeronautica et Astronautica Sinica, 2026, 47(7): 332547.

Figures/Tables 17

Fig.1

Fig.2

Fig.3

Fig.4

Table 1

Table 2

Manned aerial vehicle parameters of red side

Parameter	Value	Parameter	Value
$n_x$	$[-1, 5]$	$n_y$	$[-1.5, 2]$
$n_z$	$[-3, 3]$	$v/(\mathrm{m \cdot s^{-1}})$	$[250, 400]$
$h/\mathrm{km}$	$[1, 20]$	$\varphi_R/\mathrm{rad}$	$\pi/6$
$d_M/\mathrm{km}$	$[1, 100]$	$d_R/\mathrm{km}$	$[0, 50]$


Table 3

Unmanned aerial vehicle parameters of red side

Parameter	Value	Parameter	Value
$n_x$	$[-1, 5]$	$n_y$	$[-1.5, 1.5]$
$n_z$	$[-2, 2]$	$v/(\mathrm{m \cdot s^{-1}})$	$[150, 300]$
$h/\mathrm{km}$	$[1, 20]$	$\varphi_R/\mathrm{rad}$	$\pi/6$
$d_M/\mathrm{km}$	$[1, 10]$	$d_R/\mathrm{km}$	$[0, 20]$


Table 4

Unmanned aerial vehicle parameters of blue side

Parameter	Value	Parameter	Value
$n_x$	$[-1, 5]$	$n_y$	$[-1.5, 3]$
$n_z$	$[-3, 3]$	$v/(\mathrm{m \cdot s^{-1}})$	$[250, 400]$
$h/\mathrm{km}$	$[1, 20]$	$\varphi_R/\mathrm{rad}$	$\pi/6$
$d_M/\mathrm{km}$	$[1, 100]$	$d_R/\mathrm{km}$	$[0, 55]$


Table 5

Hyperparameter setting

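Parameter	Value
Actor learning rate	$1 \times 10^{-5}$
Critic learning rate	$1 \times 10^{-5}$
Soft update factor	$0.01$
Learning decay rate	$0.98$
Experience replay buffer capacity	$10^{6}$
Number of sampled transitions (batch size)	$256$
Learning interval (steps)	$10$
Maximum number of training episodes	$2 \times 10^{4}$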
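As a hedged illustration of how two of these hyperparameters are typically used in actor-critic training with target networks, the sketch below applies the soft update factor 0.01 and the decay rate 0.98. It is not taken from the paper: the function names are hypothetical, parameters are plain lists rather than network tensors, and reading the decay rate as a multiplicative learning rate decay is an assumption, since the table does not say when or how the decay is applied.

```python
from typing import List, Sequence

SOFT_UPDATE_TAU = 0.01   # soft update factor from the hyperparameter table
LEARNING_DECAY = 0.98    # decay rate from the table (assumed multiplicative)
ACTOR_LR = 1e-5          # actor learning rate from the table


def soft_update(
    target: Sequence[float], online: Sequence[float], tau: float = SOFT_UPDATE_TAU
) -> List[float]:
    """Polyak averaging: target <- (1 - tau) * target + tau * online."""
    return [(1.0 - tau) * t + tau * o for t, o in zip(target, online)]


def decayed_learning_rate(base_lr: float, decay: float, n_decays: int) -> float:
    """Learning rate after n_decays multiplicative decay steps (assumed schedule)."""
    return base_lr * decay ** n_decays


if __name__ == "__main__":
    target_params = [0.0, 1.0]
    online_params = [1.0, 1.0]
    print(soft_update(target_params, online_params))
    print(decayed_learning_rate(ACTOR_LR, LEARNING_DECAY, 10))
```

With a factor of 0.01 the target parameters move only 1% of the way toward the online parameters per update, which keeps the bootstrapped targets slowly varying.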


Table 6

Environmental parameter setting

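Parameter	Value
$x/\mathrm{km}$	$[-100, 100]$
$y/\mathrm{km}$	$[-100, 100]$
Number of UAVs	2
$d_{\mathrm{safe}}/\mathrm{m}$	100
Blue side initial position/km	$(0, 0, 5)$
Blue side initial heading/(°)	$[-30, 30]$
Manned aerial vehicle initial position/km	$(100, 0, 5)$
Manned aerial vehicle initial heading/(°)	$[150, 210]$
UAV 1 initial position/km	$(60, 30, 5)$
UAV 1 initial heading/(°)	$[150, 210]$
UAV 2 initial position/km	$(60, -30, 5)$
UAV 2 initial heading/(°)	$[150, 210]$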
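The environment parameters above can be read as an episode reset configuration. The following sketch encodes them that way; it is an illustration under stated assumptions, not the authors' simulator. The names `INITIAL_CONDITIONS`, `InitialState`, `sample_initial_states`, and `inside_arena` are hypothetical, and drawing each initial heading uniformly from its interval is an assumption, since the table gives only the interval.

```python
import random
from dataclasses import dataclass
from typing import Dict, Tuple

ARENA_KM = (-100.0, 100.0)   # x and y bounds from the table, in km

# name -> (fixed initial position in km, initial heading interval in degrees)
INITIAL_CONDITIONS: Dict[str, Tuple[Tuple[float, float, float], Tuple[float, float]]] = {
    "blue": ((0.0, 0.0, 5.0), (-30.0, 30.0)),
    "manned": ((100.0, 0.0, 5.0), (150.0, 210.0)),
    "uav1": ((60.0, 30.0, 5.0), (150.0, 210.0)),
    "uav2": ((60.0, -30.0, 5.0), (150.0, 210.0)),
}


@dataclass(frozen=True)
class InitialState:
    position_km: Tuple[float, float, float]
    heading_deg: float


def sample_initial_states(rng: random.Random) -> Dict[str, InitialState]:
    """Draw one initial heading per aircraft (uniform sampling is an assumption)."""
    return {
        name: InitialState(position, rng.uniform(low, high))
        for name, (position, (low, high)) in INITIAL_CONDITIONS.items()
    }


def inside_arena(x_km: float, y_km: float) -> bool:
    """Check that a horizontal position lies within the x/y bounds."""
    low, high = ARENA_KM
    return low <= x_km <= high and low <= y_km <= high


if __name__ == "__main__":
    for name, state in sample_initial_states(random.Random(0)).items():
        print(name, state)
```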


Fig.5

Fig.6

Fig.7

Fig.8

Fig.9

Fig.10

Fig.11


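Maneuver type	No.	Maneuver	Speed	Velocity heading angle	Velocity heading angle rate	Altitude	Altitude rate
Straight-line	1	Level flight	Hold	Hold	Hold	Hold	Hold
	2	Climb	Decrease	Hold	Hold	Increase	Increase then decrease
	3	Dive	Increase	Hold	Hold	Decrease	Decrease then increase
Circling	4	Left circling turn	Hold	Decrease	Hold	Hold	Hold
	5	Right circling turn	Hold	Increase	Hold	Hold	Hold
	6	Split-S	Increase	Abrupt change	Abrupt change	Decrease	Increase then decrease
Rolling	7	Barrel roll	Decrease	Abrupt change	Abrupt change	Increase then decrease	Increase then decrease
	8	Loop	Increase	Abrupt change	Abrupt change	Increase then decrease	Increase then decrease
	9	Half loop	Decrease	Abrupt change	Abrupt change	Increase	Increase then decrease
Combat turn	10	Combat turn	Decrease then increase	Abrupt change	Abrupt change	Increase then decrease	Increase then decrease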
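Each of the ten maneuvers in the table above has a distinct combination of trends across the five features, so the table can be used as a lookup. The sketch below shows one way to do that; it is an assumption about usage, not the paper's recognition procedure. The short trend labels, the English maneuver names (translations of the table entries), and the function `identify_maneuver` are introduced here for illustration.

```python
from typing import Dict, Optional, Tuple

# Trend labels: "hold", "inc", "dec", "abrupt", "inc-dec", "dec-inc"
# Signature order: (speed, heading angle, heading angle rate, altitude, altitude rate)
Signature = Tuple[str, str, str, str, str]

MANEUVER_SIGNATURES: Dict[Signature, str] = {
    ("hold", "hold", "hold", "hold", "hold"): "level flight",
    ("dec", "hold", "hold", "inc", "inc-dec"): "climb",
    ("inc", "hold", "hold", "dec", "dec-inc"): "dive",
    ("hold", "dec", "hold", "hold", "hold"): "left circling turn",
    ("hold", "inc", "hold", "hold", "hold"): "right circling turn",
    ("inc", "abrupt", "abrupt", "dec", "inc-dec"): "split-S",
    ("dec", "abrupt", "abrupt", "inc-dec", "inc-dec"): "barrel roll",
    ("inc", "abrupt", "abrupt", "inc-dec", "inc-dec"): "loop",
    ("dec", "abrupt", "abrupt", "inc", "inc-dec"): "half loop",
    ("dec-inc", "abrupt", "abrupt", "inc-dec", "inc-dec"): "combat turn",
}


def identify_maneuver(signature: Signature) -> Optional[str]:
    """Return the maneuver whose trend signature matches exactly, else None."""
    return MANEUVER_SIGNATURES.get(signature)


if __name__ == "__main__":
    observed = ("hold", "dec", "hold", "hold", "hold")
    print(identify_maneuver(observed))
```

An exact-match lookup is the simplest choice; real trajectory segments are noisy, so a practical recognizer would first need a step that converts raw feature sequences into these trend labels, which is outside the scope of this sketch.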
