Optimizing air and missile defense strategies with explainable hierarchical reinforcement learning

doi:10.7527/S1000-6893.2025.32786

Electronics, Electrical Engineering and Control


Yuheng LIU (刘宇衡), Li YANG (杨力), Qilong HUANG (黄琦龙, corresponding author)

School of Mechanical Engineering, Nanjing University of Science and Technology, Nanjing 210094, China (the Chinese-language record gives 自动化学院, School of Automation)

Received: 2025-09-15  Revised: 2025-09-27  Accepted: 2025-10-21  Online: 2025-10-31  Published: 2025-10-30
Contact: Qilong HUANG  E-mail: huangql@njust.edu.cn
Supported by: National Natural Science Foundation of China (U21B2003)

Abstract:

Air and missile defense (AMD) systems are core elements of a nation's aerospace security shield, and their target-interception capability is key to combat effectiveness. As warfare evolves, the AMD interception problem is increasingly characterized by large numbers of targets, pronounced differences in target value, and stringent real-time requirements. Existing methods typically face an interception policy space that grows exponentially with the number of targets, poor sample efficiency under delayed rewards, and opaque decision processes, so they fall short of operational needs. To address these challenges, this paper proposes an interception strategy framework based on an Explainable Hierarchical Dueling Deep Q-Network (EHD-DQN). The framework uses a hierarchical network architecture whose "upper-level ranking → lower-level interception" decoupling suppresses exponential policy-space growth and shortens the decision chain. A temporally decayed multi-experience buffer improves sample efficiency and convergence stability under delayed rewards. An explainability module that combines Gradient-weighted Class Activation Mapping (Grad-CAM) and Local Interpretable Model-agnostic Explanations (LIME) injects explanation signals into the training loop and provides traceable decision rationales. Experiments show that, compared with Deep Q-Network (DQN), Deep Deterministic Policy Gradient (DDPG), Proximal Policy Optimization (PPO), and three traditional optimization algorithms (Rolling-Horizon Mixed-Integer Linear Programming (RH-MILP), Non-dominated Sorting Genetic Algorithm II (NSGA-II), and Adaptive Large Neighborhood Search (ALNS)), EHD-DQN achieves better performance in interception count, ammunition utilization, and engagement timing for high-value targets, while providing transparent, staff-oriented justifications for command decisions. The results indicate that EHD-DQN offers an efficient and explainable decision-making paradigm for AMD command-and-control systems.

Key words: hierarchical reinforcement learning, explainable artificial intelligence, air defense and anti-missile decision-making, dueling DQN, collaborative optimization
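
The abstract names a temporally decayed multi-experience buffer but does not describe its construction. As a rough illustration of the general idea only, the sketch below keeps one replay pool per named group and samples stored transitions with a probability that decays exponentially with their age. The class name `TimeDecayedMultiBuffer`, the pool names, the exponential form of the decay, and the `decay` rate are all assumptions made for this example and are not taken from the paper.

```python
import math
import random
from collections import deque


class TimeDecayedMultiBuffer:
    """Hypothetical sketch: several replay pools with age-decayed sampling."""

    def __init__(self, pool_names, capacity=10_000, decay=0.01, seed=None):
        # One bounded FIFO pool per name (e.g. one per hierarchy level).
        self.pools = {name: deque(maxlen=capacity) for name in pool_names}
        self.decay = decay   # assumed exponential decay rate per insertion
        self.step = 0        # global insertion counter, used as a clock
        self.rng = random.Random(seed)

    def add(self, pool, transition):
        # Store the insertion time next to the transition.
        self.pools[pool].append((self.step, transition))
        self.step += 1

    def weight(self, stored_step):
        # Older transitions get exponentially smaller sampling weight.
        age = self.step - stored_step
        return math.exp(-self.decay * age)

    def sample(self, pool, batch_size):
        # Draw with replacement, favouring recent transitions.
        entries = list(self.pools[pool])
        if not entries:
            return []
        weights = [self.weight(stored) for stored, _ in entries]
        picks = self.rng.choices(entries, weights=weights, k=batch_size)
        return [transition for _, transition in picks]


# Usage: fill one pool and draw a mini-batch biased toward recent data.
buffer = TimeDecayedMultiBuffer(["upper", "lower"], decay=0.5, seed=0)
for i in range(20):
    buffer.add("lower", {"reward": i})
batch = buffer.sample("lower", 8)
```

Sampling with replacement keeps the example short; a training loop could equally draw without replacement or mix batches from several pools, and the paper may weight or partition its pools differently.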


Yuheng LIU, Li YANG, Qilong HUANG. Optimizing air and missile defense strategies with explainable hierarchical reinforcement learning[J]. Acta Aeronautica et Astronautica Sinica, 2026, 47(8): 332786.




Category	Ammunition count	Number of fire units	Ammunition speed
Long-range fire unit	16	1	High
Medium-range fire unit	20	2	Medium
Short-range fire unit	30	2	Low

Category	Speed/(km·s^-1)	Maximum turn rate/(rad·s^-1)
Fighter aircraft	0.25	0.10
Cruise missile	0.24	0.21
UAV	0.06	0.05
Long-/medium-/short-range interceptor	0.90/0.65/0.45	0.35/0.28/0.22

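Phase	Time	Upper-layer high-value targets (ID/type/distance to target)	Lower-layer assignment
Early	2	T07/fighter/120, T03/fighter/128, T05/fighter/132, T12/cruise missile/160, T21/cruise missile/168	LR1→T07, MR1→T12, MR2→T21, SR1→T31, SR2→T33
Early	4	T07/fighter/120, T03/fighter/128, T05/fighter/132, T12/cruise missile/160, T21/cruise missile/168	LR1→T03, MR1→T09, MR2→T27, SR1→T36, SR2→T37
Middle	12	T27/cruise missile/68, T18/cruise missile/72, T24/cruise missile/80, T32/cruise missile/83, T05/fighter/76	LR1→T05, MR1→T18, MR2→T27, SR1→T35, SR2→T40
Middle	14	T18/cruise missile/66, T24/cruise missile/74, T38/cruise missile/82, T34/cruise missile/84, T41/UAV/96	LR1→T11, MR1→T18, MR2→T24, SR1→T41, SR2→T38
Late	22	T33/UAV/70, T40/UAV/74, T41/UAV/78, T24/cruise missile/63, T32/cruise missile/72	LR1→T24, MR1→T32, MR2→T38, SR1→T33, SR2→T40
Late	24	T40/UAV/68, T41/UAV/70, T42/UAV/72, T34/cruise missile/71, T29/cruise missile/73	LR1→T34, MR1→T29, MR2→T32, SR1→T40, SR2→T41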
