航空学报 (Acta Aeronautica et Astronautica Sinica), 2026, Vol. 47, Issue 8: 332786. doi: 10.7527/S1000-6893.2025.32786

基于可解释分层强化学习的防空反导策略优化

刘宇衡, 杨力, 黄琦龙

  1. 南京理工大学 自动化学院,南京 210094
  • 收稿日期:2025-09-15 修回日期:2025-09-27 接受日期:2025-10-21 出版日期:2025-10-31 发布日期:2025-10-30
  • 通讯作者: 黄琦龙 E-mail:huangql@njust.edu.cn
  • 基金资助:
    国家自然科学基金(U21B2003)

Optimizing air and missile defense strategies with explainable hierarchical reinforcement learning

Yuheng LIU, Li YANG, Qilong HUANG

  1. School of Automation, Nanjing University of Science and Technology, Nanjing 210094, China
  • Received: 2025-09-15 Revised: 2025-09-27 Accepted: 2025-10-21 Online: 2025-10-31 Published: 2025-10-30
  • Contact: Qilong HUANG E-mail: huangql@njust.edu.cn
  • Supported by:
    National Natural Science Foundation of China (U21B2003)

摘要:

防空反导系统(AMD)是构成国家空天安全屏障的核心要素,其目标拦截能力是决定作战效能的关键。防空反导目标拦截问题随作战发展逐步呈现目标规模大和价值差异性大和实时性要求高等特点,现有技术方法通常面临拦截策略空间随目标规模指数级增长、延迟奖励导致样本利用率低且决策过程不可解释的挑战,难以满足作战需求。为此,提出一种基于可解释分层对决式深度Q网络(EHD-DQN)的拦截策略框架。该框架采用分层网络架构,通过“上层排序—下层拦截”的分层解耦,抑制策略空间指数级爆炸并压缩决策链路;通过时间衰减多经验池,提升延迟奖励下的样本利用率与收敛稳定性;引入Grad-CAM与LIME组成的可解释模块,将解释信号嵌入训练闭环,提供可解释依据。试验表明,相较深度Q网络(DQN)、深度确定性策略梯度(DDPG)、近端策略优化(PPO) 及3类传统优化算法(滑动窗口混合整数规划(RH-MILP)、非支配排序遗传算法(NSGA-Ⅱ)、自适应大邻域搜索算法(ALNS)),EHD-DQN 在拦截数量、弹药利用与高价值目标的拦截时机等指标上取得更优表现,并能提供面向指挥参谋的透明决策依据。结果表明EHD-DQN可为防空反导指挥控制系统提供兼具高效性和可解释性的智能决策新范式。

关键词: 分层强化学习, 可解释性人工智能, 防空反导决策, dueling DQN, 协同优化

Abstract:

Air and missile defense (AMD) systems are a core element of a nation's aerospace security shield, and their target-interception capability is a key determinant of combat effectiveness. As warfare evolves, the AMD interception problem is increasingly characterized by large numbers of targets, pronounced differences in target value, and stringent real-time requirements. Existing methods typically face an interception policy space that grows exponentially with the number of targets, poor sample efficiency under delayed rewards, and opaque decision processes, which makes them inadequate for operational needs. To address these challenges, this paper proposes an interception strategy framework based on an Explainable Hierarchical Dueling Deep Q-Network (EHD-DQN). The framework adopts a hierarchical network architecture whose "upper-level ranking, lower-level interception" decoupling suppresses the exponential growth of the policy space and shortens the decision chain. Time-decayed multiple experience pools improve sample efficiency and convergence stability under delayed rewards. In addition, an explainability module combining Gradient-weighted Class Activation Mapping (Grad-CAM) and Local Interpretable Model-agnostic Explanations (LIME) feeds explanation signals into the training loop and provides traceable decision rationales. Experiments show that, compared with Deep Q-Network (DQN), Deep Deterministic Policy Gradient (DDPG), Proximal Policy Optimization (PPO), and three traditional optimization algorithms (Rolling-Horizon Mixed-Integer Linear Programming (RH-MILP), Non-dominated Sorting Genetic Algorithm Ⅱ (NSGA-Ⅱ), and Adaptive Large Neighborhood Search (ALNS)), EHD-DQN performs better in the number of targets intercepted, ammunition utilization, and interception timing for high-value targets, while providing transparent decision rationales for command staff. The results indicate that EHD-DQN offers an efficient and explainable decision-making paradigm for AMD command-and-control systems.
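For readers unfamiliar with the dueling architecture named in the framework, the following minimal sketch shows the standard dueling aggregation, Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a'), in plain Python. It is an illustration only and not the authors' network: the function names and the numbers are hypothetical, and the abstract does not state which aggregation variant EHD-DQN uses.

```python
from typing import List


def dueling_q_values(state_value: float, advantages: List[float]) -> List[float]:
    """Combine a state value V(s) and per-action advantages A(s, a) into Q(s, a).

    Uses the mean-advantage baseline, so the mean of the returned Q-values
    equals V(s).
    """
    if not advantages:
        raise ValueError("at least one action is required")
    mean_advantage = sum(advantages) / len(advantages)
    return [state_value + a - mean_advantage for a in advantages]


def greedy_action(q_values: List[float]) -> int:
    """Return the index of the action with the largest Q-value."""
    return max(range(len(q_values)), key=q_values.__getitem__)


# Hypothetical numbers: one state value and three candidate actions.
q = dueling_q_values(2.0, [1.0, -1.0, 0.0])
print(q)                 # [3.0, 1.0, 2.0]
print(greedy_action(q))  # 0
```

Subtracting the mean advantage keeps the value and advantage streams identifiable: adding a constant to every advantage leaves the resulting Q-values unchanged.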

Key words: hierarchical reinforcement learning, explainable artificial intelligence, air defense and anti-missile decision-making, dueling DQN, collaborative optimization
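The abstract does not specify how the time-decayed multiple experience pools are implemented. The sketch below shows one plausible reading under stated assumptions: each level of the hierarchy keeps its own bounded pool, and transitions are sampled with probability proportional to exp(-decay_rate * age), so that recent experience is replayed more often. The class name, the pool names "upper" and "lower", the exponential decay form, and all parameter values are hypothetical and are not taken from the paper.

```python
import math
import random
from collections import deque
from typing import Any, Deque, Dict, Iterable, List, Tuple


class TimeDecayedMultiBuffer:
    """Several bounded experience pools with age-weighted sampling (illustrative)."""

    def __init__(self, pool_names: Iterable[str], capacity: int = 1000,
                 decay_rate: float = 0.01) -> None:
        self.decay_rate = decay_rate
        # One FIFO pool per name; the oldest entries are evicted at capacity.
        self.pools: Dict[str, Deque[Tuple[int, Any]]] = {
            name: deque(maxlen=capacity) for name in pool_names
        }

    def add(self, pool: str, step: int, transition: Any) -> None:
        """Store a transition together with the step at which it was collected."""
        self.pools[pool].append((step, transition))

    def weights(self, pool: str, now: int) -> List[float]:
        """Sampling weight exp(-decay_rate * age) for each stored transition."""
        return [math.exp(-self.decay_rate * max(0, now - step))
                for step, _ in self.pools[pool]]

    def sample(self, pool: str, k: int, now: int, rng: random.Random) -> List[Any]:
        """Draw k transitions with replacement, favouring recent ones."""
        entries = list(self.pools[pool])
        if not entries:
            return []
        picked = rng.choices(entries, weights=self.weights(pool, now), k=k)
        return [transition for _, transition in picked]


# Hypothetical usage: separate pools for the ranking and interception levels.
buffer = TimeDecayedMultiBuffer(["upper", "lower"], capacity=3, decay_rate=0.5)
for step in range(5):
    buffer.add("upper", step, f"t{step}")
batch = buffer.sample("upper", k=4, now=4, rng=random.Random(0))
```

Sampling here is with replacement and uses a fixed decay rate; a real implementation would need to choose the decay schedule and how the pools are mixed, which this sketch leaves open.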

CLC number: