Air and missile defense (AMD) systems are a core element of a nation's aerospace security shield, and their target-interception capability is a key determinant of combat effectiveness. As operations evolve, the AMD interception problem is increasingly characterized by large numbers of targets, wide variation in target value, and stringent real-time requirements. Existing methods typically face three challenges: an interception policy space that grows exponentially with the number of targets, low sample efficiency caused by delayed rewards, and decision processes that cannot be explained. As a result, they struggle to meet operational needs. To address these challenges, this paper proposes an interception strategy framework based on an Explainable Hierarchical Dueling DQN (EHD-DQN). The framework adopts a hierarchical network architecture whose “upper-level ranking, lower-level interception” decoupling suppresses the exponential growth of the policy space and shortens the decision chain. Time-decayed multiple experience pools improve sample efficiency and convergence stability under delayed rewards. An explainability module combining Gradient-weighted Class Activation Mapping (Grad-CAM) and Local Interpretable Model-agnostic Explanations (LIME) embeds explanation signals in the training loop and provides an interpretable basis for decisions. Experiments show that, compared with DQN, DDPG, PPO, and three traditional optimization algorithms (rolling-horizon mixed-integer linear programming, RH-MILP; non-dominated sorting genetic algorithm II, NSGA-II; and adaptive large neighborhood search, ALNS), EHD-DQN performs better on metrics such as the number of interceptions, ammunition utilization, and interception timing for high-value targets, and provides transparent decision rationales for command staff. The results offer an efficient and explainable intelligent decision-making paradigm for AMD command-and-control systems.
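The abstract names two mechanisms that a small sketch can make concrete: the dueling Q-value decomposition and time-decayed sampling from an experience pool. The Python below is a minimal illustration under stated assumptions, not the paper's implementation. `dueling_q` uses the standard dueling aggregation, Q(s, a) = V(s) + A(s, a) - mean(A(s, ·)). The exponential decay rule, the `decay_rate` parameter, and the function names `decay_weights` and `sample_indices` are hypothetical, since the abstract does not specify how the time decay is defined.

```python
import math
import random


def dueling_q(value, advantages):
    """Standard dueling aggregation: Q(s,a) = V(s) + A(s,a) - mean(A(s,.))."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]


def decay_weights(ages, decay_rate=0.1):
    """Hypothetical time-decay rule (an assumption, not from the paper):
    weight = exp(-decay_rate * age), normalized so the weights sum to 1."""
    raw = [math.exp(-decay_rate * age) for age in ages]
    total = sum(raw)
    return [w / total for w in raw]


def sample_indices(ages, batch_size, decay_rate=0.1, rng=None):
    """Draw transition indices with replacement, favoring recent experience."""
    rng = rng or random.Random(0)
    weights = decay_weights(ages, decay_rate)
    return rng.choices(range(len(ages)), weights=weights, k=batch_size)


if __name__ == "__main__":
    # One state value and three action advantages (illustrative numbers only).
    print(dueling_q(1.0, [0.5, -0.5, 0.0]))
    # Ages in training steps: newer transitions receive larger weights.
    print(decay_weights([0, 10, 100]))
    print(sample_indices([0, 1, 2, 3], batch_size=8))
```

Subtracting the mean advantage keeps the value and advantage streams identifiable. Under the assumed decay rule, a larger `decay_rate` concentrates sampling on the newest transitions.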