Acta Aeronautica et Astronautica Sinica ›› 2026, Vol. 47 ›› Issue (8): 332786. doi: 10.7527/S1000-6893.2025.32786

• Electronics and Electrical Engineering and Control

Optimizing air and missile defense strategies with explainable hierarchical reinforcement learning

Yuheng LIU, Li YANG, Qilong HUANG

  1. School of Mechanical Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
  • Received: 2025-09-15 Revised: 2025-09-27 Accepted: 2025-10-21 Online: 2025-10-31 Published: 2025-10-30
  • Contact: Qilong HUANG E-mail: huangql@njust.edu.cn
  • Supported by:
    National Natural Science Foundation of China (U21B2003)

Abstract:

Air and Missile Defense (AMD) systems are core elements of a nation’s aerospace security shield, and their target-interception capability largely determines combat effectiveness. As warfare evolves, the AMD interception problem is increasingly characterized by large numbers of targets, pronounced heterogeneity in target value, and stringent real-time requirements. Existing techniques face an interception policy space that grows exponentially with target count, poor sample efficiency under delayed rewards, and opaque decision processes, which leaves them inadequate for operational needs. To address these challenges, this paper proposes an interception strategy framework based on an Explainable Hierarchical Dueling DQN (EHD-DQN). The framework suppresses exponential policy-space growth and shortens the decision chain through a hierarchical decoupling of “upper-level ranking → lower-level interception”. A temporally decayed multi-experience buffer is introduced to improve sample efficiency and convergence stability under delayed rewards. In addition, an explainability module that combines Gradient-weighted Class Activation Mapping (Grad-CAM) and Local Interpretable Model-agnostic Explanations (LIME) is embedded to inject explanation signals into the training loop and provide traceable decision rationales. EHD-DQN is compared with Deep Q-Network (DQN), Deep Deterministic Policy Gradient (DDPG), Proximal Policy Optimization (PPO), and three traditional optimization algorithms: Rolling-Horizon Mixed-Integer Linear Programming (RH-MILP), Non-dominated Sorting Genetic Algorithm Ⅱ (NSGA-Ⅱ), and Adaptive Large Neighborhood Search (ALNS). Against these baselines, EHD-DQN achieves superior performance in interception count, ammunition utilization, and engagement timing for high-value targets, while furnishing transparent, staff-oriented justifications for command decision-making. The results indicate that EHD-DQN offers an efficient and explainable decision-making paradigm for AMD command-and-control systems.
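
As a rough illustration of two ingredients named in the abstract, the sketch below shows the standard dueling aggregation Q(s,a) = V(s) + A(s,a) - mean(A(s,·)) together with one possible reading of a "temporally decayed" replay buffer. The abstract does not specify the decay rule, so the exponential weighting by transition age, the `decay_rate` parameter, and all function names here are assumptions made for illustration and are not the authors' implementation.

```python
import math
import random


def dueling_q(value, advantages):
    """Standard dueling aggregation: Q(s,a) = V(s) + A(s,a) - mean_a' A(s,a')."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + adv - mean_adv for adv in advantages]


def decayed_weights(ages, decay_rate=0.1):
    """Sampling weights that shrink with transition age.

    ASSUMPTION: exponential decay exp(-decay_rate * age), normalized to sum
    to 1. The paper's actual decay schedule is not given in the abstract.
    """
    raw = [math.exp(-decay_rate * age) for age in ages]
    total = sum(raw)
    return [w / total for w in raw]


def sample_batch(transitions, ages, batch_size, decay_rate=0.1, rng=random):
    """Draw a batch (with replacement), favoring more recent transitions."""
    weights = decayed_weights(ages, decay_rate)
    return rng.choices(transitions, weights=weights, k=batch_size)


if __name__ == "__main__":
    # Hypothetical numbers: one state value and three action advantages.
    print(dueling_q(1.0, [0.5, -0.5, 0.0]))
    # Hypothetical transitions with ages measured in training steps.
    print(sample_batch(["t0", "t1", "t2"], [0, 10, 20], batch_size=4))
```

Subtracting the mean advantage keeps the value and advantage streams identifiable, which is the usual motivation for the dueling form. The age-based weighting is only one way to favor recent experience under delayed rewards.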

Key words: hierarchical reinforcement learning, explainable artificial intelligence, air defense and anti-missile decision-making, dueling DQN, collaborative optimization

CLC Number: