Electronics and Electrical Engineering and Control

Optimizing air and missile defense strategies with explainable hierarchical reinforcement learning

  • Yuheng LIU
  • Li YANG
  • Qilong HUANG
  • School of Mechanical Engineering, Nanjing University of Science and Technology, Nanjing 210094, China

Received date: 2025-09-15

Revised date: 2025-09-27

Accepted date: 2025-10-21

Online published: 2025-10-30

Supported by

National Natural Science Foundation of China (U21B2003)

Abstract

Air and Missile Defense (AMD) systems are core elements of a nation’s aerospace security shield, and their target-interception capability is key to determining combat effectiveness. As warfare evolves, the AMD interception problem is increasingly characterized by large numbers of targets, pronounced heterogeneity in target value, and stringent real-time requirements. Existing techniques typically face an interception policy space that grows exponentially with target count, poor sample efficiency under delayed rewards, and opaque decision processes, which makes them insufficient for operational needs. To address these challenges, this paper proposes an interception strategy framework based on Explainable Hierarchical Dueling DQN (EHD-DQN). The framework suppresses exponential policy-space growth and shortens the decision chain through a hierarchical decoupling of “upper-level ranking → lower-level interception”. A temporally decayed multi-experience buffer is introduced to improve sample efficiency and convergence stability under delayed rewards. Moreover, an explainability module that combines Gradient-weighted Class Activation Mapping (Grad-CAM) and Local Interpretable Model-agnostic Explanations (LIME) is embedded to inject explanation signals into the training loop and provide traceable decision rationales. Compared with Deep Q-Network (DQN), Deep Deterministic Policy Gradient (DDPG), Proximal Policy Optimization (PPO), and three traditional optimization algorithms, namely Rolling-Horizon Mixed-Integer Linear Programming (RH-MILP), Non-dominated Sorting Genetic Algorithm Ⅱ (NSGA-Ⅱ), and Adaptive Large Neighborhood Search (ALNS), EHD-DQN achieves superior performance in interception count, ammunition utilization, and engagement timing for high-value targets, while furnishing transparent, staff-oriented justifications for command decision-making. The results indicate that EHD-DQN offers an efficient and explainable decision-making paradigm for AMD command-and-control systems.
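
The three mechanisms named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The dueling aggregation Q(s, a) = V(s) + A(s, a) - mean A(s, ·) is the standard dueling-DQN form. The exponential age weighting `exp(-decay * age)`, the class name `DecayedBuffer`, and the helper `hierarchical_step` are assumptions made for illustration, since the abstract does not specify the decay rule or the network interfaces. A single buffer is shown; a multi-experience setup would presumably hold several such buffers.

```python
import math
import random
from collections import deque


def dueling_q(value, advantages):
    """Standard dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]


class DecayedBuffer:
    """Replay buffer whose sampling weight decays with transition age.

    The form exp(-decay * age) is an assumed decay rule, not taken from the paper.
    """

    def __init__(self, capacity, decay):
        self.items = deque(maxlen=capacity)  # entries are (step_stored, transition)
        self.decay = decay

    def add(self, step, transition):
        self.items.append((step, transition))

    def weights(self, now):
        # Older transitions (larger now - step) receive smaller weights.
        return [math.exp(-self.decay * (now - step)) for step, _ in self.items]

    def sample(self, now, k, rng=random):
        transitions = [t for _, t in self.items]
        return rng.choices(transitions, weights=self.weights(now), k=k)


def hierarchical_step(rank_q, intercept_q_for):
    """Upper level picks the top-ranked target; lower level picks an action for it.

    rank_q: list of upper-level Q-values, one per target (hypothetical input).
    intercept_q_for: callable mapping a target index to lower-level Q-values.
    """
    target = max(range(len(rank_q)), key=rank_q.__getitem__)
    action_q = intercept_q_for(target)
    action = max(range(len(action_q)), key=action_q.__getitem__)
    return target, action
```

Splitting the decision this way means each level chooses among a number of options that grows linearly with the target count, rather than over joint assignments, which is the intuition behind the claimed suppression of policy-space growth.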

Cite this article

Yuheng LIU, Li YANG, Qilong HUANG. Optimizing air and missile defense strategies with explainable hierarchical reinforcement learning[J]. Acta Aeronautica et Astronautica Sinica, 2026, 47(8): 332786-332786. DOI: 10.7527/S1000-6893.2025.32786
