Acta Aeronautica et Astronautica Sinica

UAV Swarm Interception Strategy Based on Temporal Memory MADDPG

  

  • Received: 2026-01-14 Revised: 2026-03-24 Online: 2026-03-30 Published: 2026-03-30

Abstract: In UAV swarm interception scenarios, multi-agent deep deterministic policy gradient (MADDPG) algorithms ignore historical temporal information and rely solely on current observations, which leads to short-sighted decisions and suboptimal cooperative interception performance. To address these shortcomings, this study proposes a Temporal Memory-enhanced Multi-Agent Deep Deterministic Policy Gradient (TM-MADDPG) algorithm that incorporates long short-term memory (LSTM). First, integrating an LSTM into the actor network enables agents to extract situational evolution features and target motion trends from historical observation sequences and to generate forward-looking cooperative interception actions. Second, a buffered sequence experience replay mechanism is designed to meet the input requirements of temporal decision-making. In addition, a composite reward function combining dense distance rewards, sparse task rewards, and behavioral penalties guides the UAV swarm to learn cooperative interception strategies efficiently. The blue forces employ intelligent maneuvering strategies. Experiments across diverse adversarial scenarios demonstrate that, compared with MADDPG and MW-MADDPG, the TM-MADDPG algorithm is more effective and more robust in dynamic adversarial environments.
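
As a rough illustration of two ideas named in the abstract, the sketch below shows one possible sequence replay buffer and one possible composite reward. It is not the authors' implementation: the class name `SequenceReplayBuffer`, the function `composite_reward`, the window length, and all reward weights are hypothetical choices made for this example only. The buffer returns windows of consecutive transitions from a single episode, which is the kind of input an LSTM-based actor would need.

```python
import random


class SequenceReplayBuffer:
    """Stores whole episodes and samples fixed-length windows of consecutive steps.

    Hypothetical sketch: capacity is counted in transitions, and episodes
    shorter than seq_len are discarded because no full window fits in them.
    """

    def __init__(self, capacity, seq_len):
        self.capacity = capacity
        self.seq_len = seq_len
        self.episodes = []   # list of episodes, each a list of transitions
        self.size = 0        # total number of stored transitions
        self._current = []   # transitions of the episode being recorded

    def add(self, obs, action, reward, next_obs, done):
        self._current.append((obs, action, reward, next_obs, done))
        if done:
            self._store_episode()

    def _store_episode(self):
        if len(self._current) >= self.seq_len:
            self.episodes.append(self._current)
            self.size += len(self._current)
            # Evict the oldest episodes once the capacity is exceeded.
            while self.size > self.capacity and len(self.episodes) > 1:
                self.size -= len(self.episodes.pop(0))
        self._current = []

    def sample(self, batch_size, rng=random):
        """Return batch_size windows; requires at least one stored episode."""
        batch = []
        for _ in range(batch_size):
            episode = rng.choice(self.episodes)
            start = rng.randrange(len(episode) - self.seq_len + 1)
            batch.append(episode[start:start + self.seq_len])
        return batch


def composite_reward(prev_dist, dist, intercepted, collided, out_of_bounds,
                     k_dist=1.0, r_hit=10.0, p_collide=5.0, p_bounds=2.0):
    """Dense distance term + sparse task term - behavioral penalties.

    All weights are assumed values chosen for illustration.
    """
    reward = k_dist * (prev_dist - dist)   # dense: reward closing the distance
    if intercepted:
        reward += r_hit                    # sparse: successful interception
    if collided:
        reward -= p_collide                # penalty: collision with a teammate
    if out_of_bounds:
        reward -= p_bounds                 # penalty: leaving the arena
    return reward
```

Sampling windows from within one episode keeps each sequence temporally contiguous, so a recurrent actor never sees a history that straddles an episode reset.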

Keywords: UAV swarms, MADDPG, long short-term memory network, interception strategy
