UAV Swarm Interception Strategy Based on Temporal Memory MADDPG

刘武君1, 赵慧珍1, 李龙跃2, 曹波1, 吴凤广1, 李丹3

  1. Air Force Engineering University
  2. Second-Year Graduate Student Team, Air and Missile Defense College, Air Force Engineering University, Xi'an, Shaanxi Province
  3. Unit 93861 of the Chinese People's Liberation Army
  • Received: 2026-01-14 Revised: 2026-03-24 Online: 2026-03-30 Published: 2026-03-30
  • Corresponding author: 赵慧珍
  • Funding: National Natural Science Foundation of China

Abstract: When applied to UAV swarm interception, the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm ignores historical temporal information and decides from the current observation alone, which leads to short-sighted decisions and weak cooperative interception performance. To address this, this study proposes a Temporal Memory Multi-Agent Deep Deterministic Policy Gradient (TM-MADDPG) algorithm that incorporates Long Short-Term Memory (LSTM). First, LSTM is introduced into the actor network so that each agent can extract situation-evolution features and target motion trends from its sequence of past observations and output forward-looking cooperative interception actions. Second, a sequence experience replay mechanism and buffer are designed to match the input requirements of temporal decision-making, and a composite reward function combining dense distance rewards, sparse task rewards, and behavioral penalties guides the UAV swarm to learn cooperative interception strategies efficiently. Finally, with the Blue side employing an intelligent maneuvering strategy, experiments across different adversarial scenarios compare TM-MADDPG with MADDPG and MW-MADDPG. The results show that TM-MADDPG is more effective and more robust in dynamic adversarial environments.

Key words: UAV swarm, MADDPG, long short-term memory network, interception strategy

CLC number:
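
The abstract names two mechanisms without giving their details: a sequence experience replay buffer and a composite reward. The following is a minimal Python sketch of one plausible form of each, not the authors' implementation. The class `SequenceReplayBuffer`, the function `composite_reward`, the weights `w_dist`, `r_hit`, and `p_collide`, the use of a collision as the penalized behavior, and the choice to discard episodes shorter than the sequence length are all assumptions made for illustration.

```python
import random
from collections import deque


class SequenceReplayBuffer:
    """Hypothetical buffer: stores episodes, samples windows of consecutive steps."""

    def __init__(self, capacity, seq_len):
        self.episodes = deque(maxlen=capacity)  # oldest episodes are dropped first
        self.seq_len = seq_len
        self._current = []

    def add(self, obs, action, reward, next_obs, done):
        # Append one transition; close the episode when it terminates.
        self._current.append((obs, action, reward, next_obs, done))
        if done:
            # Assumption: episodes too short to fill one window are discarded.
            if len(self._current) >= self.seq_len:
                self.episodes.append(self._current)
            self._current = []

    def sample(self, batch_size, rng=random):
        # Each sample is seq_len consecutive transitions from a single episode,
        # so a recurrent actor receives observations in their original order.
        batch = []
        for _ in range(batch_size):
            episode = rng.choice(self.episodes)
            start = rng.randrange(len(episode) - self.seq_len + 1)
            batch.append(episode[start:start + self.seq_len])
        return batch


def composite_reward(prev_dist, dist, intercepted, collided,
                     w_dist=1.0, r_hit=10.0, p_collide=5.0):
    """Dense distance term + sparse task term - behavioral penalty (weights assumed)."""
    dense = w_dist * (prev_dist - dist)      # positive when closing on the target
    sparse = r_hit if intercepted else 0.0   # paid only on a successful interception
    penalty = p_collide if collided else 0.0 # assumed penalized behavior: collision
    return dense + sparse - penalty
```

In this sketch, each sampled window would be fed step by step to an LSTM-based actor, whereas a standard MADDPG buffer samples single, unordered transitions.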