UAV Swarm Interception Strategy Based on Temporal Memory MADDPG

doi:10.7527/S1000-6893.2026.33366


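LIU Wujun (刘武君)¹, ZHAO Huizhen (赵慧珍)¹, LI Longyue (李龙跃)², CAO Bo (曹波)¹, WU Fengguang (吴凤广)¹, LI Dan (李丹)³

1. Air Force Engineering University
2. Second-Year Graduate Student Team, Air and Missile Defense College, Air Force Engineering University, Xi'an, Shaanxi Province
3. Unit 93861 of the Chinese People's Liberation Army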

Received: 2026-01-14 Revised: 2026-03-24 Published: 2026-03-30 Online: 2026-03-30
Corresponding author: ZHAO Huizhen (赵慧珍)
Funding: National Natural Science Foundation of China


Abstract

In UAV swarm interception, the multi-agent deep deterministic policy gradient (MADDPG) algorithm ignores historical temporal information and decides on the current observation alone, which leads to short-sighted decisions and weak cooperative interception performance. To address this, a temporal memory MADDPG (TM-MADDPG) algorithm incorporating long short-term memory (LSTM) is proposed. First, LSTM is introduced into the actor network so that each agent can extract situation-evolution features and target motion trends from its historical observation sequence and output forward-looking cooperative interception actions. Second, a sequence experience replay mechanism and buffer are designed to match the input requirements of temporal decision-making, and a composite reward function combining dense distance rewards, sparse task rewards, and behavioral penalties guides the UAV swarm to learn cooperative interception strategies efficiently. Finally, with the blue side adopting an intelligent maneuvering strategy, experiments in different adversarial scenarios compare TM-MADDPG with MADDPG and MW-MADDPG; the results indicate that TM-MADDPG is more effective and more robust in dynamic adversarial environments.

Key words: UAV swarms, MADDPG, long short-term memory network, interception strategy
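The sequence experience replay and composite reward named in the abstract can be illustrated with a minimal sketch. The abstract gives no implementation details, so everything below is an assumption made for illustration: the `Transition` fields, the choice to store whole episodes and sample fixed-length windows of consecutive steps, and the reward weights (`w_dist`, `r_hit`, `p_collide`, `p_bounds`) are hypothetical and not taken from the paper.

```python
import random
from collections import deque
from dataclasses import dataclass


@dataclass
class Transition:
    obs: list        # agent observation at step t (hypothetical layout)
    action: list
    reward: float
    next_obs: list
    done: bool


class SequenceReplayBuffer:
    """Stores finished episodes and samples windows of consecutive steps,
    so a recurrent actor can be fed an observation history (assumed design)."""

    def __init__(self, max_episodes, seq_len, seed=None):
        self.episodes = deque(maxlen=max_episodes)  # oldest episodes are evicted
        self.seq_len = seq_len
        self._open = []                             # episode currently being recorded
        self._rng = random.Random(seed)

    def add(self, tr):
        self._open.append(tr)
        if tr.done:
            # Simplification: episodes shorter than one window are discarded.
            if len(self._open) >= self.seq_len:
                self.episodes.append(self._open)
            self._open = []

    def sample(self, batch_size):
        if not self.episodes:
            raise ValueError("no complete episode stored yet")
        batch = []
        for _ in range(batch_size):
            ep = self.episodes[self._rng.randrange(len(self.episodes))]
            start = self._rng.randint(0, len(ep) - self.seq_len)
            batch.append(ep[start:start + self.seq_len])  # time order preserved
        return batch


def composite_reward(prev_dist, dist, intercepted, collided, out_of_bounds,
                     w_dist=1.0, r_hit=10.0, p_collide=5.0, p_bounds=2.0):
    """Dense distance term + sparse task term - behavioral penalties.
    All weights are illustrative placeholders."""
    dense = w_dist * (prev_dist - dist)          # positive when closing on the target
    sparse = r_hit if intercepted else 0.0       # paid only on a successful interception
    penalty = (p_collide if collided else 0.0) + (p_bounds if out_of_bounds else 0.0)
    return dense + sparse - penalty
```

Sampling windows from a single episode keeps every sequence temporally contiguous, which is the property a recurrent actor relies on; whether the paper samples this way or handles episode boundaries differently is not stated in the abstract.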

CLC number: V279

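LIU W J, ZHAO H Z, LI L Y, CAO B, WU F G, LI D. UAV swarm interception strategy based on temporal memory MADDPG[J]. Acta Aeronautica et Astronautica Sinica, doi: 10.7527/S1000-6893.2026.33366 (in Chinese).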



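Supervised by: China Association for Science and Technology. Sponsored by: Chinese Society of Aeronautics and Astronautics; Beihang University.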
