UAV Swarm Interception Strategy Based on Temporal Memory MADDPG

doi:10.7527/S1000-6893.2026.33366

Abstract

In drone swarm interception scenarios, the multi-agent deep deterministic policy gradient (MADDPG) algorithm ignores historical temporal information and relies solely on current observations, which leads to myopic decision-making and suboptimal cooperative interception performance. To address these shortcomings, this study proposes a Temporal Memory-enhanced Multi-Agent Deep Deterministic Policy Gradient (TM-MADDPG) algorithm that integrates long short-term memory (LSTM). First, an LSTM is integrated into the actor network so that agents can extract situational evolution features and target motion trends from historical observation sequences and generate forward-looking cooperative interception actions. Second, a buffered sequence experience replay mechanism is designed to meet the input requirements of temporal decision-making. A composite reward function combining dense distance rewards, sparse task rewards, and behavioral penalties guides the drone swarm to learn cooperative interception strategies efficiently. The blue (adversary) forces employ intelligent maneuvering strategies. Experiments across diverse adversarial scenarios demonstrate that, compared with MADDPG and MW-MADDPG, TM-MADDPG achieves superior effectiveness and robustness in dynamic adversarial environments.
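
The abstract names two training components without giving their details: the buffered sequence experience replay that feeds the LSTM actor, and the composite reward. The Python sketch below shows one plausible way to realize them under stated assumptions; it is not the authors' implementation. Everything in it is hypothetical: the names `Transition`, `SequenceReplayBuffer` and `composite_reward`, the per-episode staging buffer, the zero-padding of early windows, the window length, and every reward term and weight (closing-distance shaping, capture bonus, collision and boundary penalties). For brevity it stores a single agent's transitions; in MADDPG the centralized critic would also need the other agents' observations and actions.

```python
import random
from collections import deque
from dataclasses import dataclass


@dataclass
class Transition:
    """One time step for a single agent (hypothetical field layout)."""
    obs: list
    action: list
    reward: float
    next_obs: list
    done: bool


class SequenceReplayBuffer:
    """Hypothetical buffered sequence replay for an LSTM actor.

    Transitions are staged per episode; when the episode ends, one window of
    `seq_len` consecutive steps is stored for every real step, with zero
    padding in front so that early steps still yield full-length windows.
    """

    def __init__(self, seq_len, capacity, obs_dim, act_dim):
        self.seq_len = seq_len
        self.obs_dim = obs_dim
        self.act_dim = act_dim
        self.sequences = deque(maxlen=capacity)  # evicts the oldest windows first
        self.episode = []                        # staging buffer for the current episode

    def _pad(self):
        # All-zero placeholder step used to left-pad the start of an episode.
        zero_obs = [0.0] * self.obs_dim
        return Transition(zero_obs, [0.0] * self.act_dim, 0.0, zero_obs, False)

    def add(self, tr):
        self.episode.append(tr)
        if tr.done:
            self._flush()

    def _flush(self):
        padded = [self._pad()] * (self.seq_len - 1) + self.episode
        # Each window ends at a real step t and holds the seq_len steps up to t.
        for end in range(self.seq_len, len(padded) + 1):
            self.sequences.append(padded[end - self.seq_len:end])
        self.episode = []

    def sample(self, batch_size):
        # Uniform sampling of whole windows; raises ValueError if batch_size
        # exceeds the number of stored windows.
        return random.sample(list(self.sequences), batch_size)


def composite_reward(prev_dist, dist, captured, collided, out_of_bounds,
                     w_dist=1.0, r_capture=10.0, p_collide=5.0, p_bounds=2.0):
    """Hypothetical dense + sparse + penalty reward; all weights are assumptions."""
    reward = w_dist * (prev_dist - dist)  # dense: positive when closing on the target
    if captured:
        reward += r_capture               # sparse task reward on interception
    if collided:
        reward -= p_collide               # behavioral penalty: collision
    if out_of_bounds:
        reward -= p_bounds                # behavioral penalty: leaving the arena
    return reward


if __name__ == "__main__":
    buf = SequenceReplayBuffer(seq_len=4, capacity=1000, obs_dim=2, act_dim=1)
    for t in range(6):
        r = composite_reward(10.0 - t, 9.0 - t, captured=(t == 5),
                             collided=False, out_of_bounds=False)
        buf.add(Transition([float(t), 0.0], [0.1], r, [float(t + 1), 0.0], t == 5))
    print(len(buf.sequences))  # 6 windows, one per real step
    print([s.obs[0] for s in buf.sequences[-1]])  # [2.0, 3.0, 4.0, 5.0]
```

Because every stored window ends at a real time step, an LSTM actor can read the whole window and act on its final hidden state, and the zero-padding lets the first steps of an episode produce full-length inputs. Slicing complete episodes into windows, rather than sampling isolated transitions as plain MADDPG does, is what preserves the temporal order the recurrent actor depends on.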

Key words: UAV swarm; MADDPG; long short-term memory network; interception strategy

CLC Number: V279
