Electronics and Electrical Engineering and Control

Cooperative pursuit strategy for multi-UAVs based on DE-MADDPG algorithm

  • FU Xiaowei,
  • WANG Hui,
  • XU Zhe
  • School of Electronics and Information, Northwestern Polytechnical University, Xi'an 710129, China

Received date: 2021-01-22

Revised date: 2021-03-06

  Online published: 2021-04-27

Supported by

Aeronautical Science Foundation of China (202023053001)

Abstract

To address the pursuit-evasion game in which multiple UAVs confront a fast target, we study a cooperative pursuit strategy for multi-UAVs. The strategy is trained with the decomposed Multi-Agent Deep Deterministic Policy Gradient (DE-MADDPG) algorithm, for which we design two reward functions: a global reward function and a local reward function. Simulation results show that the trained UAVs carry out the cooperative pursuit mission effectively, exploiting their numerical advantage and cooperation to round up the fast target. The results also verify that the proposed method converges faster than the basic MADDPG algorithm.
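
As a rough illustration of what keeping a global and a local reward separate can look like, the Python sketch below computes one shared team reward and one reward per UAV. It is not the paper's reward design: the abstract does not state the actual terms, so the distance-based local reward, the capture radius, the bonus value, the `min_captors` threshold, and all function names are assumptions made only for this example.

```python
import math

# Hypothetical constants: the abstract does not give the paper's reward terms.
CAPTURE_RADIUS = 1.0   # assumed distance at which a UAV counts as "on" the target
CAPTURE_BONUS = 10.0   # assumed team bonus once the target is rounded up


def local_reward(uav_pos, target_pos):
    """Assumed per-UAV reward: negative Euclidean distance to the target."""
    return -math.dist(uav_pos, target_pos)


def global_reward(uav_positions, target_pos, min_captors=2):
    """Assumed team reward: bonus once at least `min_captors` UAVs are in range."""
    close = sum(
        1 for p in uav_positions if math.dist(p, target_pos) <= CAPTURE_RADIUS
    )
    return CAPTURE_BONUS if close >= min_captors else 0.0


def decomposed_rewards(uav_positions, target_pos):
    """Return (shared global reward, list of local rewards), kept separate."""
    return (
        global_reward(uav_positions, target_pos),
        [local_reward(p, target_pos) for p in uav_positions],
    )
```

Returning the two signals separately, rather than summing them into one scalar, lets a training loop pass the team reward to a shared critic and each local reward to that UAV's own critic, which is the general idea behind a decomposed reward setup.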

Cite this article

FU Xiaowei, WANG Hui, XU Zhe. Cooperative pursuit strategy for multi-UAVs based on DE-MADDPG algorithm[J]. ACTA AERONAUTICA ET ASTRONAUTICA SINICA, 2022, 43(5): 325311-325311. DOI: 10.7527/S1000-6893.2021.25311
