Electronics, Electrical Engineering and Control

Cooperative attack-defense game of multiple UAVs with asymmetric maneuverability

  • CHEN Can
  • MO Li
  • ZHENG Duo
  • CHENG Ziheng
  • LIN Defu
  • 1. School of Aerospace Engineering, Beijing Institute of Technology, Beijing 100081, China;
    2. Beijing Key Laboratory of UAV Autonomous Control, Beijing Institute of Technology, Beijing 100081, China

Received date: 2020-04-29

Revised date: 2020-05-22

Online published: 2020-06-24

Supported by

National Natural Science Foundation of China (61903350); Key Program of National Natural Science Foundation of China (U1613225)

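Cite this article

CHEN Can, MO Li, ZHENG Duo, CHENG Ziheng, LIN Defu. Cooperative attack-defense game of multiple UAVs with asymmetric maneuverability[J]. Acta Aeronautica et Astronautica Sinica, 2020, 41(12): 324152. DOI: 10.7527/S1000-6893.2020.24152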

Abstract

Cooperative attack-defense games are an important combat scenario for future military Unmanned Aerial Vehicles (UAVs). This paper studies the attack-defense game between groups of UAVs with different maneuverability and establishes a multi-UAV cooperative attack-defense evolution model. Based on multi-agent reinforcement learning theory, an autonomous decision-making method for the multi-UAV cooperative attack-defense game is studied, and an algorithm structure with a centralized critic and distributed actors is proposed on the basis of the actor-critic algorithm, which ensures stable convergence of the algorithm while improving execution efficiency. During training, the critic module of each UAV uses global information to evaluate decision quality and guide policy learning; during execution, the actor module relies only on local perception information to make autonomous decisions, which improves the effectiveness of the multi-UAV attack-defense game. Simulation results show that the proposed multi-UAV reinforcement learning method has a strong self-evolution property and endows the UAVs with a degree of intelligence, namely a stable autonomous learning ability. Through continuous training, the UAVs autonomously learn cooperative attack or defense policies that improve decision-making effectiveness.
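The information flow of a centralized critic with distributed actors can be illustrated with a minimal sketch. This is not the authors' implementation: the agent count, observation and action dimensions, the linear models, and the names `Actor` and `CentralCritic` are all assumptions made for illustration. It shows only the structural point from the abstract, namely that each actor sees its own local observation while the critic sees the joint observations and actions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes chosen for illustration; none come from the paper.
N_AGENTS, OBS_DIM, ACT_DIM = 3, 4, 2


class Actor:
    """Distributed policy: maps one UAV's local observation to its action."""

    def __init__(self):
        self.W = rng.normal(scale=0.1, size=(ACT_DIM, OBS_DIM))

    def act(self, local_obs):
        # tanh keeps the command bounded in (-1, 1)
        return np.tanh(self.W @ local_obs)


class CentralCritic:
    """Centralized evaluator: scores the joint observations and actions."""

    def __init__(self):
        in_dim = N_AGENTS * (OBS_DIM + ACT_DIM)
        self.w = rng.normal(scale=0.1, size=in_dim)

    def q_value(self, all_obs, all_acts):
        joint = np.concatenate([all_obs.ravel(), all_acts.ravel()])
        return float(self.w @ joint)


actors = [Actor() for _ in range(N_AGENTS)]
critic = CentralCritic()

all_obs = rng.normal(size=(N_AGENTS, OBS_DIM))

# Execution: each actor reads only its own row of the observation matrix.
all_acts = np.stack([a.act(o) for a, o in zip(actors, all_obs)])

# Training: the critic evaluates the joint observation-action pair.
q = critic.q_value(all_obs, all_acts)
```

In a full training loop the critic's value would supply the gradient signal for each actor; after training, only the actors are needed on board, which is why execution depends on local perception alone.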
