The attack-defense game is an important combat scenario for future military unmanned aerial vehicles (UAVs). This paper studies an attack-defense game between groups of UAVs with different maneuverability and establishes an evolution model of multi-UAV cooperative attack and defense. Building on multi-agent reinforcement learning theory, it develops an autonomous decision-making method for the multi-UAV cooperative attack-defense game and proposes an actor-critic algorithm structure with a centralized critic and distributed actors, which guarantees convergence of the algorithm and improves decision-making efficiency. During training, each UAV's critic module uses global information to evaluate decision quality; during execution, the actor module relies only on local perception information to make autonomous decisions, which improves the effectiveness of the multi-UAV attack-defense game. Simulation results show that the proposed multi-UAV reinforcement learning method has a strong self-evolution property and endows the UAVs with a degree of intelligence, namely a stable autonomous learning ability. Through continued training, the UAVs autonomously learn cooperative attack or defense policies that improve the effectiveness of their decisions.
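The centralized-critic, distributed-actor structure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the linear function approximators, the class names `Actor` and `CentralizedCritic`, the dimensions, the learning rate, and the constant TD target are all assumptions chosen to keep the example short. It only shows the information flow, in which each actor sees its own local observation while the critic scores the joint observations and actions of all UAVs.

```python
import random


class Actor:
    """Distributed actor: maps ONE UAV's local observation to an action.
    A linear policy is a hypothetical stand-in for the paper's network."""

    def __init__(self, obs_dim, act_dim, rng):
        self.weights = [[rng.uniform(-0.1, 0.1) for _ in range(obs_dim)]
                        for _ in range(act_dim)]

    def act(self, local_obs):
        # Deterministic action, clipped to [-1, 1]; uses local information only.
        return [max(-1.0, min(1.0, sum(w * o for w, o in zip(row, local_obs))))
                for row in self.weights]


class CentralizedCritic:
    """Centralized critic: evaluates the joint observation-action of all UAVs.
    It is needed during training only, not during execution."""

    def __init__(self, n_agents, obs_dim, act_dim, rng):
        self.weights = [rng.uniform(-0.1, 0.1)
                        for _ in range(n_agents * (obs_dim + act_dim))]

    @staticmethod
    def _joint(all_obs, all_actions):
        # Concatenate every UAV's observation, then every UAV's action.
        return ([x for obs in all_obs for x in obs]
                + [a for act in all_actions for a in act])

    def q_value(self, all_obs, all_actions):
        joint = self._joint(all_obs, all_actions)
        return sum(w * x for w, x in zip(self.weights, joint))

    def td_update(self, all_obs, all_actions, target, lr=0.01):
        # One gradient step on the squared TD error for a linear critic.
        joint = self._joint(all_obs, all_actions)
        error = target - self.q_value(all_obs, all_actions)
        self.weights = [w + lr * error * x for w, x in zip(self.weights, joint)]
        return error


# Illustrative setup (all sizes are assumptions, not values from the paper).
rng = random.Random(0)
N_UAVS, OBS_DIM, ACT_DIM = 3, 4, 2
actors = [Actor(OBS_DIM, ACT_DIM, rng) for _ in range(N_UAVS)]
critic = CentralizedCritic(N_UAVS, OBS_DIM, ACT_DIM, rng)

# Execution: each UAV decides from its own local observation.
observations = [[rng.uniform(-1, 1) for _ in range(OBS_DIM)]
                for _ in range(N_UAVS)]
actions = [actor.act(obs) for actor, obs in zip(actors, observations)]

# Training: the critic sees global information; td_target is a placeholder
# for a bootstrapped target such as r + gamma * Q(next joint state-action).
td_target = 5.0
error_before = critic.td_update(observations, actions, td_target)
error_after = td_target - critic.q_value(observations, actions)
```

In a full training loop, each actor would also be updated along the critic's gradient with respect to that actor's action; that step is omitted here. The design point is that the critic can be dropped at execution time, so each UAV needs only its own observation to act.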
CHEN Can, MO Li, ZHENG Duo, CHENG Ziheng, LIN Defu. Cooperative attack-defense game of multiple UAVs with asymmetric maneuverability[J]. Acta Aeronautica et Astronautica Sinica, 2020, 41(12): 324152.
DOI: 10.7527/S1000-6893.2020.24152