Swarm applications of unmanned aerial vehicles (UAVs) have been a research hotspot in recent years. As the autonomous intelligence of UAVs continues to improve, UAV swarm technology is bound to become one of the main trends in future UAV development. For the task of a UAV swarm cooperatively pursuing incoming enemy targets, a typical mission scenario is constructed. Based on the Deep Deterministic Policy Gradient (DDPG) algorithm, a guided reward function is designed that effectively addresses the sparse-reward problem of deep reinforcement learning in long-horizon tasks. A soft update strategy based on a moving average is introduced to reduce the parameter oscillations of the Eval network and the Target network during training, which improves the training efficiency of the algorithm. Simulation results show that the trained UAV swarm performs the pursuit task against incoming enemy targets well, with a mission success rate of 95%. As a new concept of combat mode, UAV swarm technology has potential application value in the military field, and artificial intelligence algorithms show promise for making the autonomous decision-making of UAV swarms more intelligent.
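The two ideas named in the abstract, a moving-average soft update of the Target network and a guided reward that densifies a sparse capture signal, can be illustrated with a minimal sketch. The abstract gives neither the reward terms nor the update coefficient, so everything below is an assumption for illustration only: the function names `soft_update` and `guided_reward`, the parameters `tau`, `step_scale`, `capture_radius`, and `capture_bonus`, and the distance-progress form of the reward are hypothetical and are not taken from the paper.

```python
import math


def soft_update(target_params, eval_params, tau=0.01):
    """Moving-average update: target <- (1 - tau) * target + tau * eval.

    Parameters are plain lists of floats here; tau is an assumed coefficient.
    A small tau makes the Target network trail the Eval network slowly,
    which damps oscillations in the bootstrapped value targets.
    """
    return [(1.0 - tau) * t + tau * e for t, e in zip(target_params, eval_params)]


def guided_reward(prev_pos, pos, target_pos,
                  capture_radius=1.0, step_scale=0.1, capture_bonus=10.0):
    """Hypothetical guided reward: dense progress term plus sparse capture bonus.

    The progress term pays the pursuer for reducing its distance to the
    target at every step, so a learning signal exists long before capture.
    """
    prev_dist = math.dist(prev_pos, target_pos)
    dist = math.dist(pos, target_pos)
    reward = step_scale * (prev_dist - dist)
    if dist <= capture_radius:
        reward += capture_bonus
    return reward


if __name__ == "__main__":
    # Target parameters drift toward the Eval parameters over repeated updates.
    target, online = [0.0, 0.0], [1.0, -1.0]
    for _ in range(3):
        target = soft_update(target, online, tau=0.5)
    print(target)

    # A step that closes the distance earns a positive reward without capture.
    print(guided_reward((0.0, 0.0), (3.0, 0.0), (10.0, 0.0)))
```

In a full DDPG implementation the same update would be applied to every weight tensor of both the actor and critic Target networks after each training step; the list-of-floats form above only shows the arithmetic.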