[1] PHAM H X, LA H M, FEILSEIFER D, et al. Autonomous UAV navigation using reinforcement learning[EB/OL]. (2018-01-16)[2020-03-10]. https://arxiv.org/abs/1801.05086. [2] PHAM H X, LA H M, FEILSEIFER D, et al. Cooperative and distributed reinforcement learning of drones for field coverage[J].(2018-09-16)[2020-03-10]. https://arxiv.org/abs/1803.07250. [3] QI S, ZHU S. Intent-aware multi-agent reinforcement learning[C]//2018 IEEE International Conference on Robotics and Automation (ICRA). Piscataway:IEEE Press, 2018:7533-7540. [4] 李高垒, 马耀飞. 基于深度网络的空战态势特征提取[J].系统仿真学报, 2017, 29(S1):98-105, 112. LI G L, MA Y F. Feature extraction algorithm of air combat situation based on deep neural networks[J].Journal of System Simulation, 2017, 29(S1):98-105, 112(in Chinese). [5] 魏航. 基于强化学习的无人机空中格斗算法研究[D]. 哈尔滨:哈尔滨工业大学, 2015. WEI H. Resarch of UCAV air combat based on reinforcement learning[D]. Harbin:Harbin Institute of Technology,2015(in Chinese). [6] YAMAGUCHI H. A cooperative hunting behavior by mobile robot troops[C]//Proceedings 1998 IEEE International Conference on Robotics and Automation. Piscataway:IEEE Press, 1998:931-940. [7] GADRE A. Learning strategies in multi-agent systems applications to the herding problem[D]. Blacksburg:Virginia Polytechnic Institute and State University, 2001. [8] 苏治宝, 陆际联, 童亮. 一种多移动机器人协作围捕策略[J].北京理工大学学报, 2004(5):32-35, 44. SU Z B, LU J L, TONG L. Strategy of cooperative hunting by multiple mobile robots[J].Beijing Institute of Technology, 2004(5):32-35, 44(in Chinese). [9] 罗德林, 徐扬, 张金鹏. 无人机集群对抗技术新进展[J].科技导报, 2017,35(7):26-31. LUO D L, XU Y, ZHANG J P. New progresses on UAV swarm confrontation[J].Science & Technology Review, 2017,35(7):26-31(in Chinese). [10] CARL E J. Analysis of fatigue, fatigue-crack propagation and fracture data:AIAA-2009-1363[R]. Reston:AIAA, 2009. [11] ZUHAIR Q M, SONGHAO P, HAIYANG J, et al. A novel approach for multi-agent cooperative pursuit to capture grouped evaders[J].The Journal of Supercomputing, 2018, 76:3416-3426. [12] ZHAOYI P, SONGHAO P, MOHAMMED E H S, et al. Coalition formation for multi-agent pursuit based on neural network[J].Journal of Intelligent & Robotic Systems, 2019, 95(1):887-899. [13] HUMAYOO M, CHENG X. Relative importance sampling for off-policy actor-critic in deep reinforcement learning[EB/OL]. (2019-07-19)[2020-03-10]. https://arxiv.org/abs/1810.12558?context=cs. [14] 刘建伟, 高峰, 罗雄麟. 基于值函数和策略梯度的深度强化学习综述[J].计算机学报, 2019, 42(6):1406-1438. LIU J W, GAO F, LUO X L. A survey of deep reinforcement learning based on value function and strategy gradient[J].Chinese Journal of Computers, 2019, 42(6):1406-1438(in Chinese). [15] WANG G, SHI J. Actor-critic for multi-agent system with variable quantity of agents[C]//International Conference on Internet of Things as a Service, 2017:48-56. [16] HUANG W, WANG Y, YI X. A deep reinforcement learning approach to preserve connectivity for multi-robot systems[C]//2017 10th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI). Piscataway:IEEE Press, 2017:1-7. [17] YI H. Deep deterministic policy gradient for autonomous vehicle driving[C]//Proceedings on the International Conference on Artificial Intelligence (ICAI), 2018:191-194. [18] ANDERSEN P, GOODWIN M, GRANMO O. Deep RTS:A game environment for deep reinforcement learning in real-time strategy games[C]//2018 IEEE Conference on Computational Intelligence and Games (CIG). Piscataway:IEEE Press, 2018:1-8. [19] DILOKTHANAKUL N, KAPLANIS C, PAWLOWSKI N, et al. Feature control as intrinsic motivation for hierarchical reinforcement learning[J].IEEE Transactions on Neural Networks and Learning Systems, 2019, 30(11):3409-3418. [20] NIE H, CHEN Y, SONG Y, et al. A general real-time OPF algorithm using DDPG with multiple simulation platforms[C]//2019 IEEE Innovative Smart Grid Technologies-Asia (ISGT Asia). Piscataway:IEEE Press, 2019:3713-3718. [21] YANG Q, ZHU Y, ZHANG J, et al. UAV air combat autonomous maneuver decision based on DDPG algorithm[C]//2019 IEEE 15th International Conference on Control and Automation (ICCA). Piscataway:IEEE Press, 2019:37-42. [22] BANERJEE A, GHOSH D, DAAS S. Evolving network topology in policy gradient reinforcement learning algorithms[C]//2019 Second International Conference on Advanced Computational and Communication Paradigms (ICACCP), 2019:1-5. [23] SHI H, SUN Y, LI G. Model-based DDPG for motor control[C]//2017 International Conference on Progress in Informatics and Computing (PIC), 2017:284-288. |