[1] 宗群, 王丹丹, 邵士凯, 等. 多无人机协同编队飞行控制研究现状及发展[J]. 哈尔滨工业大学学报, 2017, 49(3): 1-14.
ZONG Q, WANG D D, SHAO S K, et al. Research status and development of multi UAV coordinated formation flight control[J]. Journal of Harbin Institute of Technology, 2017, 49(3): 1-14 (in Chinese).
[2] 樊琼剑, 杨忠, 方挺, 等. 多无人机协同编队飞行控制的研究现状[J]. 航空学报, 2009, 30(4): 683-691.
FAN Q J, YANG Z, FANG T, et al. Research status of coordinated formation flight control for multi-UAVs[J]. Acta Aeronautica et Astronautica Sinica, 2009, 30(4): 683-691 (in Chinese).
[3] 王祥科, 刘志宏, 丛一睿, 等. 小型固定翼无人机集群综述和未来发展[J]. 航空学报, 2020, 41(4): 323732.
WANG X K, LIU Z H, CONG Y R, et al. Miniature fixed-wing UAV swarms: Survey and directions[J]. Acta Aeronautica et Astronautica Sinica, 2020, 41(4): 323732 (in Chinese).
[4] 贾永楠, 田似营, 李擎. 无人机集群研究进展综述[J]. 航空学报, 2020, 41(S1): 723738.
JIA Y N, TIAN S Y, LI Q. The development of unmanned aerial vehicle swarms[J]. Acta Aeronautica et Astronautica Sinica, 2020, 41(S1): 723738 (in Chinese).
[5] KURIKI Y, NAMERIKAWA T. Formation control with collision avoidance for a multi-UAV system using decentralized MPC and consensus-based control[J]. SICE Journal of Control, Measurement, and System Integration, 2015, 8(4): 285-294.
[6] SAIF O, FANTONI I, ZAVALA-RIO A. Distributed integral control of multiple UAVs: Precise flocking and navigation[J]. IET Control Theory & Applications, 2019, 13(13): 2008-2017.
[7] PHAM H X, LA H M, FEIL-SEIFER D, et al. Cooperative and distributed reinforcement learning of drones for field coverage[EB/OL]. arXiv preprint: 1803.07250, 2018.
[8] HUNG S M, GIVIGI S N. A Q-learning approach to flocking with UAVs in a stochastic environment[J]. IEEE Transactions on Cybernetics, 2017, 47(1): 186-197.
[9] SUTTON R S, BARTO A G. Reinforcement learning: An introduction[M]. Cambridge: MIT Press, 1998.
[10] 高阳, 陈世福, 陆鑫. 强化学习研究综述[J]. 自动化学报, 2004, 30(1): 86-100.
GAO Y, CHEN S F, LU X. Research on reinforcement learning: A review[J]. Acta Automatica Sinica, 2004, 30(1): 86-100 (in Chinese).
[11] YAN C, XIANG X. A path planning algorithm for UAV based on improved Q-learning[C]//International Conference on Robotics and Automation Sciences (ICRAS). Piscataway: IEEE Press, 2018: 1-5.
[12] EVERETT M, CHEN Y F, HOW J P. Motion planning among dynamic, decision-making agents with deep reinforcement learning[C]//IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). Piscataway: IEEE Press, 2018: 3052-3059.
[13] TAI L, PAOLO G, LIU M. Virtual-to-real deep reinforcement learning: Continuous control of mobile robots for mapless navigation[C]//IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). Piscataway: IEEE Press, 2017: 31-36.
[14] LONG P, LIU W, PAN J. Deep-learned collision avoidance policy for distributed multiagent navigation[J]. IEEE Robotics and Automation Letters, 2017, 2(2): 656-663.
[15] TOMIMASU M, MORIHIRO K, NISHIMURA H, et al. A reinforcement learning scheme of adaptive flocking behavior[C]//International Symposium on Artificial Life and Robotics (AROB). Oita: ISAROB, 2005: GS1-4.
[16] MORIHIRO K, ISOKAWA T, NISHIMURA H, et al. Characteristics of flocking behavior model by reinforcement learning scheme[C]//SICE-ICASE International Joint Conference. Piscataway: IEEE Press, 2006: 4551-4556.
[17] LA H M, SHENG W. Distributed sensor fusion for scalar field mapping using mobile sensor networks[J]. IEEE Transactions on Cybernetics, 2013, 43(2): 766-778.
[18] LA H M, LIM R, SHENG W. Multirobot cooperative learning for predator avoidance[J]. IEEE Transactions on Control Systems Technology, 2015, 23(1): 52-63.
[19] WANG C, WANG J, ZHANG X, et al. A deep reinforcement learning approach to flocking and navigation of UAVs in large-scale complex environments[C]//IEEE Global Conference on Signal and Information Processing (GlobalSIP). Piscataway: IEEE Press, 2018: 1228-1232.
[20] HUNG S M, GIVIGI S N, NOURELDIN A. A Dyna-Q(λ) approach to flocking with fixed-wing UAVs in a stochastic environment[C]//IEEE International Conference on Systems, Man, and Cybernetics. Piscataway: IEEE Press, 2015: 1918-1923.
[21] 左家亮, 杨任农, 张滢, 等. 基于启发式强化学习的空战机动智能决策[J]. 航空学报, 2017, 38(10): 321168.
ZUO J L, YANG R N, ZHANG Y, et al. Intelligent decision-making in air combat maneuvering based on heuristic reinforcement learning[J]. Acta Aeronautica et Astronautica Sinica, 2017, 38(10): 321168 (in Chinese).
[22] WATKINS C, DAYAN P. Q-learning[J]. Machine Learning, 1992, 8(3): 279-292.
[23] MNIH V, KAVUKCUOGLU K, SILVER D, et al. Human-level control through deep reinforcement learning[J]. Nature, 2015, 518(7540): 529-533.
[24] VAN HASSELT H. Double Q-learning[C]//Advances in Neural Information Processing Systems. Vancouver: MIT Press, 2010: 2613-2621.
[25] VAN HASSELT H, GUEZ A, SILVER D. Deep reinforcement learning with double Q-learning[C]//AAAI Conference on Artificial Intelligence. Menlo Park: AAAI, 2016: 2094-2100.
[26] WANG Z, SCHAUL T, HESSEL M, et al. Dueling network architectures for deep reinforcement learning[C]//International Conference on Machine Learning (ICML). Brookline: JMLR, 2016: 1995-2003.
[27] WANG C, YAN C, XIANG X, et al. A continuous actor-critic reinforcement learning approach to flocking with fixed-wing UAVs[C]//Asian Conference on Machine Learning (ACML). Brookline: JMLR, 2019: 64-79.
[28] QUINTERO S A P, COLLINS G E, HESPANHA J P. Flocking with fixed-wing UAVs for distributed sensing: A stochastic optimal control approach[C]//American Control Conference. Piscataway: IEEE Press, 2013: 2025-2031.
[29] YAN C, XIANG X, WANG C. Towards real-time path planning through deep reinforcement learning for a UAV in dynamic environments[J]. Journal of Intelligent & Robotic Systems, 2020, 98: 297-309.
[30] LV L, ZHANG S, DING D, et al. Path planning via an improved DQN-based learning policy[J]. IEEE Access, 2019, 7: 67319-67330.
[31] NAIR V, HINTON G E. Rectified linear units improve restricted Boltzmann machines[C]//International Conference on Machine Learning (ICML). Brookline: JMLR, 2010: 807-814.