With the increasing application of Unmanned Aerial Vehicle (UAV) technology, UAV flight energy consumption and onboard computing capacity have become bottlenecks, making UAV path planning increasingly important. In many cases, the UAV cannot obtain the exact location of the target point or environmental information in advance, and therefore cannot plan an effective flight path. To solve this problem, this paper proposes a UAV path planning method based on guided reinforcement Q-learning. The method uses Received Signal Strength (RSS) to define the reward value and continuously optimizes the path with the Q-learning algorithm; a "guided reinforcement" principle is proposed to accelerate the convergence of the learning algorithm. Simulation results show that the proposed method achieves autonomous navigation and fast path planning for the UAV; compared with traditional algorithms, it greatly reduces the number of iterations and obtains a shorter planned path.
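The approach described above can be illustrated with a minimal sketch: tabular Q-learning on a grid, where the reward at each step is the gain in a pseudo received signal strength that grows as the UAV approaches the target (a log-distance path-loss shape). The grid size, the exact RSS-based reward shaping, and the epsilon-greedy exploration schedule are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

GRID = 10
TARGET = (9, 9)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

def rss(state):
    """Pseudo received signal strength: higher when closer to the target.

    Uses a log-distance path-loss shape; the constants are illustrative.
    """
    d = np.hypot(state[0] - TARGET[0], state[1] - TARGET[1])
    return -20.0 * np.log10(d + 1.0)

def step(state, a):
    """Apply action a, clipping to the grid; reward is the RSS gain."""
    nx = min(max(state[0] + ACTIONS[a][0], 0), GRID - 1)
    ny = min(max(state[1] + ACTIONS[a][1], 0), GRID - 1)
    nxt = (nx, ny)
    done = nxt == TARGET
    reward = 100.0 if done else rss(nxt) - rss(state)
    return nxt, reward, done

def train(episodes=500, alpha=0.1, gamma=0.9, eps=0.2, seed=0):
    """Standard tabular Q-learning with epsilon-greedy exploration."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((GRID, GRID, len(ACTIONS)))
    for _ in range(episodes):
        s = (0, 0)
        for _ in range(4 * GRID * GRID):  # cap episode length
            if rng.random() < eps:
                a = int(rng.integers(len(ACTIONS)))
            else:
                a = int(np.argmax(Q[s]))
            nxt, r, done = step(s, a)
            # Q-learning update: move Q(s,a) toward r + gamma * max_a' Q(s',a')
            Q[s][a] += alpha * (r + gamma * np.max(Q[nxt]) - Q[s][a])
            s = nxt
            if done:
                break
    return Q

def greedy_path(Q, start=(0, 0), limit=200):
    """Follow the greedy policy from the learned Q-table."""
    path, s = [start], start
    while s != TARGET and len(path) < limit:
        s, _, _ = step(s, int(np.argmax(Q[s])))
        path.append(s)
    return path
```

Because the RSS-gain reward is dense (every step toward the target is rewarded immediately), the Q-table converges in far fewer episodes than with a sparse goal-only reward, which reflects the convergence-speed motivation of the guided-reinforcement idea.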