To address the difficulty that UAV navigation systems face in mapping and navigating within unknown dynamic environments, an end-to-end mapless navigation method based on an improved Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm is proposed. To handle the UAV's limited perception in a mapless setting, the navigation task is formulated as a Partially Observable Markov Decision Process (POMDP) and a Gated Recurrent Unit (GRU) is introduced, enabling the policy network to exploit temporal information from historical states when seeking the optimal policy and to avoid becoming trapped in local optima. Building on TD3, a softmax operator is applied to the value function and dual policy networks are adopted, addressing the policy-function instability and value-function underestimation present in the original TD3 algorithm. A non-sparse reward function is designed to overcome the difficulty of policy convergence in reinforcement learning under sparse rewards. Finally, simulation experiments on the AirSim platform show that, compared with conventional deep reinforcement learning algorithms, the improved algorithm achieves faster convergence and a higher task success rate on the UAV mapless obstacle-avoidance navigation problem.
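To illustrate the two algorithmic ingredients summarized above, the following minimal PyTorch-style sketches are offered under stated assumptions; they are not the paper's implementation. The first sketch shows a GRU-based policy for the POMDP formulation: the recurrent hidden state summarizes the observation history, so the deterministic action depends on temporal context rather than a single frame. Class names, layer sizes, and hyperparameters are illustrative assumptions.

```python
import torch
import torch.nn as nn

class RecurrentActor(nn.Module):
    """Illustrative GRU-based deterministic policy for a POMDP:
    the GRU hidden state accumulates information from past
    observations, and the action head reads the latest step."""
    def __init__(self, obs_dim, act_dim, hidden_dim=256, max_action=1.0):
        super().__init__()
        self.gru = nn.GRU(obs_dim, hidden_dim, batch_first=True)
        self.head = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, act_dim), nn.Tanh(),
        )
        self.max_action = max_action

    def forward(self, obs_seq, h0=None):
        # obs_seq: (batch, seq_len, obs_dim); h0: (1, batch, hidden_dim) or None
        out, h_n = self.gru(obs_seq, h0)
        action = self.max_action * self.head(out[:, -1])  # act on the latest step
        return action, h_n
```

The second sketch shows one plausible reading of applying a softmax operator to the value function, in the spirit of Softmax Deep Double Deterministic Policy Gradients: clipped double-Q estimates are computed for actions sampled around the target policy's action, and the TD target uses their softmax-weighted average, which pulls the value toward (but not fully onto) the maximum and thereby softens the underestimation induced by the min operator. For clarity it assumes a feed-forward target actor and critics that broadcast over a leading sample dimension, and it omits SD3's importance-sampling correction; with the recurrent actor above, the GRU hidden state would be carried alongside.

```python
import torch

@torch.no_grad()
def softmax_value_target(actor_t, critic1_t, critic2_t, next_obs,
                         beta=5.0, num_samples=50, noise_std=0.2,
                         noise_clip=0.5, max_action=1.0):
    """Illustrative softmax-weighted state value for the TD target;
    all hyperparameters here are assumptions, not the paper's values."""
    # Sample K perturbed actions around the target policy's action,
    # reusing TD3's target-policy-smoothing noise.
    a = actor_t(next_obs)                                   # (B, act_dim)
    a = a.unsqueeze(1).expand(-1, num_samples, -1)          # (B, K, act_dim)
    noise = (torch.randn_like(a) * noise_std).clamp(-noise_clip, noise_clip)
    a_k = (a + noise).clamp(-max_action, max_action)

    obs_k = next_obs.unsqueeze(1).expand(-1, num_samples, -1)
    # Clipped double-Q per sampled action (critics map (obs, act) -> (..., 1)).
    q = torch.min(critic1_t(obs_k, a_k), critic2_t(obs_k, a_k)).squeeze(-1)  # (B, K)

    # Softmax operator over sampled Q-values: beta -> 0 recovers the mean,
    # beta -> inf approaches the max, counteracting min-induced underestimation.
    w = torch.softmax(beta * q, dim=1)
    v = (w * q).sum(dim=1, keepdim=True)                    # (B, 1)
    return v  # TD target is then r + gamma * (1 - done) * v
```

How the dual policy networks mentioned in the abstract interact with the two critics is a design detail of the paper that these sketches do not attempt to reproduce.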