Navigation

Acta Aeronautica et Astronautica Sinica


Mapless Navigation of UAV in Dynamic Environments Based on an Improved TD3 Algorithm


  • Received: 2024-08-02 Revised: 2024-12-11 Online: 2024-12-12 Published: 2024-12-12
  • Contact: Xin-Kai LI

Abstract: To address the difficulty of mapping and navigation in unknown dynamic environments for UAV navigation systems, a mapless navigation method based on an improved Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm is proposed. To cope with the limited perception available in a mapless setting, the navigation task is modeled as a Partially Observable Markov Decision Process (POMDP), and Gated Recurrent Units (GRU) are introduced so that the policy network can exploit temporal information from historical states to obtain the optimal policy and avoid local optima. Building on the TD3 algorithm, a softmax operator is applied to the value function and dual policy networks are employed, addressing the policy instability and value underestimation of the original TD3 algorithm. A non-sparse reward function is designed to overcome the difficulty of policy convergence in reinforcement learning under sparse rewards. Finally, simulation experiments on the AirSim platform show that, compared with conventional deep reinforcement learning algorithms, the improved algorithm converges faster and achieves a higher task success rate in mapless obstacle-avoidance navigation for UAVs.
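The two modifications to TD3 described in the abstract can be illustrated with short sketches. First, the POMDP formulation: a minimal sketch, assuming PyTorch (the paper does not state its framework), of a policy network that encodes a window of past observations with a GRU. All names and sizes (GRUActor, obs_dim, hidden_dim, and so on) are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn

class GRUActor(nn.Module):
    """Policy network conditioned on a window of past observations (POMDP setting)."""
    def __init__(self, obs_dim, act_dim, hidden_dim=128, max_action=1.0):
        super().__init__()
        self.gru = nn.GRU(obs_dim, hidden_dim, batch_first=True)  # encodes temporal information from history
        self.head = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, act_dim), nn.Tanh(),            # bounded actions in [-1, 1]
        )
        self.max_action = max_action

    def forward(self, obs_seq):
        # obs_seq: (batch, seq_len, obs_dim) -- history of observations, newest last
        features, _ = self.gru(obs_seq)
        return self.max_action * self.head(features[:, -1])       # act from the final hidden state
```

Second, the softmax operator for the value function. The sketch below replaces TD3's hard minimum over perturbed target actions with a Boltzmann-weighted average over sampled candidate actions; beta, num_samples, the noise scale, and the twin-critic interface critic_target(obs, act) -> (q1, q2) are assumptions for illustration, and the paper's dual policy networks are omitted for brevity.

```python
import torch

@torch.no_grad()
def softmax_value_target(actor_target, critic_target, next_obs,
                         beta=5.0, num_samples=16, noise_std=0.2,
                         noise_clip=0.5, max_action=1.0):
    """Boltzmann-weighted (softmax) estimate of the next-state value, used in place
    of a hard operator over candidate actions to soften value underestimation."""
    batch = next_obs.shape[0]
    base = actor_target(next_obs)                                  # (batch, act_dim)
    base = base.unsqueeze(1).expand(-1, num_samples, -1)
    noise = (torch.randn_like(base) * noise_std).clamp(-noise_clip, noise_clip)
    actions = (base + noise).clamp(-max_action, max_action)        # candidate actions near the target policy

    obs = next_obs.unsqueeze(1).expand(-1, num_samples, -1)        # repeat each state for every candidate
    q1, q2 = critic_target(obs.reshape(batch * num_samples, -1),
                           actions.reshape(batch * num_samples, -1))
    q = torch.min(q1, q2).reshape(batch, num_samples)              # clipped double-Q per candidate action

    weights = torch.softmax(beta * q, dim=1)                       # softmax operator over candidates
    return (weights * q).sum(dim=1, keepdim=True)                  # (batch, 1) value for the TD target
```

In a full training loop, the returned value would enter the usual TD target, r + γ(1 − d) · softmax_value_target(...), with the actor and critics updated as in standard TD3.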

Key words: Mapless navigation, deep reinforcement learning, deterministic policy gradient, drones, dynamic environment
