Acta Aeronautica et Astronautica Sinica ›› 2025, Vol. 46 ›› Issue (8): 331035. doi: 10.7527/S1000-6893.2024.31035

• Electronics and Electrical Engineering and Control •

Mapless navigation of UAVs in dynamic environments based on an improved TD3 algorithm

Lingfeng JIANG1, Xinkai LI1, Hai ZHANG2, Hanwei LI1, Hongli ZHANG3

  1. School of Electrical Engineering, Xinjiang University, Urumqi 830017, China
    2. Engineering Training Center, Xinjiang University, Urumqi 830017, China
    3. School of Intelligent Science and Technology (School of Future Technology), Xinjiang University, Urumqi 830017, China
  • Received: 2024-08-02  Revised: 2024-11-04  Accepted: 2024-12-06  Online: 2024-12-13  Published: 2024-12-12
  • Contact: Xinkai LI  E-mail: lxk@xju.edu.cn
  • Supported by:
    National Natural Science Foundation of China(62263030);Natural Science Foundation of Xinjiang Uygur Autonomous Region(2022D01C86)

Abstract:

To address the challenges of mapping and navigation in unknown dynamic environments for drone navigation systems, a mapless navigation method based on an improved Twin Delayed Deep Deterministic policy gradient (TD3) algorithm is proposed. To overcome the perception limitations of a mapless environment, the navigation task is formulated as a Partially Observable Markov Decision Process (POMDP). A Gated Recurrent Unit (GRU) is introduced to enable the policy network to exploit temporal information from historical states, allowing it to obtain an optimal policy and avoid falling into local optima. Building on the TD3 algorithm, a softmax operator is applied to the value function, and dual policy networks are adopted to address the policy instability and value underestimation of TD3. A non-sparse reward function is designed to resolve the difficulty of policy convergence in reinforcement learning under sparse reward conditions. Finally, simulation experiments on the AirSim platform demonstrate that the improved algorithm achieves faster convergence and higher task success rates in mapless obstacle-avoidance navigation for drones than traditional deep reinforcement learning algorithms.
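The softmax operator mentioned in the abstract replaces TD3's hard `min` over the twin critics with a softmax-weighted average of Q-values, which interpolates between the mean (small β) and the max (large β) and thereby mitigates underestimation. A minimal sketch of this idea is shown below; the function names, the inverse-temperature parameter β, and the use of sampled next-action Q-values are illustrative assumptions, not the paper's actual implementation.

```python
import math

def softmax_value(q_values, beta=5.0):
    """Softmax-weighted value estimate over a list of Q-values.

    As beta -> 0 this approaches the mean of q_values; as beta -> inf
    it approaches the max, softening TD3's pessimistic min target.
    """
    m = max(q_values)  # subtract the max for numerical stability
    weights = [math.exp(beta * (q - m)) for q in q_values]
    z = sum(weights)
    return sum(w * q for w, q in zip(weights, q_values)) / z

def td_target(reward, gamma, next_q_values, beta=5.0, done=False):
    """One-step TD target using the softmax value estimate of the next state."""
    v_next = 0.0 if done else softmax_value(next_q_values, beta)
    return reward + gamma * v_next
```

In a full agent, `next_q_values` would come from the target critics evaluated at (possibly perturbed) actions of the target policy; here they are plain numbers to keep the sketch self-contained.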

Key words: mapless navigation, deep reinforcement learning, deterministic policy gradient, UAVs, dynamic environment

CLC Number: