Acta Aeronautica et Astronautica Sinica

Trajectory Planning for Solar-Powered UAVs Based on Multi-Objective Reinforcement Learning

  

  • Received: 2025-09-23 Revised: 2025-12-21 Online: 2025-12-23 Published: 2025-12-23
  • Contact: Jian ZHANG

Abstract: For high-altitude long-endurance solar-powered UAVs, the factors that govern solar energy harvesting and gradient wind energy harvesting are strongly coupled, so optimizing the harvesting efficiency of both energy sources simultaneously often leads to conflicts. Energy-optimal trajectory planning that accounts for both solar energy and gradient wind energy is therefore a complex decision-making problem that requires balancing multiple energy objectives. To address this issue, this study proposes a trajectory planning method based on multi-objective reinforcement learning. The method adopts a multi-objective Soft Actor-Critic (SAC) algorithm built on a multi-objective Markov decision process, combines the UAV's energy harvesting power and energy consumption power into a reward vector, and introduces a randomly generated weight vector at each update step. Once trained to convergence, the policy network outputs thrust, angle of attack, and bank angle commands from the flight information and a given weight vector, which makes it possible to generate a set of energy-optimal trajectory solutions across the weight space. Simulation results show that, compared with the minimum energy consumption strategy and a strategy based on the conventional single-objective SAC algorithm, the method consistently achieves better energy optimization efficiency and responds adaptively to changes in the weights of the energy objectives. Compared with the offline-optimized trajectory solution set obtained with the Non-dominated Sorting Genetic Algorithm II (NSGA-II), the hypervolume of the trajectory solution set generated by this method reaches 90.07% of that of the NSGA-II set, while the time for a single decision remains at the millisecond level. In addition, the method shows a degree of generalization ability and can adapt to new wind fields not seen during training.
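The two quantitative ideas in the abstract, weighting a reward vector with randomly generated weights and comparing solution sets by hypervolume, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the Dirichlet sampling of weights, the linear weighted-sum scalarization, the two-objective setting, and the reference point are all assumptions made for illustration; the abstract does not specify any of them.

```python
import numpy as np


def sample_weights(rng, n_objectives=2):
    """Draw a random weight vector on the probability simplex.

    Assumption: a flat Dirichlet distribution; the abstract only says
    the weights are randomly generated.
    """
    return rng.dirichlet(np.ones(n_objectives))


def scalarize(reward_vec, weights):
    """Weighted sum of a reward vector (assumed linear scalarization)."""
    return float(np.dot(weights, reward_vec))


def hypervolume_2d(points, ref):
    """Area dominated by 2-D points relative to ref, both objectives maximized."""
    ref = np.asarray(ref, dtype=float)
    pts = np.asarray(points, dtype=float).reshape(-1, 2)
    pts = pts[np.all(pts > ref, axis=1)]   # keep points strictly better than ref
    if len(pts) == 0:
        return 0.0
    pts = pts[np.argsort(-pts[:, 0])]      # sort by first objective, descending
    area, best_y = 0.0, ref[1]
    for x, y in pts:
        if y > best_y:                     # point adds a new horizontal strip
            area += (x - ref[0]) * (y - best_y)
            best_y = y
    return float(area)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = sample_weights(rng)                # e.g. one draw per update step
    r = np.array([2.0, -1.0])              # hypothetical [harvest, -consumption]
    print("weights:", w, "scalar reward:", scalarize(r, w))

    # Hypothetical solution sets; the ratio mirrors the abstract's metric.
    learned = [(3.0, 1.0), (2.0, 2.0), (1.0, 2.5)]
    baseline = [(3.0, 1.0), (2.0, 2.0), (1.0, 3.0)]
    ratio = hypervolume_2d(learned, (0, 0)) / hypervolume_2d(baseline, (0, 0))
    print("hypervolume ratio:", ratio)
```

In a weight-conditioned setup of this kind, the sampled weight vector would also be passed to the policy network as an input, so that a single trained network can be queried with different weights at decision time.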

Keywords: solar-powered unmanned aerial vehicle, trajectory planning, dynamic soaring, energy optimization, reinforcement learning, multi-objective reinforcement learning
