Trajectory Planning for Solar-Powered UAVs Based on Multi-Objective Reinforcement Learning

  • XU Ti-Chao
  • MENG Wen-Yue
  • ZHANG Jian
  • 1. Institute of Engineering Thermophysics, Chinese Academy of Sciences
    2. Institute of Engineering Thermophysics, CAS

Received: 2025-09-23

Revised: 2025-12-21

Published online: 2025-12-23

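Cite this article

XU Ti-Chao, MENG Wen-Yue, ZHANG Jian. Trajectory planning for solar-powered UAVs based on multi-objective reinforcement learning[J]. Acta Aeronautica et Astronautica Sinica, 0: 1-0. DOI: 10.7527/S1000-6893.2025.32817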

Abstract

The factors that govern how a high-altitude long-endurance solar-powered UAV harvests solar energy and gradient wind energy are strongly coupled, so optimizing the harvesting efficiency of both at once often leads to conflicts. Energy-optimal trajectory planning that accounts for both solar and gradient wind energy is therefore a complex decision-making process that requires balancing multiple energy objectives. To address this, this study proposes a trajectory planning method based on multi-objective reinforcement learning. The method adopts a multi-objective Soft Actor-Critic (SAC) algorithm built on a multi-objective Markov decision process: the UAV's energy harvesting power and energy consumption power are combined into a reward vector, and randomly generated weights are added at each update step. The trained, converged policy network outputs thrust, angle-of-attack, and bank-angle commands from flight information and a given weight vector, and can thus generate a set of energy-optimal trajectory solutions across the weight space. Simulation results show that, compared with a minimum-energy-consumption strategy and a strategy based on the conventional single-objective SAC algorithm, the method consistently achieves better energy optimization efficiency and adapts to changes in the weights of the energy objectives. Compared with the offline-optimized trajectory solution set obtained with the Non-dominated Sorting Genetic Algorithm II (NSGA-II), the trajectory solution set generated by this method reaches 90.07% of the former's hypervolume, while each decision takes only milliseconds. In addition, the method shows a degree of generalization ability and can, to some extent, adapt to new, untrained wind fields with altered wind gradients.
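The two ingredients the abstract names, a random weight vector drawn at each update that scalarizes the vector reward and is fed to the policy alongside the flight state, and the two-objective hypervolume used to score a trajectory solution set, can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract only says the weights are "randomly generated", so drawing them from a flat Dirichlet (uniform over the simplex) is an assumption, as are linear scalarization and the reward sign convention [P_harvest, -P_consume]. All function names (`sample_weights`, `reward_vector`, `scalarize`, `condition_on_weights`, `hypervolume_2d`) are hypothetical.

```python
import numpy as np


def sample_weights(n_objectives: int, rng: np.random.Generator) -> np.ndarray:
    """Draw a preference vector uniformly from the probability simplex.

    Assumption: a flat Dirichlet distribution; the source only says the
    weights are randomly generated at each update step.
    """
    return rng.dirichlet(np.ones(n_objectives))


def reward_vector(p_harvest: float, p_consume: float) -> np.ndarray:
    """Two-component reward [harvested power, -consumed power] (assumed sign convention)."""
    return np.array([p_harvest, -p_consume], dtype=float)


def scalarize(r: np.ndarray, w: np.ndarray) -> float:
    """Linear scalarization w . r (an assumed utility; the paper may use another)."""
    return float(np.dot(w, r))


def condition_on_weights(obs: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Append the weight vector to the flight-state observation so a single
    policy network can be queried for any preference in the weight space."""
    return np.concatenate([obs, w])


def hypervolume_2d(points, ref) -> float:
    """Area dominated by 2-objective points (both maximized), bounded below by ref."""
    # Discard points that do not strictly improve on the reference point.
    pts = [(x, y) for x, y in points if x > ref[0] and y > ref[1]]
    # Sweep from the largest first objective to the smallest; each
    # non-dominated point adds one horizontal slab to the staircase.
    pts.sort(key=lambda p: p[0], reverse=True)
    area, best_y = 0.0, ref[1]
    for x, y in pts:
        if y > best_y:
            area += (x - ref[0]) * (y - best_y)
            best_y = y
    return area


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = sample_weights(2, rng)
    r = reward_vector(p_harvest=10.0, p_consume=4.0)
    print(w, scalarize(r, w), condition_on_weights(np.zeros(3), w).shape)
    print(hypervolume_2d([(3, 1), (2, 2), (1, 3)], ref=(0, 0)))  # 6.0
```

A hypervolume ratio such as the 90.07% reported above is only meaningful when both solution sets are evaluated against the same reference point, since moving the reference point changes every absolute hypervolume.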
