Trajectory planning for solar-powered UAVs based on deep reinforcement learning

doi:10.7527/S1000-6893.2024.31420

Electronics, Electrical Engineering and Control

Zijie YU¹, Zheng ZHENG¹, Qingdong LI¹, Lin GUO², Suping REN², Jian GUO³

1. School of Automation Science and Electrical Engineering, Beihang University, Beijing 100191, China
2. China Aerospace Aerodynamics Research Institute, Beijing 100074, China
3. China Coal Science and Engineering Group Corporation, Beijing 100028, China

Received: 2024-10-21  Revised: 2024-11-08  Accepted: 2024-12-10  Online: 2025-01-07  Published: 2024-12-30
Corresponding author: Zheng ZHENG, E-mail: zhengz@buaa.edu.cn
Supported by: National Natural Science Foundation of China (62372021)

Abstract:

High-altitude long-endurance solar-powered unmanned aerial vehicles (HALE-SUAVs) can significantly extend their endurance through well-designed trajectory planning, and deep reinforcement learning (DRL) is well suited to this problem because of its real-time performance and adaptability. To address DRL-based trajectory planning for HALE-SUAVs, this paper establishes kinematic and dynamic models of the UAV together with energy-related models, designs an energy management strategy, constructs an overall DRL framework for the trajectory planning problem, and uses the trained model to conduct trajectory planning experiments under different solar radiation intensities. The results indicate that, with the proposed DRL method, the HALE-SUAV selects control commands appropriate to the current solar radiation intensity and thereby improves its endurance. These findings demonstrate the potential of DRL methods for HALE-SUAV trajectory planning.

Key words: deep reinforcement learning, high-altitude long-endurance solar-powered UAV (HALE-SUAV), trajectory planning, endurance performance, energy management strategy

CLC number: V324.2

Zijie YU, Zheng ZHENG, Qingdong LI, Lin GUO, Suping REN, Jian GUO. Trajectory planning for solar-powered UAVs based on deep reinforcement learning[J]. Acta Aeronautica et Astronautica Sinica, 2025, 46(12): 331420.

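Table 1  HALE-SUAV parameters

Parameter	Symbol	Value
Total mass/kg	$m$	213
Battery mass/kg	$m_{\text{battery}}$	141
Maximum flight radius/km	$R_{\max}$	3
Aspect ratio	$AR$	30
Wing area/m²	$A$	60
Wingspan/m	$b$	42
Motor efficiency	$\eta_{\text{motor}}$	0.95
Maximum flight altitude/m	$h_{\max}$	24 288
Minimum flight altitude/m	$h_{\min}$	18 288
Maximum angle of attack/(°)	$\alpha_{\max}$	10
Minimum angle of attack/(°)	$\alpha_{\min}$	0
Maximum bank angle/(°)	$\phi_{\max}$	5
Minimum bank angle/(°)	$\phi_{\min}$	-5
Maximum thrust/N	$T_{p,\max}$	500
Minimum thrust/N	$T_{p,\min}$	0
Payload mass/kg	$m_{\text{payload}}$	25
Battery specific energy/((W·h)·kg^-1)	$U_{\text{batt}}$	35
Maximum battery energy/(kW·h)	$E_{\text{batt},\max}$	59.5
Maximum discharge power/kW	$P_{\text{battery},\max}$	11
Maximum charge power/kW	$P_{\text{battery},\min}$	-11
Sweep angle/(°)	$\Lambda$	17.5
Payload power/W	$P_{\text{payload}}$	250
Solar panel area/m²	$S$	60
Propeller radius/m	$r_{\text{prop}}$	1.58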
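
The battery limits in Table 1 suggest how a simulation might track stored energy between control steps. The following is a minimal Python sketch, not the paper's energy management strategy: the function name `battery_step`, the power-balance rule, and the clamping order are illustrative assumptions, and only the numerical limits come from Table 1.

```python
# Limits taken from Table 1; the update rule below is an illustrative assumption.
E_BATT_MAX_KWH = 59.5       # maximum battery energy
P_DISCHARGE_MAX_KW = 11.0   # maximum discharge power
P_CHARGE_MAX_KW = 11.0      # magnitude of the maximum charge power (listed as -11 kW)
P_PAYLOAD_KW = 0.25         # payload power (250 W)
ETA_MOTOR = 0.95            # motor efficiency


def battery_step(energy_kwh, p_solar_kw, p_thrust_kw, dt_s):
    """Advance the stored battery energy by one time step (hypothetical rule).

    p_thrust_kw is the mechanical propulsion power; dividing it by the motor
    efficiency gives the electrical power drawn. Positive battery power means
    discharging, matching the sign convention of Table 1.
    Returns (new_energy_kwh, battery_power_kw).
    """
    p_demand_kw = p_thrust_kw / ETA_MOTOR + P_PAYLOAD_KW
    # Battery covers the gap between demand and solar input.
    p_batt_kw = p_demand_kw - p_solar_kw
    # Respect the charge/discharge power limits.
    p_batt_kw = max(-P_CHARGE_MAX_KW, min(P_DISCHARGE_MAX_KW, p_batt_kw))
    # Integrate over the step (seconds -> hours) and respect the capacity limits.
    new_energy = energy_kwh - p_batt_kw * dt_s / 3600.0
    new_energy = max(0.0, min(E_BATT_MAX_KWH, new_energy))
    return new_energy, p_batt_kw


if __name__ == "__main__":
    # One hour at night with 1.9 kW of propulsion power and no solar input.
    print(battery_step(30.0, 0.0, 1.9, 3600.0))
```

In this sketch a demand above the discharge limit is simply left unmet; a fuller model would have to reduce thrust or flag the shortfall, which is outside what the source specifies.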

Table 2  Fitted coefficients of the aircraft lift surface

Parameter	Value
$a_1$	$3.774\,21 \times 10^{-1}$
$a_2$	$1.243\,16 \times 10^{-1}$
$a_3$	$7.646\,15 \times 10^{-7}$
$a_4$	$-5.682\,28 \times 10^{-3}$
$a_5$	$-6.445\,53 \times 10^{-13}$
$a_6$	$-2.650\,58 \times 10^{-8}$

Table 3  Fitted coefficients of the aircraft drag surface

Parameter	Value
$b_1$	$6.448\,15 \times 10^{-2}$
$b_2$	$-1.878\,41 \times 10^{-7}$
$b_3$	$1.793\,26 \times 10^{-13}$
$b_4$	$-1.113\,85 \times 10^{-2}$
$b_5$	$3.750\,46 \times 10^{-8}$
$b_6$	$-3.105\,91 \times 10^{-14}$
$b_7$	$1.097\,53 \times 10^{-3}$
$b_8$	$-2.367\,96 \times 10^{-9}$
$b_9$	$1.584\,61 \times 10^{-15}$

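Table 4  Parameter settings of the SMARTS model

Parameter	Value
Latitude of simulation site/(°N)	35
Longitude of simulation site/(°W)	106.6
Site elevation/km	1.387
Flight altitude/km	20
Spectral range/nm	280~4 000
Solar constant/(W·m^-2)	1 366.1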
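
To give a feel for how the site latitude and solar constant in Table 4 translate into available power, the sketch below estimates the solar elevation angle and the top-of-atmosphere power incident on a horizontal panel. It is a rough illustration under stated assumptions, not the SMARTS model: it uses Cooper's approximation for solar declination, ignores atmospheric attenuation, panel efficiency, and aircraft attitude, and the function names are hypothetical. The panel area is taken from Table 1.

```python
import math

SOLAR_CONSTANT = 1366.1   # W/m^2, Table 4
LATITUDE_DEG = 35.0       # deg N, Table 4
PANEL_AREA = 60.0         # m^2, Table 1


def solar_elevation_deg(day_of_year, solar_hour, latitude_deg=LATITUDE_DEG):
    """Solar elevation angle in degrees at local solar time `solar_hour`."""
    # Cooper's approximation for the solar declination.
    decl = math.radians(23.45) * math.sin(2.0 * math.pi * (284 + day_of_year) / 365.0)
    # Hour angle: 15 degrees per hour away from solar noon.
    hour_angle = math.radians(15.0 * (solar_hour - 12.0))
    lat = math.radians(latitude_deg)
    sin_elev = (math.sin(lat) * math.sin(decl)
                + math.cos(lat) * math.cos(decl) * math.cos(hour_angle))
    # Guard against rounding slightly outside [-1, 1].
    sin_elev = max(-1.0, min(1.0, sin_elev))
    return math.degrees(math.asin(sin_elev))


def incident_power_w(day_of_year, solar_hour):
    """Top-of-atmosphere power on a horizontal panel (assumption: no losses)."""
    elev = math.radians(solar_elevation_deg(day_of_year, solar_hour))
    # Below the horizon the panel receives no direct sunlight.
    return SOLAR_CONSTANT * PANEL_AREA * max(0.0, math.sin(elev))


if __name__ == "__main__":
    for hour in (6.0, 9.0, 12.0, 15.0, 18.0):
        print(hour, round(incident_power_w(172, hour)))
```

Because all losses are ignored, the values are an upper bound on the incident power for a level panel; usable electrical power would be lower by the panel efficiency, which the source does not list.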



