Locally guided reinforcement learning for autonomous dispatching of carrier-based aircraft

doi:10.7527/S1000-6893.2024.31333


Zheng WANG^1, Hua WANG^1,2,3, Keke CUI^1, Chaochao LI^1,2,3, Junnan LIU^1,2,3, Mingliang XU^1,2,3

^1. School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450001, China
^2. Engineering Research Center of Intelligent Swarm Systems, Ministry of Education, Zhengzhou 450001, China
^3. National Supercomputing Center in Zhengzhou, Zhengzhou 450001, China

Received: 2024-10-08  Revised: 2025-01-02  Accepted: 2025-02-26  Online: 2025-03-21  Published: 2025-03-12
Corresponding author: Hua WANG  E-mail: iewanghua@zzu.edu.cn
Supported by: National Natural Science Foundation of China (62325602, 62036010, 62472389); Natural Science Foundation of Henan Province (252300421058)

Abstract

The limited deck space and highly dynamic environment pose significant challenges for the autonomous dispatching of carrier-based aircraft. Existing reinforcement learning-based automatic parking techniques offer a new technical approach to this problem, but they fail to converge when applied directly to carrier aircraft dispatching, where the environment is dynamic and the aircraft pose is constrained. To address this limitation, this paper proposes a locally guided reinforcement learning method for autonomous dispatching of carrier-based aircraft. The method introduces two reward mechanisms: a local target state reward based on a reference trajectory, and a local state grid reward near the dispatching endpoint. These rewards guide the learning process and prevent both entrapment in local optima and convergence failure during training, thereby significantly improving the dispatching success rate. Experimental results show that the proposed method outperforms conventional autonomous dispatching methods in both success rate and safety, and it has been validated across multiple task scenarios and with different numbers of carrier-based aircraft.

Key words: carrier-based aircraft, autonomous dispatching, deep reinforcement learning, local target state, local state grid
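The local target state reward described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `State` class, the tolerance values, and the rule for advancing along the reference trajectory are hypothetical. The three reward magnitudes are taken from Table 1, and reading their superscripts d, |v| and θ as distance, speed and heading terms is an interpretation.

```python
import math
from dataclasses import dataclass

# Local reward magnitudes from Table 1 (interpretation of the symbols is assumed).
R_LOC_D = 0.01       # distance term
R_LOC_V = 0.005      # speed term
R_LOC_THETA = 0.005  # heading term


@dataclass
class State:
    x: float      # position, m
    y: float      # position, m
    theta: float  # heading, rad
    v: float      # speed, m/s


def angle_diff(a: float, b: float) -> float:
    """Smallest absolute difference between two angles, in radians."""
    return abs((a - b + math.pi) % (2 * math.pi) - math.pi)


def local_target_reward(state: State, target: State,
                        pos_tol: float = 1.0,
                        v_tol: float = 0.5,
                        theta_tol: float = math.radians(10)) -> float:
    """Reward for reaching one local target state (tolerances are hypothetical)."""
    if math.hypot(state.x - target.x, state.y - target.y) > pos_tol:
        return 0.0
    reward = R_LOC_D
    if abs(state.v - target.v) <= v_tol:
        reward += R_LOC_V
    if angle_diff(state.theta, target.theta) <= theta_tol:
        reward += R_LOC_THETA
    return reward


def guided_step(state: State, reference: list, idx: int):
    """Score the current local target and move to the next one once it is reached."""
    if idx >= len(reference):
        return 0.0, idx
    reward = local_target_reward(state, reference[idx])
    return reward, (idx + 1 if reward > 0 else idx)
```

In a training loop, `guided_step` would be called once per simulation step and its reward added to the other event rewards, so the agent receives small, frequent signals along the reference trajectory rather than only a sparse terminal reward.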

Zheng WANG, Hua WANG, Keke CUI, Chaochao LI, Junnan LIU, Mingliang XU. Locally guided reinforcement learning for autonomous dispatching of carrier-based aircraft[J]. Acta Aeronautica et Astronautica Sinica, 2025, 46(13): 531333.

Figures and tables

Table 1  Event rewards during reinforcement learning training

Reward parameter	Value	Reward parameter	Value
$R_{end}^{d}$	1	$R_{v}$	-0.00015
$R_{end}^{|v|}$	0.5	$R_{o}$	-1
$R_{end}^{θ}$	0.5	$R_{l}$	-0.0003
$R_{loc}^{d}$	0.01	$R_{f}^{loc}$	-0.00025
$R_{loc}^{|v|}$	0.005	$R_{u}$	-0.002
$R_{loc}^{θ}$	0.005	$R_{c}^{loc}$	-0.00075

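Table 2  Hyperparameter values of the proposed method

Parameter	Value	Description
$F_{max}$ /N	30000	Maximum driving force
$m$ /kg	15000	Aircraft mass
$β_{max}$ /(°)	30	Maximum steering angle
$v_{max}$ /(m·s^-1)	5	Maximum dispatching speed
$w$ /m	31	Local state grid width
$r$	15	Number of cells
$Num$	24	Angle steps per cell
$T$ /s	0.004	Simulation step
Learning rate	3×10^-4	Used for gradient descent updates
$γ$	0.99	Discount factor
$H$	15000	Maximum steps per episode
Epochs	3	Number of training epochs
Batch size	1024	Batch size
Buffer size	10240	Buffer size
$β$	5×10^-3	Entropy regularization strength
$ϵ$	0.2	Deviation threshold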
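A few derived quantities make the values in Table 2 easier to interpret. The sketch below collects the table in a hypothetical `Config` dataclass (the field names are not from the article) and computes them. The maximum acceleration, maximum episode duration and buffer-to-batch ratio follow directly from the listed values. The cell size and angular resolution rest on two assumptions the page does not confirm: that the grid width $w$ is divided evenly into $r$ cells per side, and that the $Num$ angle steps span a full 360°.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    """Values copied from Table 2; field names are illustrative."""
    f_max: float = 30_000.0      # maximum driving force, N
    mass: float = 15_000.0       # aircraft mass, kg
    beta_max_deg: float = 30.0   # maximum steering angle, degrees
    v_max: float = 5.0           # maximum dispatching speed, m/s
    grid_width: float = 31.0     # w, m
    cells: int = 15              # r
    angle_steps: int = 24        # Num
    dt: float = 0.004            # T, simulation step, s
    learning_rate: float = 3e-4
    gamma: float = 0.99
    max_steps: int = 15_000      # H
    epochs: int = 3
    batch_size: int = 1024
    buffer_size: int = 10_240
    entropy_coef: float = 5e-3   # beta
    epsilon: float = 0.2


cfg = Config()

# Directly implied by the table values.
max_accel = cfg.f_max / cfg.mass                         # m/s^2
max_episode_time = cfg.max_steps * cfg.dt                # s
batches_per_buffer = cfg.buffer_size // cfg.batch_size   # minibatches per full buffer

# Assumption: w is split evenly into r cells per side.
cell_size = cfg.grid_width / cfg.cells                   # m
# Assumption: Num angle steps cover the full circle.
angle_step_deg = 360 / cfg.angle_steps                   # degrees

print(f"max acceleration      : {max_accel:.2f} m/s^2")
print(f"max episode duration  : {max_episode_time:.1f} s")
print(f"minibatches per buffer: {batches_per_buffer}")
print(f"cell size (assumed)   : {cell_size:.3f} m")
print(f"angle step (assumed)  : {angle_step_deg:.1f} deg")
```

Under these assumptions each episode is capped at about one minute of simulated time, and a full buffer holds ten minibatches.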

Table 6  Dispatching simulation results for different local state grid sizes

$w$	Collisions	Success rate/%
11	2	62.50
21	0	75.00
31	0	87.50
41	0	87.50

Table 7  Dispatching simulation results for different angle steps

$Num$	Collisions	Success rate/%
6	3	50.00
12	0	75.00
24	0	87.50
36	0	87.50

Start coordinates	Start angle/(°)	End coordinates	End angle/(°)	Start time 1/s	Start time 2/s	Start time 3/s	Start time 4/s
(323.6, 133.1)	334.1	(229.1, 129.65)	264.3	3	5	4	5
(217.9, 164.9)	202.3	(124.1, 163.8)	265.5	6	0	3	6
(302.5, 130.7)	340.2	(241.8, 116.2)	269.1	1	1	2	6
(241.2, 167.9)	192.0	(229.1, 129.65)	264.3	2	0	1	1
(275.1, 115.4)	0.8	(241.8, 116.2)	269.1	6	2	5	6
(190.7, 177.9)	178.4	(229.1, 129.65)	264.3	6	0	1	1
(142.8, 178.7)	178.4	(130.5, 143.9)	267.5	6	3	2	2
(337.3, 159.8)	257.7	(124.1, 163.8)	265.5	6	2	1	4

Method	Collisions	Success rate/%
Dubins-RVO	2	62.50
Dubins-RVO-PID	4	50.00
HPT	5	59.38
AVP	3	56.25
Proposed method	0	87.50
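The page does not state how many simulation runs lie behind each success rate in the comparison above. As a rough consistency check, the Python sketch below searches for the smallest run count at which every listed percentage corresponds to a whole number of successes after rounding to two decimals. The helper names and the rounding tolerance are hypothetical, and the result is an inference from the printed figures, not a number reported by the authors.

```python
# Success rates (%) as printed in the method comparison table.
RATES = {
    "Dubins-RVO": 62.50,
    "Dubins-RVO-PID": 50.00,
    "HPT": 59.38,
    "AVP": 56.25,
    "Proposed method": 87.50,
}


def is_consistent(rate_pct: float, n_runs: int, tol: float = 0.006) -> bool:
    """True if some whole number of successes out of n_runs rounds to rate_pct."""
    k = round(rate_pct * n_runs / 100)
    return abs(100 * k / n_runs - rate_pct) <= tol


def smallest_run_count(rates, max_runs: int = 200):
    """Smallest n for which every rate matches a whole number of successes."""
    for n in range(1, max_runs + 1):
        if all(is_consistent(r, n) for r in rates):
            return n
    return None


n_runs = smallest_run_count(list(RATES.values()))
successes = {method: round(rate * n_runs / 100) for method, rate in RATES.items()}

print("smallest consistent run count:", n_runs)
for method, k in successes.items():
    print(f"{method}: {k}/{n_runs}")
```

The search returns 32, which would match the 8 start/end configurations combined with the 4 start-time settings listed in the scenario table, although the page does not confirm that pairing.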
