Locally guided reinforcement learning for autonomous dispatching of carrier-based aircraft

Zheng WANG; Hua WANG; Keke CUI; Chaochao LI; Junnan LIU; Mingliang XU

doi:10.7527/S1000-6893.2024.31333

Acta Aeronautica et Astronautica Sinica, 2025, Vol. 46, Issue 13: 531333


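1. School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450001, China
2. Engineering Research Center of Intelligent Swarm Systems, Ministry of Education, Zhengzhou 450001, China
3. National Supercomputing Center in Zhengzhou, Zhengzhou 450001, China

E-mail: iewanghua@zzu.edu.cn

Received date: 2024-10-08

Revised date: 2025-01-02

Accepted date: 2025-02-26

Online published: 2025-03-12

Supported by

National Natural Science Foundation of China (62325602, 62036010, 62472389); Natural Science Foundation of Henan Province (252300421058)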



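Cite this article

WANG Z, WANG H, CUI K K, LI C C, LIU J N, XU M L. Locally guided reinforcement learning for autonomous dispatching of carrier-based aircraft[J]. Acta Aeronautica et Astronautica Sinica, 2025, 46(13): 531333. DOI: 10.7527/S1000-6893.2024.31333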

Abstract

The limited deck space and highly dynamic environment make autonomous dispatching of carrier-based aircraft challenging. Existing reinforcement learning-based automatic parking techniques offer a new technical route for autonomous dispatching, but when applied directly to the dynamic deck environment with constrained aircraft postures, they fail to converge. To address this limitation, this paper proposes a locally guided reinforcement learning method for autonomous dispatching of carrier-based aircraft. The method introduces two reward mechanisms to guide learning: a local target state reward based on a reference trajectory, and a local state grid reward near the dispatching endpoint. Together they prevent training from becoming trapped in local optima or failing to converge, and thereby significantly raise the success rate of autonomous dispatching. Experimental results show that the proposed method outperforms conventional autonomous dispatching methods in both success rate and safety, and it has been validated across multiple mission scenarios and with different numbers of carrier aircraft.

Key words: carrier-based aircraft; autonomous dispatching; deep reinforcement learning; local target state; local state grid
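The abstract names two guidance terms but does not give their formulas. As a hedged illustration only, and not the authors' implementation, the Python sketch below shows one way such terms could be computed. Every detail is an assumption: the planar pose `Pose`, the lookahead rule in `local_target`, the progress-based form of `local_target_reward`, and the radius, cell sizes and one-shot bonus in `GridReward` are all hypothetical choices made for the example.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Pose:
    """Planar aircraft pose on the deck (hypothetical state representation)."""
    x: float
    y: float
    heading: float  # radians


def angle_diff(a: float, b: float) -> float:
    """Signed difference a - b wrapped to [-pi, pi)."""
    return (a - b + math.pi) % (2 * math.pi) - math.pi


def pose_error(a: Pose, b: Pose, w_heading: float = 1.0) -> float:
    """Position error plus weighted absolute heading error."""
    return math.hypot(a.x - b.x, a.y - b.y) + w_heading * abs(angle_diff(a.heading, b.heading))


def local_target(pose: Pose, ref: List[Pose], lookahead: int) -> Pose:
    """Assumed rule: the waypoint `lookahead` steps past the nearest reference waypoint."""
    nearest = min(range(len(ref)),
                  key=lambda i: math.hypot(ref[i].x - pose.x, ref[i].y - pose.y))
    return ref[min(nearest + lookahead, len(ref) - 1)]


def local_target_reward(prev: Pose, cur: Pose, ref: List[Pose],
                        lookahead: int = 5, w_heading: float = 1.0) -> float:
    """Reward progress toward one local target, chosen from the previous pose."""
    target = local_target(prev, ref, lookahead)
    return pose_error(prev, target, w_heading) - pose_error(cur, target, w_heading)


@dataclass
class GridReward:
    """Hypothetical local state grid near the dispatching endpoint.

    Inside `radius` of the goal, (distance, heading error) is binned into cells,
    and each cell has level = distance bin + heading bin. A bonus is paid only
    when the aircraft reaches a lower level than its best so far in the episode.
    """
    goal: Pose
    radius: float = 10.0                    # activation radius (assumed, metres)
    cell: float = 1.0                       # distance bin size (assumed, metres)
    heading_cell: float = math.radians(15)  # heading bin size (assumed)
    bonus: float = 1.0
    best_level: float = math.inf

    def reset(self) -> None:
        """Call at the start of every episode."""
        self.best_level = math.inf

    def level(self, pose: Pose) -> Optional[int]:
        dist = math.hypot(pose.x - self.goal.x, pose.y - self.goal.y)
        if dist > self.radius:
            return None  # outside the local grid: no grid reward
        d_bin = int(dist // self.cell)
        h_bin = int(abs(angle_diff(pose.heading, self.goal.heading)) // self.heading_cell)
        return d_bin + h_bin

    def __call__(self, pose: Pose) -> float:
        lvl = self.level(pose)
        if lvl is None or lvl >= self.best_level:
            return 0.0
        self.best_level = lvl
        return self.bonus


def shaped_reward(prev: Pose, cur: Pose, ref: List[Pose], grid: GridReward,
                  w_local: float = 1.0, w_grid: float = 1.0) -> float:
    """Weighted sum of the two guidance terms; task rewards such as collision
    penalties or a terminal success bonus would be added on top."""
    return w_local * local_target_reward(prev, cur, ref) + w_grid * grid(cur)
```

On a hypothetical straight reference trajectory of 20 waypoints spaced 1 m apart along the x-axis, moving from x = 0 to x = 1 shrinks the error to the local target at x = 5 from 5 to 4, so `local_target_reward` returns 1.0, while `GridReward` contributes nothing until the aircraft is within 10 m of the endpoint. Paying the grid bonus only for a new best level is a deliberate choice in this sketch: a dense proximity reward near the goal can let a policy loiter or oscillate to collect reward instead of completing the maneuver.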

