Acta Aeronautica et Astronautica Sinica (航空学报) > 2025, Vol. 46 Issue (13): 531333-531333   doi: 10.7527/S1000-6893.2024.31333

局部引导强化学习的舰载机自主调运方法

王政1, 王华1,2,3, 崔可可1, 李超超1,2,3, 刘俊楠1,2,3, 徐明亮1,2,3

  1. 郑州大学 计算机与人工智能学院,郑州 450001
  2. 智能集群系统教育部工程研究中心,郑州 450001
  3. 国家超级计算郑州中心,郑州 450001
  • 收稿日期:2024-10-08 修回日期:2025-01-02 接受日期:2025-02-26 出版日期:2025-03-21 发布日期:2025-03-12
  • 通讯作者: 王华 E-mail:iewanghua@zzu.edu.cn
  • 基金资助:
    国家自然科学基金(62325602);国家自然科学基金(62036010);国家自然科学基金(62472389);河南省自然科学基金(252300421058)

Locally guided reinforcement learning for autonomous dispatching of carrier-based aircraft

Zheng WANG1, Hua WANG1,2,3, Keke CUI1, Chaochao LI1,2,3, Junnan LIU1,2,3, Mingliang XU1,2,3

  1. School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450001, China
  2. Engineering Research Center of Intelligent Swarm Systems, Ministry of Education, Zhengzhou 450001, China
  3. National Supercomputing Center in Zhengzhou, Zhengzhou 450001, China
  • Received: 2024-10-08 Revised: 2025-01-02 Accepted: 2025-02-26 Online: 2025-03-21 Published: 2025-03-12
  • Contact: Hua WANG E-mail: iewanghua@zzu.edu.cn
  • Supported by:
    National Natural Science Foundation of China (62325602, 62036010, 62472389); Natural Science Foundation of Henan Province (252300421058)

摘要:

甲板空间有限且环境动态多变使得舰载机自主调运存在较大的挑战。现有基于强化学习的自动泊车技术为舰载机自主调运提供了新的技术思路,但上述方法直接用于舰载机这一动态环境且姿态受限下的自主调运时,存在不收敛的问题。鉴于此,提出了一种局部引导强化学习的舰载机自主调运方法,通过引入基于参考轨迹的局部目标状态奖励和调运终点附近的局部状态网格奖励来引导舰载机学习过程,避免了训练过程中出现局部最优解和收敛失败的问题,从而显著提升了舰载机自主调运成功率。实验结果表明,所提出的自主调运方法在成功率、安全性方面均优于传统的自主调运方法,并已在多种任务场景和不同数量的舰载机配置下得到了验证。

关键词: 舰载机, 自主调运, 深度强化学习, 局部目标状态, 局部状态网格

Abstract:

Limited deck space and a highly dynamic environment make autonomous dispatching of carrier-based aircraft challenging. Existing reinforcement learning-based automatic parking techniques suggest a new technical approach to this problem, but when applied directly to carrier-based aircraft, which operate in a dynamic environment under attitude constraints, these methods fail to converge. To address this limitation, this paper proposes a locally guided reinforcement learning method for autonomous dispatching of carrier-based aircraft. The method introduces two reward terms to guide learning: a local target state reward based on a reference trajectory, and a local state grid reward near the dispatching endpoint. These terms keep training from becoming trapped in local optima or failing to converge, and thereby significantly improve the dispatching success rate. Experimental results show that the proposed method outperforms conventional autonomous dispatching methods in both success rate and safety, and it has been validated in multiple task scenarios and with different numbers of carrier-based aircraft.

Key words: carrier-based aircraft, autonomous dispatching, deep reinforcement learning, local target state, local state grid
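
The two guidance terms named in the abstract can be illustrated with a minimal sketch. The abstract does not give the reward formulas, so everything below is an assumption made for illustration: the state layout `(x, y, heading)`, the function names, the weights, the look-ahead rule, and the square-ring grid are hypothetical and are not the authors' formulation.

```python
import math
from typing import Sequence, Tuple

# Hypothetical state layout: (x, y, heading in radians).
State = Tuple[float, float, float]


def angle_diff(a: float, b: float) -> float:
    """Smallest absolute difference between two headings, in [0, pi]."""
    return abs((a - b + math.pi) % (2 * math.pi) - math.pi)


def local_target_reward(state: State, reference: Sequence[State],
                        lookahead: int = 1,
                        w_pos: float = 1.0, w_head: float = 0.5) -> float:
    """Assumed form of a reference-trajectory reward.

    Picks a local target a few points ahead of the nearest reference
    point and penalizes position and heading error relative to it.
    """
    nearest = min(
        range(len(reference)),
        key=lambda i: math.hypot(reference[i][0] - state[0],
                                 reference[i][1] - state[1]),
    )
    target = reference[min(nearest + lookahead, len(reference) - 1)]
    dist = math.hypot(target[0] - state[0], target[1] - state[1])
    return -(w_pos * dist + w_head * angle_diff(state[2], target[2]))


def local_grid_reward(state: State, goal: State, radius: float = 10.0,
                      cell: float = 2.0, bonus: float = 1.0) -> float:
    """Assumed form of a grid reward near the dispatching endpoint.

    Space within `radius` of the goal is split into square cells; cells
    in rings closer to the goal earn a larger bonus, and states outside
    the radius earn nothing.
    """
    dx, dy = state[0] - goal[0], state[1] - goal[1]
    if math.hypot(dx, dy) > radius:
        return 0.0
    ring = max(abs(round(dx / cell)), abs(round(dy / cell)))
    max_ring = math.ceil(radius / cell)
    return bonus * (1.0 - ring / (max_ring + 1))


def guided_reward(state: State, reference: Sequence[State],
                  goal: State) -> float:
    """Sum of the two illustrative guidance terms."""
    return local_target_reward(state, reference) + local_grid_reward(state, goal)
```

In this sketch the trajectory term gives a dense signal far from the goal, while the grid term only becomes active near the endpoint, where the final position matters most. A real training setup would add these terms to task rewards such as collision penalties and a success bonus, none of which are specified in the abstract.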

CLC number: