A Scheduling Algorithm for Carrier-Based Aircraft Support Operations Based on the Deck Environment Model

  • LUO Yi-Zhe (罗祎喆),
  • WANG Jia-Bao (王佳宝),
  • YU Xin-De (余新得),
  • CHEN Xu-Dong (陈旭东),
  • JIN Zhao (金钊),
  • FENG Shuo (冯硕),
  • SHI Yu-Cheng (石育澄),
  • XU Ming-Liang (徐明亮)

  • 1. School of Computer and Artificial Intelligence, Zhengzhou University
    2. School of Computer and Artificial Intelligence, Zhengzhou University
    3. Zhengzhou University

Received date: 2025-12-03

Revised date: 2026-05-11

Online published: 2026-05-14

Supported by

National Natural Science Foundation of China

Cite this article

LUO Yi-Zhe, WANG Jia-Bao, YU Xin-De, CHEN Xu-Dong, JIN Zhao, FENG Shuo, SHI Yu-Cheng, XU Ming-Liang. A Scheduling Algorithm for Carrier-Based Aircraft Support Operations Based on the Deck Environment Model[J]. Acta Aeronautica et Astronautica Sinica (航空学报), 0: 1-0. DOI: 10.7527/S1000-6893.2026.33180

Abstract

In scheduling support operations for carrier-based aircraft, model-free reinforcement learning (MFRL) is limited by how accurately the physical environment can be modeled in dynamic deck scenarios. Model-based reinforcement learning (MBRL), in turn, must optimize the environment model and the decision model jointly during iterative training; because the two depend on each other, computation is expensive and convergence is difficult. To address both problems, this paper proposes a hybrid reinforcement learning framework (MB-MF) that combines model-based and model-free characteristics. First, a deck environment model based on a deep neural network is trained on historical scheduling data so that it predicts state transitions within a minimal tolerance. Then, the converged environment model replaces the real environment in the interaction loop, and a scheduling agent is trained with the Deep Q-Network (DQN) algorithm, which decouples environment model learning from policy optimization. Experiments show that the proposed method comes within 4% of the performance of MFRL trained on the physical environment, without requiring precise modeling, and shortens aircraft sortie time by 34% relative to the MBRL baseline. In resource-constrained scenarios, it makes decisions nearly 300 times faster than heuristic methods while reducing scheduling quality by only 17%.
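The decoupling described in the abstract (fit the environment model once on logged data, then train the agent only against the frozen model) can be illustrated with a minimal sketch. It is not the paper's implementation: a tabular count-based model stands in for the deep neural network, tabular Q-learning stands in for DQN, and the six-state chain environment, the names `true_step`, `collect_history`, `fit_model`, `train_q`, and all hyperparameters are hypothetical choices made only for illustration.

```python
import numpy as np

N_STATES, N_ACTIONS, GOAL = 6, 2, 5  # hypothetical toy problem sizes


def true_step(s, a):
    """Stand-in for the real deck environment: action 1 advances, action 0 waits."""
    s_next = min(s + 1, GOAL) if a == 1 else s
    reward = 0.0 if s_next == GOAL else -1.0  # every extra step costs time
    return s_next, reward


def collect_history(n, rng):
    """Logged (s, a, s', r) tuples standing in for historical scheduling data."""
    data = []
    for _ in range(n):
        s, a = rng.integers(GOAL), rng.integers(N_ACTIONS)
        data.append((s, a, *true_step(s, a)))
    return data


def fit_model(data):
    """Stage 1: fit the environment model once, from data only (tabular counts)."""
    counts = np.zeros((N_STATES, N_ACTIONS, N_STATES))
    r_sum = np.zeros((N_STATES, N_ACTIONS))
    for s, a, s_next, r in data:
        counts[s, a, s_next] += 1
        r_sum[s, a] += r
    visits = np.maximum(counts.sum(axis=2), 1)
    return counts.argmax(axis=2), r_sum / visits  # predicted s', predicted r


def train_q(next_state, reward, episodes=300, alpha=0.5, gamma=0.95, eps=0.2, seed=1):
    """Stage 2: Q-learning that only ever queries the frozen learned model."""
    rng = np.random.default_rng(seed)
    q = np.zeros((N_STATES, N_ACTIONS))
    for _ in range(episodes):
        s = 0
        for _ in range(50):  # cap on episode length
            a = rng.integers(N_ACTIONS) if rng.random() < eps else int(q[s].argmax())
            s2, r = next_state[s, a], reward[s, a]
            target = r + gamma * q[s2].max() * (s2 != GOAL)  # no bootstrap at the goal
            q[s, a] += alpha * (target - q[s, a])
            s = s2
            if s == GOAL:
                break
    return q


def rollout(q):
    """Evaluate the greedy policy in the true environment; returns steps taken."""
    s, steps = 0, 0
    while s != GOAL and steps < 50:
        s, _ = true_step(s, int(q[s].argmax()))
        steps += 1
    return steps


rng = np.random.default_rng(0)
next_state, reward = fit_model(collect_history(500, rng))  # model is fixed from here on
q = train_q(next_state, reward)                            # agent never touches true_step
steps = rollout(q)
print("steps to goal in the true environment:", steps)
```

Because the model is frozen before policy training starts, the two learning problems never interact, which is the property the abstract credits for avoiding the co-optimization difficulty of iterative MBRL. In this toy setting the policy is only evaluated against `true_step` at the end.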