In aircraft carrier flight deck support scheduling, model-free reinforcement learning (MFRL) learns by interacting with a physical simulation environment, so its use in dynamic deck scenarios is limited by how precisely that environment can be modeled. Model-based reinforcement learning (MBRL), in turn, suffers from high computational complexity and convergence difficulties, because the environment model and the decision model depend on each other and must be co-optimized during iterative training. To address these issues, this paper proposes a hybrid reinforcement learning framework (MB-MF) that combines model-based and model-free characteristics. First, a deep-neural-network-based deck environment model is trained on historical scheduling data so that it predicts state transitions accurately within a minimal tolerance. Then, the converged environment model replaces the real environment in the interaction loop, and a scheduling agent is trained against it with the Deep Q-Network (DQN) algorithm, which decouples environment model learning from policy optimization. Experimental results show that, compared with MFRL trained in the physical environment, the proposed method comes within 4% of its performance without requiring precise physical modeling, and that it reduces aircraft sortie time by 34% relative to the MBRL baseline. In resource-constrained scenarios, it makes decisions nearly 300 times faster than heuristic methods, at the cost of only a 17% reduction in scheduling quality.
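The two-stage training described above can be illustrated with a deliberately small sketch. Everything in it is a hypothetical assumption, not the paper's implementation: the toy six-state "deck" (state = number of aircraft launched), the two actions (wait / dispatch), the reward values, and all helper names. To stay dependency-free, a count-based transition table stands in for the deep neural network environment model, and tabular Q-learning stands in for DQN (DQN replaces the Q-table with a neural network, usually with experience replay and a target network). What the sketch does preserve is the decoupling: the model is fitted once from logged transitions and then frozen, and the agent only queries that frozen model during training.

```python
import random
from collections import Counter, defaultdict

# Hypothetical toy setting: states 0..5 = aircraft launched so far; 5 is terminal.
N_STATES = 6
ACTIONS = (0, 1)  # hypothetical actions: 0 = wait, 1 = dispatch next aircraft
GOAL = N_STATES - 1


def real_step(state, action):
    """Toy stand-in for the real deck environment (deterministic, hypothetical rewards)."""
    next_state = min(state + 1, GOAL) if action == 1 else state
    reward = 1.0 if next_state == GOAL and state != GOAL else -0.1
    return reward, next_state


def collect_history(n, seed=0):
    """Logged (s, a, r, s') transitions from a random behaviour policy ("historical data")."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        s = rng.randrange(GOAL)  # sample non-terminal states only
        a = rng.choice(ACTIONS)
        r, s2 = real_step(s, a)
        data.append((s, a, r, s2))
    return data


def fit_model(data):
    """Stage 1: fit a transition model from history (count table standing in for a DNN)."""
    next_counts = defaultdict(Counter)
    reward_sums = defaultdict(float)
    for s, a, r, s2 in data:
        next_counts[(s, a)][s2] += 1
        reward_sums[(s, a)] += r
    model = {}
    for key, counts in next_counts.items():
        total = sum(counts.values())
        # Predict the mean reward and the most frequently observed next state.
        model[key] = (reward_sums[key] / total, counts.most_common(1)[0][0])
    return model  # treated as frozen from here on


def train_q(model, episodes=500, alpha=0.5, gamma=0.9, eps=0.2, seed=1):
    """Stage 2: epsilon-greedy tabular Q-learning that only queries the frozen model."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        s = 0
        for _ in range(20):  # cap episode length
            if rng.random() < eps:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda act: q[(s, act)])
            r, s2 = model[(s, a)]  # learned model replaces real_step here
            if s2 == GOAL:
                target = r
            else:
                target = r + gamma * max(q[(s2, b)] for b in ACTIONS)
            q[(s, a)] += alpha * (target - q[(s, a)])
            s = s2
            if s == GOAL:
                break
    return q


def greedy_policy(q):
    """Greedy action for every non-terminal state."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)}


def rollout_real(policy, max_steps=20):
    """Run the model-trained policy in the toy real environment; return steps to finish."""
    s, steps = 0, 0
    while s != GOAL and steps < max_steps:
        _, s = real_step(s, policy[s])
        steps += 1
    return steps


if __name__ == "__main__":
    model = fit_model(collect_history(500))
    policy = greedy_policy(train_q(model))
    print(policy, rollout_real(policy))
```

Because `train_q` never calls `real_step`, model fitting and policy optimization do not alternate as they would in iterative MBRL; `rollout_real` then checks the model-trained policy back in the (toy) real environment.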