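Autonomous reentry guidance based on planning-correction hierarchical reinforcement learning (Special Column on Autonomous Guidance and Control Technology for Space Transportation Systems)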

  • PENG Gao-Xiang
  • WANG Bo
  • LIU Lei
  • FAN Hui-Jin

  • Huazhong University of Science and Technology

Received date: 2025-06-27

  Revised date: 2025-11-09

  Online published: 2025-11-10

Funding

Wuhan Knowledge Innovation Special Project (Basic Research); National Natural Science Foundation of China

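Cite this article

PENG Gao-Xiang, WANG Bo, LIU Lei, FAN Hui-Jin. Autonomous reentry guidance based on planning-correction hierarchical reinforcement learning (Special Column on Autonomous Guidance and Control Technology for Space Transportation Systems)[J]. Acta Aeronautica et Astronautica Sinica (航空学报), 0: 1-0. DOI: 10.7527/S1000-6893.2025.32485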

Abstract

To enhance the rapid-response capability, mission adaptability, and robustness against significant model deviations of aerospace vehicles during reentry, this study proposes an autonomous reentry guidance method based on planning-correction hierarchical reinforcement learning (HRL). To address the training instability of conventional HRL, a planning-correction hierarchical strategy is introduced that removes the dependence of upper-level policy training on lower-level state transition data, and a dual-layer guidance framework is built on this strategy. In the planning layer, a modular RL policy plans the reference angle-of-attack and bank angle profiles, generating global trajectories according to mission requirements and thereby ensuring the mission adaptability of the framework. In the correction layer, high-frequency trajectory corrections are performed in the presence of model parameter deviations, counteracting the effect of large deviations. Simulation results show that the dual-layer guidance strategy can handle larger parameter deviations and improves guidance accuracy when deviations are large. Compared with a predictor-corrector guidance algorithm, the proposed strategy exhibits stronger mission adaptability and real-time performance, enabling autonomous guidance from arbitrary initial positions and orientations.
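The abstract describes the planning-correction framework only at a high level. The following is a minimal sketch of the general two-rate pattern it names, written under loudly stated assumptions. A low-rate planning layer produces reference angle-of-attack and bank-angle profiles and rolls them out on a *nominal* model. A high-rate correction layer then adjusts the bank angle while the vehicle flies a *perturbed* model. This is not the authors' implementation. Both RL policies are replaced by hypothetical hand-written stand-ins (`planning_policy`, `correct`). The vehicle model is an assumed planar point mass over a spherical Earth with an exponential atmosphere, integrated with explicit Euler, so lateral motion is ignored. All coefficients, the mass, the gains, and the update rates are illustrative values.

```python
import math
from dataclasses import dataclass

R_E = 6_371_000.0  # Earth radius, m
G0 = 9.81          # surface gravity, m/s^2


@dataclass(frozen=True)
class Model:
    """Toy vehicle/atmosphere model; *_scale fields emulate parameter deviations."""
    cl_scale: float = 1.0   # lift-coefficient multiplier
    cd_scale: float = 1.0   # drag-coefficient multiplier
    rho_scale: float = 1.0  # atmospheric-density multiplier
    mass: float = 900.0     # kg (hypothetical vehicle)
    area: float = 0.5       # m^2 (hypothetical reference area)


def step(x, alpha, sigma, m, dt):
    """One explicit-Euler step of planar point-mass dynamics; x = (h, v, gamma)."""
    h, v, gamma = x
    r = R_E + h
    g = G0 * (R_E / r) ** 2
    rho = m.rho_scale * 1.225 * math.exp(-h / 7200.0)  # exponential atmosphere
    cl_nom = 1.5 * alpha                                # hypothetical linear lift curve
    cl = m.cl_scale * cl_nom
    cd = m.cd_scale * (0.05 + 0.5 * cl_nom ** 2)        # hypothetical drag polar
    qs_over_m = 0.5 * rho * v * v * m.area / m.mass
    lift, drag = qs_over_m * cl, qs_over_m * cd         # specific forces, m/s^2
    dh = v * math.sin(gamma)
    dv = -drag - g * math.sin(gamma)
    # Only the vertical lift component cos(sigma) is modelled; lateral motion is ignored.
    dgamma = (lift * math.cos(sigma) - (g - v * v / r) * math.cos(gamma)) / v
    return (h + dh * dt, v + dv * dt, gamma + dgamma * dt)


def planning_policy(x):
    """Hypothetical stand-in for the RL planning layer (a learned one would use x and the mission)."""
    def alpha_profile(v):
        frac = min(max((v - 2000.0) / 4000.0, 0.0), 1.0)
        return math.radians(10.0 + 5.0 * frac)
    def bank_profile(v):
        return math.radians(40.0)
    return alpha_profile, bank_profile


def plan(x, nominal, horizon_s, dt):
    """Roll the profiles out on the *nominal* model to get reference (h, hdot) samples."""
    alpha_p, bank_p = planning_policy(x)
    ref = []
    for _ in range(int(horizon_s / dt)):
        h, v, gamma = x
        ref.append((h, v * math.sin(gamma)))
        x = step(x, alpha_p(v), bank_p(v), nominal, dt)
    return alpha_p, bank_p, ref


def correct(x, elapsed, alpha_p, bank_p, ref, dt_ref, k_h=2e-4, k_hd=4e-3):
    """Hypothetical stand-in for the RL correction layer: PD feedback on cos(bank)."""
    h, v, gamma = x
    h_ref, hd_ref = ref[min(int(elapsed / dt_ref), len(ref) - 1)]
    cos_cmd = math.cos(bank_p(v)) + k_h * (h_ref - h) + k_hd * (hd_ref - v * math.sin(gamma))
    # Below the reference -> more vertical lift (smaller bank); clamp to a valid cosine.
    return alpha_p(v), math.acos(min(max(cos_cmd, 0.0), 1.0))


def fly(x0, truth, nominal, dt=0.1, replan_every=100, t_max=2000.0,
        h_min=20_000.0, v_min=2000.0):
    """Low-rate replanning on the nominal model, high-rate correction on the true model."""
    x, t, k = x0, 0.0, 0
    log = {"plans": 0, "corrections": 0}
    while t < t_max and x[0] > h_min and x[1] > v_min:
        if k % replan_every == 0:  # planning layer, every replan_every * dt seconds
            alpha_p, bank_p, ref = plan(x, nominal, 2 * replan_every * dt, 0.5)
            t_plan = t
            log["plans"] += 1
        alpha, sigma = correct(x, t - t_plan, alpha_p, bank_p, ref, 0.5)  # correction layer
        log["corrections"] += 1
        x = step(x, alpha, sigma, truth, dt)
        t += dt
        k += 1
    return x, t, log


if __name__ == "__main__":
    x0 = (60_000.0, 6000.0, math.radians(-1.0))             # h [m], v [m/s], gamma [rad]
    truth = Model(cl_scale=0.8, cd_scale=1.2, rho_scale=1.1)  # hypothetical deviations
    xf, tf, log = fly(x0, truth, Model())
    print(f"t={tf:.1f}s h={xf[0]:.0f}m v={xf[1]:.0f}m/s", log)
```

In this sketch the planner only ever rolls out the nominal model and never consumes the transitions produced by the correction layer. That loosely mirrors the decoupling the abstract describes, although the abstract alone does not reveal how the paper actually trains either layer.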