Autonomous reentry guidance based on planning-correction hierarchical reinforcement learning (Special Column on Autonomous Guidance and Control Technology for Space Transportation Systems)

Gaoxiang Peng, Bo Wang, Lei Liu, Huijin Fan

  1. Huazhong University of Science and Technology
  • Received: 2025-06-27 Revised: 2025-11-09 Online: 2025-11-10 Published: 2025-11-10
  • Corresponding author: Bo Wang
  • Funding:
    Knowledge Innovation Program of Wuhan (Basic Research Project); National Natural Science Foundation of China

Abstract: To improve the rapid-response capability, mission adaptability, and robustness to large model deviations of aerospace vehicles during reentry, this study proposes an autonomous reentry guidance method based on planning-correction hierarchical reinforcement learning (HRL). To address the training non-stationarity of conventional HRL, a planning-correction hierarchical strategy is introduced that removes the dependence of upper-level policy training on lower-level state-transition data, yielding a two-layer guidance framework. In the planning layer, a modular reinforcement learning policy plans the reference angle-of-attack and bank-angle profiles, generating a global trajectory from the mission requirements and giving the framework its mission adaptability. In the correction layer, high-frequency trajectory correction under model parameter deviations counteracts the effect of large parameter deviations. Simulation results show that the two-layer guidance strategy tolerates larger parameter deviations and improves guidance accuracy when deviations are large. Compared with a predictor-corrector guidance algorithm, it also shows stronger mission adaptability and real-time performance, enabling autonomous guidance for missions with arbitrary positions and directions.

Keywords: Aerospace vehicle, Reentry phase, Autonomous guidance, Hierarchical reinforcement learning, Planning-correction hierarchy
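
The data flow of the two-layer framework described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and not the authors' implementation: the names `plan_reference`, `correct_command`, and `run_guidance`, the mission dictionary keys, the linear angle-of-attack profile, and the proportional, clipped correction are hypothetical stand-ins for the trained reinforcement learning policies. The sketch shows only that the planning layer is queried once per mission to produce reference profiles, while the correction layer is queried at every guidance step to adjust the reference command.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Reference:
    """Reference profiles from the planning layer (hypothetical container)."""
    alpha: Callable[[float], float]  # angle of attack [deg] vs. progress in [0, 1]
    sigma: Callable[[float], float]  # bank angle [deg] vs. progress in [0, 1]


def plan_reference(mission: Dict[str, float]) -> Reference:
    """Placeholder planning policy; a trained RL policy would go here."""
    a0, a1 = mission["alpha_start"], mission["alpha_end"]
    bank = mission["bank"]
    # Assumed shapes: linear angle-of-attack ramp, constant bank angle.
    return Reference(alpha=lambda p: a0 + (a1 - a0) * p, sigma=lambda p: bank)


def correct_command(ref_cmd: Tuple[float, float], tracking_error: float,
                    gain: float = 0.5, limit: float = 5.0) -> Tuple[float, float]:
    """Placeholder correction policy: bounded offset on the reference angle of attack."""
    delta = max(-limit, min(limit, -gain * tracking_error))
    return ref_cmd[0] + delta, ref_cmd[1]


def run_guidance(mission: Dict[str, float],
                 errors: List[float]) -> List[Tuple[float, float]]:
    """Plan once, then correct at every (high-frequency) guidance step."""
    ref = plan_reference(mission)  # planning layer: one call per mission
    n = len(errors)
    commands = []
    for k, err in enumerate(errors):  # correction layer: one call per step
        p = k / max(n - 1, 1)  # normalized progress along the trajectory
        commands.append(correct_command((ref.alpha(p), ref.sigma(p)), err))
    return commands


if __name__ == "__main__":
    mission = {"alpha_start": 40.0, "alpha_end": 10.0, "bank": 30.0}
    # Hypothetical tracking errors, e.g. caused by model parameter deviations.
    print(run_guidance(mission, [0.0, 2.0, -20.0]))
```

Because the planner in this sketch never reads the corrector's outputs, it could in principle be trained or replaced independently, which is the property the abstract attributes to the planning-correction hierarchy.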

CLC number: