Acta Aeronautica et Astronautica Sinica > 2022, Vol. 43 Issue (9): 325960-325960   doi: 10.7527/S1000-6893.2021.25960

Generating new quality flight corridor for reentry aircraft based on reinforcement learning

HUI Junpeng, WANG Ren, YU Qidong

  1. Research and Development Center, China Academy of Launch Vehicle Technology, Beijing 100076, China
  • Received: 2021-06-11 Revised: 2021-07-05 Published: 2022-09-15 Released: 2021-08-03
  • Corresponding author: HUI Junpeng, E-mail: hjpbuaa@126.com

Abstract: Breakthroughs in artificial intelligence offer a new technical approach to aircraft reentry guidance. Both reference-trajectory tracking guidance and predictor-corrector guidance require flight corridor parameters to be designed in advance from human experience. This paper aims to break through the "conical" flight trajectory envelope that is common to traditional guidance methods by exploiting the natural strength of reinforcement learning in intelligent decision-making. While satisfying the dynamic equations and hard constraints on heating rate, load factor, and dynamic pressure, the aircraft learns through a large number of "trial-and-error" interactions with the environment. On the one hand, an effective reward (feedback) signal is designed, following the way humans adjust their learning strategies based on feedback. On the other hand, the Proximal Policy Optimization (PPO) algorithm is used to train a bank angle guidance model that decides bank angle commands online from real-time state information, exploring a "new quality" flight corridor that is completely different from those of traditional guidance methods. Monte Carlo simulation analysis verifies that intelligent guidance based on reinforcement learning can fully exploit the aircraft's wide-range flight capability and further expand its flight profile.

Key words: intelligent guidance, "new quality" flight corridor, reinforcement learning, PPO algorithm, artificial intelligence
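
The abstract names PPO as the training algorithm but gives no implementation details. As a point of reference only, the standard PPO clipped surrogate objective can be sketched in a few lines of Python. The function name `ppo_clip_objective`, the clip range `clip_eps=0.2`, and the sample values are illustrative assumptions and are not taken from the paper.

```python
import math


def ppo_clip_objective(logp_new, logp_old, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate objective (a quantity to maximize).

    logp_new / logp_old: log-probabilities of the sampled actions (for
    example, bank angle commands) under the updated and the data-collecting
    policy. advantages: advantage estimates for the same samples.
    All names and the default clip_eps are hypothetical, for illustration.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        # Probability ratio between the new and the old policy.
        ratio = math.exp(lp_new - lp_old)
        # Restrict the ratio to [1 - clip_eps, 1 + clip_eps].
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Pessimistic bound: take the smaller of the two surrogate terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


if __name__ == "__main__":
    # Made-up sample batch: one action became more likely, one less likely.
    value = ppo_clip_objective(
        logp_new=[math.log(0.6), math.log(0.2)],
        logp_old=[math.log(0.3), math.log(0.4)],
        advantages=[1.0, -1.0],
    )
    print(value)
```

The clipping removes the incentive to move the policy far from the one that collected the data in a single update, which is the usual reason PPO is chosen for trial-and-error training of this kind.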
