Acta Aeronautica et Astronautica Sinica, 2023, Vol. 44, Issue 11: 327416. doi: 10.7527/S1000-6893.2022.27416

Intelligent guidance for no-fly zone avoidance based on reinforcement learning

Junpeng HUI¹, Ren WANG², Jifeng GUO¹

  1. School of Astronautics, Harbin Institute of Technology, Harbin 150006, China
  2. China Academy of Aerospace Science and Innovation, Beijing 100176, China
  • Received: 2022-05-11 Revised: 2022-12-08 Accepted: 2023-01-17 Online: 2023-02-06 Published: 2023-06-15
  • Contact: Junpeng HUI, E-mail: hjpbuaa@126.com
  • Supported by:
    National-level project

Abstract:

The rapid development of artificial intelligence (AI) offers a new technical approach to aircraft guidance research. To address the problem of a high-speed reentry aircraft avoiding uncertain no-fly zones, we propose a progressive three-stage research framework: predictor-corrector guidance, pre-training of a bank angle guidance model by supervised learning, and further training of that model by reinforcement learning. First, traditional predictor-corrector guidance generates a large number of sample trajectories that avoid no-fly zones, and the bank angle guidance model is pre-trained on these trajectories with supervised learning. Second, the model is further trained with the Proximal Policy Optimization (PPO) algorithm: the aircraft interacts extensively with an environment containing uncertain no-fly zones, and an effective reward guides the exploration so that the strong lateral maneuverability of a high lift-to-drag ratio aircraft is fully exploited. This frees the policy from the constraints that predictor-corrector guidance imposes on the bank angle solution space and is expected to yield better avoidance strategies. Comparison with traditional predictor-corrector guidance and with intelligent guidance based on supervised learning verifies that the reinforcement-learning-based guidance fully exploits the wide-area flight capability of the aircraft and meets the adaptability requirements of future intelligent decision systems in uncertain avoidance scenarios.

Key words: intelligent guidance, no-fly zone avoidance, reinforcement learning, PPO algorithm, supervised learning
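
For readers unfamiliar with PPO, the following is a minimal sketch of the standard clipped surrogate objective that the algorithm maximizes. It is not the authors' implementation: the abstract does not specify the network, the reward, or any hyperparameters, so the function name `ppo_clip_objective`, its arguments, and the default `clip_eps=0.2` are illustrative assumptions. In the setting described above, the action would be the bank angle command and the advantages would come from the no-fly zone avoidance reward.

```python
import math

def ppo_clip_objective(logp_new, logp_old, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate objective over a batch (to be maximized).

    logp_new / logp_old: log-probabilities of the sampled actions under the
    current policy and under the policy that collected the data.
    advantages: advantage estimates for the same samples.
    clip_eps: clipping range (0.2 is a common default, assumed here).
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        # Probability ratio between the new and old policy for this action.
        ratio = math.exp(lp_new - lp_old)
        # Restrict the ratio to [1 - eps, 1 + eps].
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Pessimistic choice: take the smaller of the two surrogate terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

The clipping removes the incentive to move the policy far from the one that generated the data in a single update, which is one plausible reason PPO suits fine-tuning a model that has already been pre-trained by supervised learning.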
