
Research of intelligent guidance for no-fly zone avoidance based on reinforcement learning

Jun-Peng HUI, Ren WANG

  1. China Academy of Launch Vehicle Technology
  • Received: 2022-05-11  Revised: 2023-02-05  Online: 2023-02-06  Published: 2023-02-06
  • Corresponding author: Jun-Peng HUI

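Abstract: The rapid development of artificial intelligence (AI) offers a new technical route for research on flight vehicle guidance. This paper addresses the problem of a high-speed flight vehicle that must avoid uncertain no-fly zones. It proposes a progressive, three-stage research framework for intelligent no-fly zone avoidance guidance: "predictor-corrector guidance → pre-training of a bank-angle guidance model via supervised learning → further upgrading of the bank-angle guidance model via reinforcement learning". First, conventional predictor-corrector guidance generates a large set of sample trajectories that avoid no-fly zones, and the bank-angle guidance model is pre-trained on these samples with supervised learning. Second, the paper exploits the natural strengths of reinforcement learning in intelligent decision-making and further upgrades the bank-angle guidance model with the Proximal Policy Optimization (PPO) algorithm. The vehicle explores through extensive interaction with its environment. Borrowing the human practice of adjusting one's learning strategy based on feedback, an effective reward (feedback) signal is designed to guide this exploration. The aim is to fully exploit the strong lateral maneuverability of a high lift-to-drag-ratio vehicle and to free the guidance from the constraints that the traditional predictor-corrector method places on the bank-angle solution space, so as to produce better avoidance strategies. The method is compared with traditional predictor-corrector guidance and with supervised-learning-based intelligent guidance. These comparisons verify that reinforcement-learning-based no-fly zone avoidance guidance can fully exploit the vehicle's wide-area flight capability. It also meets the adaptability requirements that future intelligent decision-making systems will face in uncertain avoidance scenarios.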
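The abstract names PPO as the algorithm used to upgrade the bank-angle guidance model but gives no formula. As a hedged illustration, the sketch below computes the standard PPO clipped surrogate objective, $L^{CLIP} = \mathbb{E}_t[\min(r_t A_t, \mathrm{clip}(r_t, 1-\epsilon, 1+\epsilon) A_t)]$ with $r_t = \pi_{new}(a_t|s_t)/\pi_{old}(a_t|s_t)$. It assumes a 1-D Gaussian policy over a scalar bank-angle command. The function names `gaussian_log_prob` and `ppo_clip_objective`, the Gaussian policy, the default clip range `clip_eps = 0.2`, and all numbers in the demo are illustrative assumptions, not details taken from this paper.

```python
import math
from typing import Sequence


def gaussian_log_prob(action: float, mean: float, std: float) -> float:
    """Log-density of a 1-D Gaussian policy pi(a|s) = N(mean, std^2).

    Assumption for illustration only: the policy outputs the mean of a scalar
    bank-angle command (rad) and uses a fixed exploration std.
    """
    if std <= 0.0:
        raise ValueError("std must be positive")
    z = (action - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)


def ppo_clip_objective(
    new_log_probs: Sequence[float],
    old_log_probs: Sequence[float],
    advantages: Sequence[float],
    clip_eps: float = 0.2,
) -> float:
    """Batch mean of the standard PPO clipped surrogate (to be maximized).

    Per sample: r = exp(log pi_new - log pi_old),
    term = min(r * A, clip(r, 1 - eps, 1 + eps) * A).
    """
    n = len(advantages)
    if n == 0 or len(new_log_probs) != n or len(old_log_probs) != n:
        raise ValueError("inputs must be non-empty and equally long")
    total = 0.0
    for lp_new, lp_old, adv in zip(new_log_probs, old_log_probs, advantages):
        ratio = math.exp(lp_new - lp_old)  # importance ratio r_t
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Taking the min gives a pessimistic bound: once r_t leaves
        # [1 - eps, 1 + eps] in the direction that would raise the
        # objective, the clipped term caps the gain.
        total += min(ratio * adv, clipped * adv)
    return total / n


if __name__ == "__main__":
    # Hypothetical batch of two bank-angle decisions (illustrative values only).
    actions = [0.30, -0.10]    # sampled bank angles, rad
    old_means = [0.25, -0.05]  # policy mean before the update
    new_means = [0.28, -0.12]  # policy mean after a gradient step
    std = 0.1
    adv = [1.5, -0.4]          # advantage estimates (assumed given)
    old_lp = [gaussian_log_prob(a, m, std) for a, m in zip(actions, old_means)]
    new_lp = [gaussian_log_prob(a, m, std) for a, m in zip(actions, new_means)]
    print(ppo_clip_objective(new_lp, old_lp, adv))
```

A complete PPO loop would also need a value-function loss, an entropy bonus, and advantage estimation over simulated flight trajectories; these are omitted here. Under the staged framework described above, the supervised pre-trained network would presumably supply the initial policy mean before these updates begin. That wiring is an assumption and is not shown.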

Key words: intelligent guidance, no-fly zone avoidance, reinforcement learning, PPO algorithm