ACTA AERONAUTICA ET ASTRONAUTICA SINICA

Research of intelligent guidance for no-fly zone avoidance based on reinforcement learning

Jun-Peng HUI1,   

  • Received: 2022-05-11 Revised: 2023-02-05 Online: 2023-02-06 Published: 2023-02-06
  • Contact: Jun-Peng HUI

Abstract: The rapid development of artificial intelligence (AI) provides a new technical approach to aircraft guidance research. To address the problem of a reentry aircraft avoiding uncertain no-fly zones, we propose a three-stage research framework: “predictor-corrector guidance → pre-training of a bank angle guidance model based on supervised learning → further training of the bank angle guidance model based on reinforcement learning”. First, predictor-corrector guidance is used to generate a large number of flight trajectories that avoid the no-fly zone, and the bank angle guidance model is pre-trained on these trajectories with supervised learning. Second, to take advantage of the natural strengths of reinforcement learning in intelligent decision-making, the bank angle guidance model is further trained with the proximal policy optimization (PPO) algorithm through a large number of exploratory interactions between the aircraft and the environment. An effective reward is designed by drawing on the way humans adjust their learning strategies based on feedback, so that the strong lateral maneuverability of high lift-to-drag ratio reentry aircraft can be exploited. This approach removes the restriction to the bank angle solution space produced by predictor-corrector guidance and is therefore expected to yield better strategies for avoiding no-fly zones. Comparison with traditional predictor-corrector guidance and with intelligent guidance based on supervised learning verifies that the reinforcement-learning-based intelligent guidance technology for no-fly zone avoidance can fully exploit the wide-area flight advantages of the aircraft, thereby meeting the adaptability requirements of future intelligent aircraft decision systems in uncertain scenarios.

Key words: intelligent guidance, no-fly zone avoidance, reinforcement learning, PPO algorithm
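
The two learning stages named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the network, the loss, or the reward, so the mean-squared-error pre-training loss, the function names `bc_loss` and `ppo_clipped_objective`, and the clipping value `clip_eps=0.2` are all assumptions chosen for illustration. The second function is the standard PPO clipped surrogate objective.

```python
import math

def bc_loss(predicted_bank, reference_bank):
    """Supervised pre-training loss (assumed here to be mean squared error)
    between the model's bank angles and those produced by
    predictor-corrector guidance for the same states."""
    n = len(reference_bank)
    return sum((p - r) ** 2 for p, r in zip(predicted_bank, reference_bank)) / n

def ppo_clipped_objective(logp_new, logp_old, advantages, clip_eps=0.2):
    """Standard PPO clipped surrogate objective, averaged over a batch.
    It is maximized during the reinforcement learning stage."""
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)  # pi_new(a|s) / pi_old(a|s)
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)

# Hypothetical usage with made-up numbers (angles in radians).
pretrain_loss = bc_loss([0.10, 0.20], [0.10, 0.40])
objective = ppo_clipped_objective([0.0, 1.0], [0.0, 0.0], [1.0, 1.0])
```

The clipping bounds how far a single update can move the policy away from the one that collected the data, which is one plausible reason to start PPO from a pre-trained model rather than from random weights.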