ACTA AERONAUTICA ET ASTRONAUTICA SINICA, 2023, Vol. 44, Issue (11): 327596. doi: 10.7527/S1000-6893.2022.27596

• Electronics and Electrical Engineering and Control

Trust region policy optimization guidance algorithm for intercepting maneuvering target

Wenxue CHEN, Changsheng GAO, Wuxing JING

  1. School of Astronautics, Harbin Institute of Technology, Harbin 150001, China
  • Received: 2022-06-09 Revised: 2022-06-21 Accepted: 2022-07-21 Online: 2023-06-15 Published: 2022-07-25
  • Contact: Changsheng GAO, E-mail: gaocs@hit.edu.cn
  • Supported by:
    National Natural Science Foundation of China (12072090)

Abstract:

Considering the high speed and maneuverability of hypersonic vehicles in near space, this paper proposes a deep reinforcement learning guidance algorithm based on Trust Region Policy Optimization (TRPO) to improve the accuracy, robustness, and intelligence of guidance when intercepting targets with different initial states and maneuver modes. The TRPO-based guidance algorithm consists of two policy (actor) networks and one critic network, and maps the state of the relative motion between the near-space target and the interceptor directly to the interceptor's guidance command. For training, appropriate continuous state and action spaces are designed, and the reward function is constructed to balance energy consumption, relative distance, and other factors so as to accelerate training convergence. Finally, the trained agent is tested in a range of mission scenarios. Simulation results show that, compared with the traditional Proportional Navigation (PN) guidance law and the Improved Proportional Navigation (IPN) guidance law, the proposed algorithm achieves smaller miss distances and more stable interception, remains robust in both learned and previously unseen scenarios, and can be deployed on computers of various configurations.
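The abstract names TRPO but does not state its update rule. As an illustration of the standard TRPO trust-region step, not the authors' implementation, the Python sketch below assumes a hypothetical linear-Gaussian policy: the mean guidance command is a weighted sum of relative-motion state features, and the last parameter is a learnable log standard deviation. A candidate update is accepted only if the mean KL divergence from the old policy stays within `max_kl` and the importance-weighted surrogate advantage improves; otherwise the step is halved. The search direction `full_step` is taken as given, since full TRPO obtains it from a conjugate-gradient solve with Fisher-vector products, which is omitted here. The `old`/`new` parameter pair may loosely correspond to the abstract's two policy networks; all function names and the default `max_kl=0.01` are assumptions.

```python
import math
from typing import List, Sequence

# Hypothetical linear-Gaussian policy: params = [w_1, ..., w_n, log_std].
# The action (e.g. a normal-acceleration command) is drawn from
# N(mean = w . s, std = exp(log_std)) for relative-motion state features s.


def policy_mean_std(params: Sequence[float], state: Sequence[float]):
    *weights, log_std = params
    mean = sum(w * x for w, x in zip(weights, state))
    return mean, math.exp(log_std)


def log_prob(params, state, action) -> float:
    """Log-density of a scalar action under the Gaussian policy."""
    mean, std = policy_mean_std(params, state)
    return (-0.5 * ((action - mean) / std) ** 2
            - math.log(std) - 0.5 * math.log(2.0 * math.pi))


def gaussian_kl(mu0, std0, mu1, std1) -> float:
    """KL(N(mu0, std0^2) || N(mu1, std1^2)) for scalar Gaussians."""
    return (math.log(std1 / std0)
            + (std0 ** 2 + (mu0 - mu1) ** 2) / (2.0 * std1 ** 2) - 0.5)


def mean_kl(old, new, states) -> float:
    """Average KL(old policy || new policy) over the sampled states."""
    total = 0.0
    for s in states:
        mu0, std0 = policy_mean_std(old, s)
        mu1, std1 = policy_mean_std(new, s)
        total += gaussian_kl(mu0, std0, mu1, std1)
    return total / len(states)


def surrogate(old, new, states, actions, advantages) -> float:
    """Importance-weighted advantage: mean of pi_new(a|s)/pi_old(a|s) * A."""
    total = 0.0
    for s, a, adv in zip(states, actions, advantages):
        ratio = math.exp(log_prob(new, s, a) - log_prob(old, s, a))
        total += ratio * adv
    return total / len(states)


def trust_region_step(old, full_step, states, actions, advantages,
                      max_kl=0.01, max_backtracks=10) -> List[float]:
    """Backtracking line search along full_step (assumed precomputed,
    e.g. by conjugate gradient in full TRPO). Halve the step until the
    mean KL is within max_kl and the surrogate improves; otherwise
    return an unchanged copy of the old parameters."""
    base = surrogate(old, old, states, actions, advantages)
    frac = 1.0
    for _ in range(max_backtracks):
        new = [p + frac * d for p, d in zip(old, full_step)]
        if (mean_kl(old, new, states) <= max_kl
                and surrogate(old, new, states, actions, advantages) > base):
            return new
        frac *= 0.5
    return list(old)
```

For example, with states 1 and -1, actions 1 and -1, unit advantages, a zero initial weight and unit standard deviation, a full step of 2 in the weight gives a mean KL of 2; the search halves it four times and accepts a weight of 0.125, whose mean KL is 0.0078125. A step in the opposite direction never improves the surrogate, so the old parameters are kept.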

Key words: deep reinforcement learning, trust region policy optimization, near-space interception, missile terminal guidance, maneuvering targets, Markov process
