Acta Aeronautica et Astronautica Sinica ›› 2023, Vol. 44 ›› Issue (22): 628871-628871.doi: 10.7527/S1000-6893.2023.28871

• Special Column •

Value-filter based air-combat maneuvering optimization

Yupeng FU1, Xiangyang DENG1,2(), Ziqiang ZHU1, Limin ZHANG1   

  1. School of Aviation Support, Naval Aeronautical University, Yantai 264001, China
    2. Department of Automation, Tsinghua University, Beijing 100084, China
  • Received: 2023-04-14 Revised: 2023-05-30 Accepted: 2023-06-14 Online: 2023-11-25 Published: 2023-06-27
  • Contact: Xiangyang DENG, E-mail: skl18@mails.tsinghua.edu.cn
  • Supported by:
    National Natural Science Foundation of China(91538201);Foundation Program for National Defense High Level Talent(202220539)

Abstract:

To address the low data-utilization efficiency and convergence difficulty of traditional reinforcement learning algorithms in air-combat maneuvering decision optimization with large state spaces, the concept and principle of the value filter are proposed and analyzed. A reinforcement learning algorithm named Demonstration Policy Constrain (DPC) is presented based on the value filter, and a maneuvering decision optimization method based on the DPC algorithm is designed. With the value filter, state-value-based advantage data are extracted from the replay buffer and the demonstration buffer to constrain the optimization direction of the policy. Simulation results based on JSBSim's aerodynamic model of the F-16 aircraft show that the convergence efficiency of the algorithm is significantly improved, the sub-optimality of the demonstration policy is mitigated, and the proposed maneuvering decision method exhibits good intelligent behavior.
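The filtering-and-constraint idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' actual DPC implementation; the critic, returns, and thresholding rule below are all hypothetical stand-ins, showing only the general pattern of keeping "advantage" samples (observed return above the critic's estimate) and using them as a behavior-cloning constraint on the policy:

```python
import numpy as np

def value_filter(states, returns, critic):
    """Value filter (sketch): keep only the samples whose observed return
    exceeds the critic's state-value estimate V(s), i.e. the advantage data
    extracted from the replay and demonstration buffers."""
    advantage = returns - critic(states)
    return np.flatnonzero(advantage > 0.0)

def bc_constraint_loss(action_probs, demo_actions):
    """Constraint term (sketch): negative log-likelihood of the filtered
    demonstration actions under the current policy, added to the usual RL
    objective to steer the optimization direction."""
    picked = action_probs[np.arange(len(demo_actions)), demo_actions]
    return -np.log(picked + 1e-8).mean()

# Toy data (all numbers hypothetical).
critic = lambda s: s.mean(axis=1)           # stand-in state-value function
demo_states  = np.array([[0.2, 0.4], [0.9, 0.7], [0.1, 0.1]])
demo_returns = np.array([0.5, 0.6, 0.05])   # observed returns
demo_actions = np.array([1, 0, 2])

# Only sample 0 passes the filter: its return 0.5 exceeds V(s) = 0.3.
keep = value_filter(demo_states, demo_returns, critic)
probs = np.full((len(keep), 3), 1.0 / 3.0)  # uniform stand-in policy
loss = bc_constraint_loss(probs, demo_actions[keep])
```

In a full algorithm this constraint loss would be weighted and added to the policy-gradient objective, so that gradient steps cannot drift far from high-value demonstrated behavior.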

Key words: value filter, policy constrain, maneuvering decision, reinforcement learning, imitation learning
