Acta Aeronautica et Astronautica Sinica > 2023, Vol. 44 Issue (S2): 729400-729400   doi: 10.7527/S1000-6893.2023.29400

Cooperative game guidance method for hypersonic vehicles based on reinforcement learning

Weilin NI1, Yonghai WANG2, Cong XU2, Fenghua CHI2, Haizhao LIANG1

  1. School of Aeronautics and Astronautics, Sun Yat-sen University, Shenzhen 518107, China
  2. Science and Technology on Space Physics Laboratory, Beijing 100076, China
  • Received: 2023-08-02   Revised: 2023-08-03   Accepted: 2023-09-04   Online: 2023-09-15   Published: 2023-09-13
  • Contact: Haizhao LIANG   E-mail: lianghch5@mail.sysu.edu.cn
  • Supported by:
    National Natural Science Foundation of China (62003375)

Abstract:

This paper studies an intelligent cooperative game guidance method for the active-defense confrontation of a hypersonic vehicle in multi-interceptor scenarios. For the game in which a hypersonic vehicle and an active defense vehicle cooperate against attacks from multiple interceptors, an intelligent cooperative game guidance method based on the twin delayed deep deterministic policy gradient (TD3) algorithm is proposed. The method achieves a high game success rate against multiple interceptors even when the hypersonic vehicle and the active defense vehicle have insufficient maneuverability and response speed. A class of heuristic continuous reward functions is constructed and an adaptive progressive curriculum learning method is designed; together they give a fast and stable training method that addresses the sparse-reward problem in deep reinforcement learning and makes the intelligent game algorithm converge quickly and stably. Finally, the effectiveness of the proposed method is verified by numerical simulation. The results show that the method improves training convergence efficiency and stability, and achieves a higher game success rate than the traditional game guidance method.
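As background on the algorithm named in the abstract, the following is a minimal sketch of two ingredients that distinguish twin delayed DDPG from plain DDPG: the clipped double-Q target and target policy smoothing. (The third ingredient, updating the actor less often than the critics, is not shown.) It works on scalars for readability. The function names and the hyperparameter values (`gamma`, `noise_std`, `noise_clip`) are illustrative assumptions, not values taken from the paper.

```python
import random


def smoothed_target_action(mu_next, max_action, noise_std=0.2, noise_clip=0.5, rng=random):
    """Target policy smoothing: add clipped Gaussian noise to the target
    actor's output, then clip the result to the admissible command range.
    All default values are assumed, not the paper's settings."""
    noise = rng.gauss(0.0, noise_std)
    noise = max(-noise_clip, min(noise_clip, noise))
    return max(-max_action, min(max_action, mu_next + noise))


def td3_target(reward, done, next_q1, next_q2, gamma=0.99):
    """Clipped double-Q target: bootstrap from the smaller of the two
    target-critic estimates to limit value overestimation.
    `done` is 1.0 for a terminal transition and 0.0 otherwise."""
    return reward + gamma * (1.0 - done) * min(next_q1, next_q2)
```

In a full training loop, `next_q1` and `next_q2` would be the two target critics evaluated at the next state and at the action returned by `smoothed_target_action`; both critics are then regressed toward the same target.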

Key words: game theory, reward shaping, curriculum learning, deep reinforcement learning, hypersonic vehicles
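The abstract does not give the form of the heuristic continuous reward or the curriculum schedule, so the sketch below only illustrates the general ideas of reward shaping and adaptive curriculum learning. Everything in it is a hypothetical assumption: the use of a predicted miss distance, the `tanh` shaping, the promotion threshold, and the difficulty levels.

```python
import math
from collections import deque


def sparse_reward(miss_distance, kill_radius):
    """Terminal-only signal: +1 if the interceptor misses, -1 otherwise."""
    return 1.0 if miss_distance > kill_radius else -1.0


def shaped_reward(predicted_miss, kill_radius):
    """Hypothetical continuous reward in (-1, 1): grows smoothly with the
    predicted miss distance and is zero exactly at the kill radius."""
    return math.tanh(predicted_miss / kill_radius - 1.0)


class AdaptiveCurriculum:
    """Move to the next (harder) level once the success rate over a full
    window of recent episodes reaches `promote_at`. Assumed scheme."""

    def __init__(self, levels, promote_at=0.8, window=50):
        self.levels = list(levels)
        self.index = 0
        self.promote_at = promote_at
        self.results = deque(maxlen=window)

    @property
    def level(self):
        return self.levels[self.index]

    def report(self, success):
        """Record one episode outcome and return the level to train on next."""
        self.results.append(bool(success))
        window_full = len(self.results) == self.results.maxlen
        rate = sum(self.results) / len(self.results)
        if window_full and rate >= self.promote_at and self.index < len(self.levels) - 1:
            self.index += 1
            self.results.clear()  # start a fresh window at the new level
        return self.level
```

A training loop under these assumptions would read `curriculum.level` to configure each episode (for example, a hypothetical interceptor maneuver limit), use `shaped_reward` at every step in place of the terminal-only `sparse_reward`, and call `curriculum.report(success)` when the episode ends.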

CLC number: