Acta Aeronautica et Astronautica Sinica > 2024, Vol. 45 Issue (17): 530053-530053   doi: 10.7527/S1000-6893.2024.30053

Hierarchical reinforcement learning decision algorithm for air combat based on hybrid actions

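Zuolong LI1, Jihong ZHU1, Minchi KUANG1, Jie ZHANG2, Jie REN2

  1. Department of Precision Instrument, Tsinghua University, Beijing 100084, China
  2. AVIC Chengdu Flight Design and Research Institute, Chengdu 610091, China
  • Received: 2024-01-02 Revised: 2024-01-11 Accepted: 2024-04-22 Online: 2024-04-26 Published: 2024-04-25
  • Corresponding author: Jihong ZHU E-mail: jhzhu@tsinghua.edu.cn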

Hierarchical decision algorithm for air combat with hybrid action based on deep reinforcement learning

Zuolong LI1, Jihong ZHU1, Minchi KUANG1, Jie ZHANG2, Jie REN2

  1. Department of Precision Instrument, Tsinghua University, Beijing 100084, China
  2. AVIC Chengdu Flight Design and Research Institute, Chengdu 610091, China
  • Received: 2024-01-02 Revised: 2024-01-11 Accepted: 2024-04-22 Online: 2024-04-26 Published: 2024-04-25
  • Contact: Jihong ZHU E-mail: jhzhu@tsinghua.edu.cn

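Abstract:

Intelligent air combat is a research focus for the world's major military powers. To address maneuver decision-making in beyond-visual-range (BVR) air combat games, a hierarchical decision algorithm for BVR air combat based on deep reinforcement learning is proposed. The algorithm uses a maneuver set suited to BVR air combat to control the aircraft's trajectory and attitude. To enlarge the model's action space and improve its decision-making capability, the air combat action space is organized hierarchically and modeled as a multi-dimensional discrete action space. To address sparse rewards in air combat, a reward function is designed that jointly accounts for factors such as positional advantage, weapon launch, and weapon threat, guiding the agent toward convergence on the optimal policy. A complete digital-twin air combat simulation environment and an air combat expert system are built; the decision algorithm is trained in the simulation environment and evaluated through engagements against the expert system. Experimental results show that the decision algorithm is capable of autonomous decision-making in BVR air combat, makes flexible maneuver decisions according to the battlefield situation, and holds a certain advantage in engagements against the expert system.

Key words: beyond-visual-range air combat, intelligent decision-making, deep reinforcement learning, proximal policy optimization, maneuver actions, hierarchical decision-making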

Abstract:

Intelligent air combat is an active research topic among the world's major military powers. To solve the maneuver decision problem in Beyond-Visual-Range (BVR) air combat, we propose a hierarchical decision algorithm based on deep reinforcement learning. The algorithm uses a maneuver set suited to BVR air combat to control the trajectory and attitude of the aircraft. To expand the action space of the model and improve its decision-making ability, we organize the action space hierarchically and model it as a multi-discrete action space. To address the sparse-reward problem in air combat, we design a reward function that accounts for position advantage, weapon launch, and weapon threat, which guides the agent to converge toward the optimal policy. We also build a complete digital-twin simulation environment for air combat and an air combat expert system. The decision algorithm is trained in the simulation environment and evaluated in engagements against the expert system. The experimental results indicate that the proposed algorithm can make autonomous and flexible maneuver decisions in BVR air combat according to the battlefield situation, and that it holds an advantage over the expert system.

Key words: air combat beyond visual range, intelligent decision, deep reinforcement learning, proximal policy optimization, maneuver, hierarchical decision
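
The two modeling ideas named in the abstract, a multi-discrete action space and a shaped reward, can be illustrated with a minimal Python sketch. Everything specific below is an assumption for illustration only: the dimension names and sizes in `ACTION_DIMS`, the `Situation` fields, and the reward weights are hypothetical, since the abstract does not give them. The sketch also treats the action dimensions as independent categorical heads, which is one common way to handle multi-discrete actions in policy-gradient methods; the paper's hierarchy may couple the levels differently.

```python
import math
import random
from dataclasses import dataclass

# Hypothetical action dimensions and sizes; the abstract does not list them.
ACTION_DIMS = {"maneuver": 5, "heading": 7, "speed": 3, "fire": 2}


def softmax(logits):
    """Convert raw scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_multi_discrete(logits_per_dim, rng):
    """Sample one index per dimension; return (action, joint log-probability).

    Assumes independent heads, so the joint log-probability is the sum of
    the per-dimension log-probabilities.
    """
    action, log_prob = {}, 0.0
    for name, logits in logits_per_dim.items():
        probs = softmax(logits)
        idx = rng.choices(range(len(probs)), weights=probs, k=1)[0]
        action[name] = idx
        log_prob += math.log(probs[idx])
    return action, log_prob


@dataclass
class Situation:
    position_advantage: float   # hypothetical normalized score in [-1, 1]
    own_missile_launched: bool  # a weapon was launched this step
    under_missile_threat: bool  # an enemy weapon is tracking the aircraft


def shaped_reward(s, w_pos=0.1, w_launch=0.5, w_threat=0.5):
    """Dense reward combining the three factors named in the abstract.

    The weights are illustrative placeholders, not values from the paper.
    """
    reward = w_pos * s.position_advantage
    if s.own_missile_launched:
        reward += w_launch
    if s.under_missile_threat:
        reward -= w_threat
    return reward


if __name__ == "__main__":
    rng = random.Random(0)
    # Uniform logits stand in for the output of a policy network.
    logits = {name: [0.0] * n for name, n in ACTION_DIMS.items()}
    action, log_prob = sample_multi_discrete(logits, rng)
    print(action, log_prob)
    print(shaped_reward(Situation(0.4, True, False)))
```

In a proximal policy optimization setup, the joint log-probability returned by `sample_multi_discrete` is the quantity that would enter the probability ratio, and the dense per-step reward supplements the sparse win/loss signal at the end of an engagement.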

CLC number: