Acta Aeronautica et Astronautica Sinica > 2024, Vol. 45 Issue (4): 328723-328723   doi: 10.7527/S1000-6893.2023.28723

Air combat intelligent decision-making method based on self-play deep reinforcement learning

Shengzhe SHAN1,2, Weiwei ZHANG1

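  1. School of Aeronautics, Northwestern Polytechnical University, Xi’an 710072, China
  2. Unit 93995 of the Chinese People’s Liberation Army, Xi’an 710306, China
  • Received: 2023-03-21 Revised: 2023-06-12 Accepted: 2023-08-29 Published: 2024-02-25 Released online: 2023-09-01
  • Corresponding author: Weiwei ZHANG E-mail: aeroelastic@nwpu.edu.cn
  • Funding:
    Science and Technology Foundation of National Defense Key Laboratory (6142219190302)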

Air combat intelligent decision-making method based on self-play and deep reinforcement learning

Shengzhe SHAN1,2, Weiwei ZHANG1

  1. School of Aeronautics, Northwestern Polytechnical University, Xi’an 710072, China
  2. Unit 93995 of the Chinese People’s Liberation Army, Xi’an 710306, China
  • Received: 2023-03-21 Revised: 2023-06-12 Accepted: 2023-08-29 Published: 2024-02-25 Online: 2023-09-01
  • Corresponding author: Weiwei ZHANG E-mail: aeroelastic@nwpu.edu.cn
  • Supported by:
    Science and Technology Foundation of National Defense Key Laboratory (6142219190302)

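Abstract:

Air combat is a key part of warfare's move into three dimensions. Intelligent air combat has become a major research focus in the military field in China and abroad, and deep reinforcement learning is an important technical route to making air combat intelligent. To address the difficulty of building high-level air combat opponents with single-agent training methods, a self-play-based training method for air combat agents is proposed. A research platform is built, the observations, actions, and rewards are designed according to pilots' domain knowledge, the air combat agent is trained to convergence by playing against itself, and simulation experiments verify the effectiveness of the resulting air combat decision-making model. The results show that the agent's tactical level improves step by step through self-play training. The agent ultimately achieves a win rate above 70% against the decision-making model obtained by single-agent training, and air combat strategies resembling the human “single/double loop” tactics emerge.

Key words: air combat, artificial intelligence, deep reinforcement learning, self-play, agent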

Abstract:

Air combat is a key element of three-dimensional warfare, and intelligent air combat has become a research hotspot in the military field both domestically and internationally. Deep reinforcement learning is an important technical approach to achieving intelligent air combat. To address the difficulty of constructing high-level opponents with single-agent training methods, a self-play-based training method for air combat agents is proposed, and a visualization research platform is built to develop a decision-making agent for close-range air combat. Pilots' domain knowledge is embedded in the design of the agent’s observations, actions, and rewards, and the agent is trained to convergence. Simulation experiments show that the agent's air combat tactics gradually improve through self-play training: the agent achieves a win rate of over 70% against the decision-making model obtained by single-agent training, and strategies similar to the human “single/double loop” tactics emerge.

Key words: air combat, artificial intelligence, deep reinforcement learning, self-play, agent

CLC number: