Acta Aeronautica et Astronautica Sinica, 2023, Vol. 44, Issue 4: 126731   doi: 10.7527/S1000-6893.2022.26731

Intelligent air combat decision making and simulation based on deep reinforcement learning

Pan ZHOU¹, Jiangtao HUANG¹, Sheng ZHANG¹, Gang LIU², Bowen SHU¹,³, Jigang TANG¹

  1. Aerospace Technology Institute, China Aerodynamics Research and Development Center, Mianyang 621000, China
  2. China Aerodynamics Research and Development Center, Mianyang 621000, China
  3. School of Aeronautics, Northwestern Polytechnical University, Xi'an 710072, China
  • Received: 2021-12-02  Revised: 2022-01-12  Accepted: 2022-01-17  Online: 2022-01-28  Published: 2022-01-26
  • Corresponding author: Jiangtao HUANG, E-mail: hjtcyf@163.com
  • Supported by: Provincial or Ministry Level Project

Abstract:

Intelligent decision-making for aircraft in air combat is a research focus for the world's major military powers. To address the maneuver decision-making problem of Unmanned Aerial Vehicles (UAVs) in close-range air combat, an autonomous decision-making model based on deep reinforcement learning is proposed. The model adopts, and improves upon, a reward function that jointly considers attack-angle, speed, altitude and distance advantages. The improved reward function prevents the agent from being lured by the enemy aircraft into crashing into the ground, and effectively guides the agent toward the optimal solution. To address the slow convergence caused by random sampling in reinforcement learning, a value-based prioritization method for experience-pool samples is designed; it significantly accelerates convergence while still guaranteeing that the algorithm converges. The decision-making model is validated on a human-machine confrontation simulation platform, and the results show that it can dominate both the expert system and human pilots in close-range air combat.

Key words: air combat, autonomous decision-making, deep reinforcement learning, TD3 algorithm, sparse reward
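The abstract does not say how the "value-based" priority of experience-pool samples is computed. The sketch below is a hypothetical illustration under stated assumptions, not the authors' method: it swaps in the rank-based variant of prioritized experience replay, ranks stored transitions by a per-sample `value` score (assumed to be something like a critic estimate or an absolute TD error), samples transition i with probability proportional to (1/rank(i))^α, and returns importance-sampling weights so that the bias of non-uniform sampling can be corrected during TD3 critic updates. The names `Transition`, `RankedReplayBuffer` and `value`, and the defaults α = 0.7 and β = 0.5, are all assumptions.

```python
import random
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Tuple


@dataclass
class Transition:
    state: List[float]
    action: List[float]
    reward: float
    next_state: List[float]
    done: bool
    # Hypothetical priority score (assumption): e.g. a critic estimate or |TD error|.
    value: float


class RankedReplayBuffer:
    """Hypothetical rank-based prioritized replay buffer (illustrative sketch only)."""

    def __init__(self, capacity: int, alpha: float = 0.7, seed: int = 0) -> None:
        self.alpha = alpha  # alpha = 0 gives uniform sampling; larger values favor top-ranked samples
        self.data: Deque[Transition] = deque(maxlen=capacity)  # oldest samples are dropped first
        self.rng = random.Random(seed)

    def add(self, t: Transition) -> None:
        self.data.append(t)

    def probabilities(self) -> List[float]:
        """P(i) proportional to (1 / rank(i)) ** alpha, where rank 1 is the highest value."""
        values = [t.value for t in self.data]
        order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
        scores = [0.0] * len(values)
        for rank, idx in enumerate(order, start=1):
            scores[idx] = (1.0 / rank) ** self.alpha
        total = sum(scores)
        return [s / total for s in scores]

    def sample(self, batch_size: int, beta: float = 0.5) -> Tuple[List[Transition], List[float]]:
        """Draw a batch by rank-based priority and return importance-sampling weights."""
        items = list(self.data)
        probs = self.probabilities()
        n = len(items)
        idxs = self.rng.choices(range(n), weights=probs, k=batch_size)
        # (N * P(i)) ** -beta compensates for non-uniform sampling;
        # dividing by the batch maximum keeps every weight in (0, 1].
        raw = [(n * probs[i]) ** (-beta) for i in idxs]
        max_w = max(raw)
        return [items[i] for i in idxs], [w / max_w for w in raw]
```

In a TD3 training loop, each sampled transition's critic loss would be multiplied by its returned weight, and the `value` fields of the sampled transitions would be refreshed after the update; both steps are likewise assumptions beyond what the abstract states. Ranking rather than using raw scores keeps the sampling distribution insensitive to outliers in the value estimates.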
