
Research on Intelligent Air Combat Decision-Making and Simulation Based on Deep Reinforcement Learning

Zhou Pan (1), Huang Jiangtao (1), Zhang Sheng (2), Liu Gang (1), Shu Bowen (3), Tang Jigang (1)

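  1. China Aerodynamics Research and Development Center
  2. Computational Aerodynamics Institute, China Aerodynamics Research and Development Center
  3. Northwestern Polytechnical University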
  • Received: 2021-12-02 Revised: 2022-01-23 Online: 2022-01-26 Published: 2022-01-26
  • Corresponding author: Huang Jiangtao

Research on UAV Intelligent Air Combat Decision and Simulation Based on Deep Reinforcement Learning

  • Received: 2021-12-02 Revised: 2022-01-23 Online: 2022-01-26 Published: 2022-01-26

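Abstract (Chinese version): Intelligent air combat decision-making for aircraft is a current research focus among the world's major military powers. To address the maneuver decision problem of unmanned aerial vehicles (UAVs) in close-range air combat games, an autonomous decision-making model for close-range UAV dogfighting based on deep reinforcement learning is proposed. The model adopts and improves a reward function that jointly accounts for attack-angle, speed, altitude, and distance advantages. The improved reward function prevents the agent from being lured by the enemy aircraft into crashing into the ground, and it effectively guides the agent toward the optimal solution. To address the slow convergence caused by random sampling in reinforcement learning, a value-based priority ranking method for experience pool samples is designed, which significantly accelerates convergence while still ensuring that the algorithm converges. The decision model is validated on a human-machine confrontation simulation platform, and the results show that it can suppress both the expert system and human pilots in close-range air combat.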

Keywords (Chinese version): air combat, autonomous decision-making, deep reinforcement learning, TD3 algorithm, sparse reward

Abstract: Intelligent air combat decision-making for aircraft is a research focus among the world's major military powers. To solve the maneuver decision problem of unmanned aerial vehicles (UAVs) in close-range air combat games, an autonomous decision-making model for close-range UAV air combat based on deep reinforcement learning is proposed. The model adopts and improves a reward function that jointly considers attack-angle, speed, altitude, and distance advantages. The improved reward function prevents the agent from being lured into the ground by the enemy aircraft and effectively guides the agent toward the optimal solution. To address the slow convergence caused by random sampling in reinforcement learning, a value-based prioritization method for experience pool samples is designed. While still ensuring that the algorithm converges, the method significantly accelerates convergence. The decision-making model is verified on a human-machine confrontation simulation platform, and the results show that it can suppress both the expert system and human pilots in close-range air combat.

Key words: Air Combat, Autonomous Decision-Making, Deep Reinforcement Learning, TD3 Algorithm, Sparse Rewards
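
The abstract mentions a value-based priority ranking of experience pool samples but gives no formula. As a rough illustration only, the sketch below uses generic rank-based prioritized sampling, which may differ from the authors' scheme. It assumes each stored transition carries a scalar "value" score (for example a reward or a critic estimate). The function names, the `alpha` exponent, and the `(1/rank)**alpha` weighting are hypothetical choices made for this example.

```python
import random


def rank_based_probabilities(values, alpha=0.7):
    """Map per-sample value scores to sampling probabilities by rank.

    Assumption for illustration: the sample with the highest value gets
    rank 1, and its sampling weight is (1 / rank) ** alpha.
    alpha = 0 reduces to uniform random sampling.
    """
    n = len(values)
    # Indices ordered from highest to lowest value score.
    order = sorted(range(n), key=lambda i: values[i], reverse=True)
    weights = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        weights[idx] = (1.0 / rank) ** alpha
    total = sum(weights)
    return [w / total for w in weights]


def sample_indices(values, batch_size, alpha=0.7, rng=None):
    """Draw replay-buffer indices (with replacement) using rank-based priorities."""
    rng = rng or random.Random()
    probs = rank_based_probabilities(values, alpha)
    return rng.choices(range(len(values)), weights=probs, k=batch_size)
```

With `alpha` close to 0 the sampler behaves like the uniform random sampling that the abstract identifies as slow to converge. Larger `alpha` concentrates training batches on the highest-ranked samples.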
