Acta Aeronautica et Astronautica Sinica > 2017, Vol. 38 Issue (10): 321168-321168   doi: 10.7527/S1000-6893.2017.321168

基于启发式强化学习的空战机动智能决策

左家亮1, 杨任农1, 张滢1, 李中林2, 邬蒙1   

  1. 空军工程大学 航空航天工程学院, 西安 710038;
  2. 空军驻沪宁地区军代表室, 南京 210007
  • Received: 2017-02-06 Revised: 2017-04-28 Online: 2017-10-15 Published: 2017-04-28
  • Corresponding author: ZUO Jialiang E-mail: jialnzuo@163.com

Intelligent decision-making in air combat maneuvering based on heuristic reinforcement learning

ZUO Jialiang1, YANG Rennong1, ZHANG Ying1, LI Zhonglin2, WU Meng1   

  1. College of Aeronautics and Astronautic Engineering, Air Force Engineering University, Xi'an 710038, China;
  2. Air Force Representative Office in Shanghai and Nanjing Area, Nanjing 210007, China
  • Received: 2017-02-06 Revised: 2017-04-28 Online: 2017-10-15 Published: 2017-04-28

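Abstract:

Intelligent decision-making for air combat maneuvering has long been a research focus. Existing approaches rely mainly on optimization theory and traditional artificial intelligence algorithms, and they compute decision sequences in a relatively fixed environment. Real air combat, however, changes dynamically and involves many uncertain factors, so traditional theoretical methods struggle to produce decision sequences that match actual conditions. This paper proposes an intelligent decision-making method for air combat maneuvering based on heuristic reinforcement learning. While interacting dynamically with the external environment, the method computes a relatively good maneuver decision sequence by trial and error, and it uses a neural network to learn from the reinforcement learning process, accumulating knowledge that guides the subsequent search. This greatly improves search efficiency and enables real-time, dynamic, iterative computation of decision sequences during air combat decision-making. Simulation results show that the decision sequences computed by the proposed algorithm are consistent with actual conditions.

Keywords: air combat maneuvering, intelligent decision-making, heuristic reinforcement learning, neural network, decision sequence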

Abstract:

Intelligent decision-making in air combat maneuvering has long been a research hotspot. Current research on air combat mainly uses optimization theory and traditional artificial intelligence algorithms to compute the air combat decision sequence in a relatively fixed environment. However, the process of air combat is dynamic and contains many uncertain elements, so it is difficult to obtain a decision sequence that tallies with the actual conditions of air combat by using traditional theoretical methods. A new method for intelligent decision-making in air combat maneuvering based on heuristic reinforcement learning is proposed in this paper. A "trial and error" learning method is adopted to compute a relatively good air combat decision sequence in dynamic air combat, and a neural network is used at the same time to learn from the reinforcement learning process, accumulating knowledge that guides the subsequent search. The search efficiency is increased to a great extent, and real-time dynamic computation of the decision sequence during air combat is realized. Experimental results indicate that the computed decision sequence conforms to actual conditions.

Key words: air combat maneuvering, intelligent decision-making, heuristic reinforcement learning, neural network, decision sequence
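
The loop described in the abstract (trial-and-error search, with accumulated knowledge steering later search) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the environment is a hypothetical one-dimensional corridor rather than an air combat model, the learner is tabular Q-learning, and the neural network is swapped for a crude stand-in, a per-action prior counting how often each action is currently greedy. All names and constants (`N_STATES`, `XI`, `update_heuristic`, and so on) are hypothetical.

```python
import random

# All values are illustrative assumptions, not parameters from the paper.
N_STATES = 8            # toy 1-D corridor; the goal is the last cell
ACTIONS = (-1, +1)      # move left / move right
ALPHA, GAMMA, EPSILON, XI = 0.5, 0.9, 0.1, 0.05
MAX_STEPS = 1000        # cap per episode so training always terminates


def step(state, action):
    """Toy environment: small step cost, +1 reward on reaching the goal."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else -0.01), done


def update_heuristic(q):
    """Stand-in for the neural network: the share of visited states
    whose greedy action is each action (a state-independent prior)."""
    votes = [0] * len(ACTIONS)
    for row in q:
        if any(v != 0.0 for v in row):
            votes[row.index(max(row))] += 1
    total = sum(votes)
    return [v / total if total else 0.0 for v in votes]


def choose(q, h, state, rng):
    """Epsilon-greedy over Q + XI * H, so the heuristic H biases the search."""
    if rng.random() < EPSILON:
        return rng.randrange(len(ACTIONS))
    scores = [q[state][a] + XI * h[a] for a in range(len(ACTIONS))]
    return scores.index(max(scores))


def train(episodes=200, seed=0):
    """Trial-and-error Q-learning; the heuristic is refreshed each episode."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    h = [0.0, 0.0]
    for _ in range(episodes):
        state, done, steps = 0, False, 0
        while not done and steps < MAX_STEPS:
            a = choose(q, h, state, rng)
            nxt, reward, done = step(state, ACTIONS[a])
            target = reward + (0.0 if done else GAMMA * max(q[nxt]))
            q[state][a] += ALPHA * (target - q[state][a])
            state, steps = nxt, steps + 1
        h = update_heuristic(q)  # accumulate knowledge for later episodes
    return q


def greedy_sequence(q):
    """Read off the decision sequence implied by a Q-table."""
    state, seq = 0, []
    while state != N_STATES - 1 and len(seq) < 2 * N_STATES:
        a = q[state].index(max(q[state]))
        seq.append(ACTIONS[a])
        state, _, _ = step(state, ACTIONS[a])
    return seq
```

Calling `greedy_sequence(train())` reads off the move sequence implied by the learned table. The prior is shared across states, which loosely mimics how a learned model could carry knowledge from visited states to unvisited ones; a real implementation would need a state-dependent approximator and a far richer state and action space.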

CLC number: