Electronics, Electrical Engineering and Control

Intelligent decision-making in air combat maneuvering based on heuristic reinforcement learning

  • ZUO Jialiang
  • YANG Rennong
  • ZHANG Ying
  • LI Zhonglin
  • WU Meng
  • 1. College of Aeronautics and Astronautic Engineering, Air Force Engineering University, Xi'an 710038, China;
    2. Air Force Representative Office in Shanghai and Nanjing Area, Nanjing 210007, China

Received date: 2017-02-06

  Revised date: 2017-04-28

  Online published: 2017-04-28

Cite this article

ZUO J L, YANG R N, ZHANG Y, LI Z L, WU M. Intelligent decision-making in air combat maneuvering based on heuristic reinforcement learning[J]. Acta Aeronautica et Astronautica Sinica, 2017, 38(10): 321168. DOI: 10.7527/S1000-6893.2017.321168

Abstract

Intelligent decision-making in air combat maneuvering has long been a research hotspot. Existing research on air combat maneuvering decisions mainly applies optimization theory and traditional artificial intelligence algorithms to compute decision sequences in a relatively fixed environment. Actual air combat, however, is dynamic and contains many uncertain factors, so traditional theoretical methods can hardly produce decision sequences consistent with real engagements. This paper proposes an intelligent decision-making method for air combat maneuvering based on heuristic reinforcement learning. While interacting dynamically with the environment, the method computes relatively good maneuvering decision sequences by trial and error, and a neural network learns from the reinforcement learning process to accumulate knowledge and guide the subsequent search. Search efficiency is thereby improved to a great extent, and real-time dynamic iterative computation of the decision sequence during air combat is realized. Simulation results indicate that the decision sequences computed by the proposed algorithm are consistent with actual air combat conditions.
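To make the scheme in the abstract concrete, the sketch below shows one common way to combine a learned heuristic with tabular Q-learning: actions are ranked by Q(s, a) + xi * H(s, a), in the spirit of heuristically accelerated Q-learning. The maneuver set, parameter values, and the placeholder heuristic network are illustrative assumptions, not the authors' implementation.

# A minimal sketch of heuristic Q-learning for maneuver selection.
# The maneuver set, reward, and heuristic below are illustrative
# assumptions; they are not the authors' actual implementation.
import random
import numpy as np

MANEUVERS = ["level", "climb", "dive", "turn_left",
             "turn_right", "accelerate", "decelerate"]  # assumed action set

class HeuristicQAgent:
    def __init__(self, n_actions, alpha=0.1, gamma=0.9, epsilon=0.1, xi=1.0):
        self.q = {}                      # tabular Q: (state, action) -> value
        self.alpha, self.gamma = alpha, gamma
        self.epsilon, self.xi = epsilon, xi
        self.n_actions = n_actions

    def heuristic(self, state):
        # Placeholder for the neural network that, per the abstract, learns
        # from earlier reinforcement-learning episodes to guide the search.
        return np.zeros(self.n_actions)

    def act(self, state):
        # Epsilon-greedy "trial and error", with the learned heuristic
        # biasing greedy selection: argmax_a Q(s, a) + xi * H(s, a).
        if random.random() < self.epsilon:
            return random.randrange(self.n_actions)
        h = self.heuristic(state)
        scores = [self.q.get((state, a), 0.0) + self.xi * h[a]
                  for a in range(self.n_actions)]
        return int(np.argmax(scores))

    def update(self, state, action, reward, next_state):
        # Standard one-step Q-learning backup; states must be hashable.
        best_next = max(self.q.get((next_state, a), 0.0)
                        for a in range(self.n_actions))
        old = self.q.get((state, action), 0.0)
        self.q[(state, action)] = old + self.alpha * (
            reward + self.gamma * best_next - old)

# Hypothetical usage per decision step, with an assumed environment object:
#   agent = HeuristicQAgent(n_actions=len(MANEUVERS))
#   a = agent.act(s); s2, r = env.step(MANEUVERS[a]); agent.update(s, a, r, s2)

In the paper's setting, heuristic() would be replaced by a neural network trained on accumulated episodes, so that prior knowledge steers exploration toward promising maneuvers and shortens the search.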
