Intelligent air combat decision making and simulation based on deep reinforcement learning

doi:10.7527/S1000-6893.2022.26731

Fluid Mechanics and Flight Mechanics

Pan ZHOU¹, Jiangtao HUANG¹, Sheng ZHANG¹, Gang LIU², Bowen SHU¹,³, Jigang TANG¹

1. Aerospace Technology Institute, China Aerodynamics Research and Development Center, Mianyang 621000, China
2. China Aerodynamics Research and Development Center, Mianyang 621000, China
3. School of Aeronautics, Northwestern Polytechnical University, Xi’an 710072, China

Received: 2021-12-02  Revised: 2022-01-12  Accepted: 2022-01-17  Online: 2022-01-28  Published: 2022-01-26
Contact: Jiangtao HUANG  E-mail: hjtcyf@163.com
Supported by: Provincial or Ministry Level Project

Abstract

Intelligent decision-making for air combat is a research hotspot among the world's major military powers. To address maneuver decision-making for Unmanned Aerial Vehicles (UAVs) in close-range air combat, an autonomous decision-making model for close-range dogfighting based on deep reinforcement learning is proposed. The model adopts and improves a reward function that jointly considers the attack angle advantage, speed advantage, altitude advantage, and distance advantage. The improved reward function prevents the agent from being lured into crashing into the ground by the enemy aircraft and effectively guides the agent to converge toward the optimal solution. To counter the slow convergence caused by random sampling in reinforcement learning, a value-based prioritization method for experience pool samples is designed, which significantly accelerates convergence while still ensuring that the algorithm converges. The decision-making model is verified on a human-machine confrontation simulation platform, and the results show that the model can gain the upper hand over both the expert system and human pilots in close-range air combat.

Key words: air combat, autonomous decision-making, deep reinforcement learning, TD3 algorithm, sparse rewards
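The value-based prioritization of experience pool samples mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the priority formula, so everything below is an assumption made for illustration: the class name `ValuePrioritizedBuffer`, the use of a scalar `value` per transition (for example a critic's Q-estimate), the exponent `alpha`, and the `uniform_mix` term that keeps a share of plain random sampling so that no stored transition is ever excluded. It should not be read as the authors' implementation.

```python
import random
from collections import deque


class ValuePrioritizedBuffer:
    """Replay buffer that samples transitions in proportion to a value score.

    Assumption: `value` is any scalar estimate of how useful a transition is
    (for example a critic's Q-value). The paper's exact definition is not
    given in the abstract.
    """

    def __init__(self, capacity, alpha=0.6, uniform_mix=0.1, seed=None):
        self.transitions = deque(maxlen=capacity)
        self.values = deque(maxlen=capacity)
        self.alpha = alpha              # 0 = uniform sampling, 1 = fully value-proportional
        self.uniform_mix = uniform_mix  # share of probability mass kept uniform
        self.rng = random.Random(seed)

    def add(self, transition, value):
        # The oldest entry is dropped automatically once capacity is reached.
        self.transitions.append(transition)
        self.values.append(float(value))

    def probabilities(self):
        # Shift values so the lowest one still maps to a small positive priority.
        lowest = min(self.values)
        priorities = [(v - lowest + 1e-6) ** self.alpha for v in self.values]
        total = sum(priorities)
        n = len(priorities)
        return [
            (1.0 - self.uniform_mix) * p / total + self.uniform_mix / n
            for p in priorities
        ]

    def sample(self, batch_size):
        # Sampling with replacement, weighted by the probabilities above.
        return self.rng.choices(
            list(self.transitions), weights=self.probabilities(), k=batch_size
        )
```

In a TD3-style training loop, such a buffer would take the place of uniform mini-batch sampling. Setting `alpha` to 0 recovers plain random sampling, which makes it easy to compare convergence speed with and without prioritization.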

CLC number: V249.12

Pan ZHOU, Jiangtao HUANG, Sheng ZHANG, Gang LIU, Bowen SHU, Jigang TANG. Intelligent air combat decision making and simulation based on deep reinforcement learning[J]. Acta Aeronautica et Astronautica Sinica, 2023, 44(4): 126731.

Figures/Tables: 22 (Fig. 1 to Fig. 22)

