Air combat maneuver decision-making test based on deep reinforcement learning

doi:10.7527/S1000-6893.2023.28094

Fluid Mechanics and Flight Mechanics


Sheng ZHANG¹, Pan ZHOU¹, Yang HE¹, Jiangtao HUANG¹, Gang LIU², Jigang TANG¹, Huaizhi JIA³, Xin DU¹

¹ Aerospace Technology Institute, China Aerodynamics Research and Development Center, Mianyang 621000, China
² China Aerodynamics Research and Development Center, Mianyang 621000, China
³ School of Aeronautics, Northwestern Polytechnical University, Xi’an 710000, China

Received: 2022-10-08 Revised: 2023-01-05 Accepted: 2023-02-15 Published: 2023-05-25 Released online: 2023-02-24
Corresponding author: Jiangtao HUANG E-mail: hjtcyf@163.com
Supported by:
National Natural Science Foundation of China (11902332)

Abstract

Intelligent decision-making in air combat will greatly change the form and mode of future warfare. A deep reinforcement learning decision-making machine can exploit the potential of an aircraft and is an important technical paradigm for intelligent air combat decision-making, but its engineering implementation has rarely been reported. To address the engineering implementation of intelligent maneuver decision-making based on deep reinforcement learning in one-on-one close-range air combat, an online deep neural network maneuver decision-making model suitable for practical application is developed, and a maneuver control scheme is proposed in which the flight control law tracks the trajectory guidance commands issued by the decision-making model. The corresponding software and hardware are then implemented and human-machine combat flight tests are carried out, achieving the transfer of intelligent air combat from virtual simulation to real flight. The results show that, with the close-range air combat maneuver decision-making and control method developed in this paper, the intelligent unmanned aircraft quickly makes maneuver decisions in its own favor when fighting human "pilots" and rapidly gains a situational advantage through maneuvering. The flight test results demonstrate the potential application value of deep neural network intelligent decision-making technology in air combat decision-making.

Key words: close-range air combat, intelligent decision-making, deep reinforcement learning, human-machine combat, flight test

CLC number: V249.12

Sheng ZHANG, Pan ZHOU, Yang HE, Jiangtao HUANG, Gang LIU, Jigang TANG, Huaizhi JIA, Xin DU. Air combat maneuver decision-making test based on deep reinforcement learning[J]. Acta Aeronautica et Astronautica Sinica, 2023, 44(10): 128094.

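Table 1 Parameter values in the situation assessment function

Parameter	Value	Parameter	Value
$V_{\mathrm{opt}}$	30	$\omega_{\varphi}$	0.6
$h_{\mathrm{opt}}$	100	$\omega_{V}$	0.1
$h_{0}$	1000	$\omega_{h}$	0.2
$d_{\mathrm{opt}}$	20	$\omega_{d}$	0.1
$d_{0}$	200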
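
As a rough illustration of how the four weights in Table 1 (0.6, 0.1, 0.2, 0.1) might enter a situation assessment, the sketch below combines four component scores as a weighted sum. The weighted-sum form, the reading of the subscripts as angle, speed, height and distance, the names `SituationParams` and `situation_value`, and the assumption that each component score is already normalized to [0, 1] are all illustrative assumptions. This page does not give the component functions or state how the remaining parameters (30, 100, 1000, 20, 200) are used, so they are stored but not evaluated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SituationParams:
    """Values copied from Table 1; units are not stated on this page."""
    v_opt: float = 30.0
    h_opt: float = 100.0
    h_0: float = 1000.0
    d_opt: float = 20.0
    d_0: float = 200.0
    w_phi: float = 0.6  # assumed: weight of the angle term
    w_v: float = 0.1    # assumed: weight of the speed term
    w_h: float = 0.2    # assumed: weight of the height term
    w_d: float = 0.1    # assumed: weight of the distance term


def situation_value(s_phi, s_v, s_h, s_d, p=SituationParams()):
    """Hypothetical weighted sum of component scores, each assumed in [0, 1]."""
    for s in (s_phi, s_v, s_h, s_d):
        if not 0.0 <= s <= 1.0:
            raise ValueError("component scores are assumed to lie in [0, 1]")
    return p.w_phi * s_phi + p.w_v * s_v + p.w_h * s_h + p.w_d * s_d


if __name__ == "__main__":
    # Full angle advantage, middling speed and height, no distance advantage.
    print(situation_value(1.0, 0.5, 0.5, 0.0))
```

Under this assumed form the four weights sum to 1, so the combined value stays in [0, 1], and the angle term alone accounts for 60% of it.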

