A multi-UAV cooperative air combat decision-making method based on spatial-temporal information fusion

doi:10.7527/S1000-6893.2025.32633

Electronics, Electrical Engineering and Control

Yunxiao LIAN¹, Ni LI¹,², Feng XIE¹,³, Pan ZHOU¹,⁴, Changyin DONG¹,²

1. School of Aeronautics, Northwestern Polytechnical University, Xi'an 710072, China
2. National Key Laboratory of Aircraft Configuration Design, Xi'an 710072, China
3. AVIC Chengdu Aircraft Design and Research Institute, Chengdu 610041, China
4. Institute of Space Technology, China Aerodynamics Research and Development Center, Mianyang 621000, China

Received: 2025-07-28  Revised: 2025-08-29  Accepted: 2025-10-09  Online: 2025-10-20  Published: 2025-10-17
Contact: Ni LI  E-mail: lini@nwpu.edu.cn
Supported by: National Natural Science Foundation of China (52372398, 62003272, 52302405)

Abstract:

Multi-agent reinforcement learning is currently one of the most promising approaches to autonomous cooperative air combat among multiple aircraft. However, existing methods are constrained by end-to-end network architectures and suffer from two key problems in air combat: poor multi-aircraft coordination and difficulty in revealing the motivation behind decisions. To address these problems, this paper proposes a multi-UAV cooperative air combat decision-making method based on spatial-temporal information fusion that improves both the coordination and the interpretability of multi-aircraft air combat. First, a spatial information fusion method based on the graph attention mechanism is designed to aggregate the agents' local observations into a global situation assessment, which enhances the information fusion efficiency and training efficiency of the fully connected critic (evaluation) network. Second, a spatial-temporal information fusion method combining cross-attention and a gated recurrent unit is designed to aggregate the spatial and temporal information of enemy and friendly units, providing the policy network with coordination features. Finally, a multi-UAV cooperative air combat decision-making algorithm based on spatial-temporal information fusion is built by integrating these modules with reinforcement learning, and is validated in a high-fidelity air combat environment. Experimental results show that the proposed method achieves strong coordination and makes its decision-making motivation interpretable.

Key words: multi-UAV cooperative air combat, multi-agent reinforcement learning, spatial-temporal information fusion, graph attention, cross-attention

CLC number: V249.12

Yunxiao LIAN, Ni LI, Feng XIE, Pan ZHOU, Changyin DONG. A multi-UAV cooperative air combat decision-making method based on spatial-temporal information fusion[J]. Acta Aeronautica et Astronautica Sinica, 2026, 47(6): 332633.

Figures/Tables (17)

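Table 3  Network update parameters

Parameter	Symbol	Value
Learning rate	$lr$	$3\times10^{-4}$
PPO update epochs	ppo_epochs	4
Clipping coefficient	clip_param	0.2
Entropy coefficient	$\beta$	0.01
Gradient clipping coefficient	clipnorm	2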
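
To make the role of the clipping and entropy coefficients concrete, the sketch below evaluates a clipped-surrogate policy loss of the kind used in PPO-family algorithms. It assumes the standard PPO formulation; the function name `ppo_policy_loss` and its arguments are hypothetical and are not taken from the paper's implementation.

```python
import math

CLIP_PARAM = 0.2      # clip_param from the table above
ENTROPY_COEF = 0.01   # entropy coefficient (beta) from the table above


def ppo_policy_loss(new_logp, old_logp, advantages, entropies):
    """Mean clipped-surrogate policy loss with an entropy bonus (to be minimized).

    Assumption: standard PPO objective; each argument is a list with one
    entry per sampled action.
    """
    total = 0.0
    for lp_new, lp_old, adv, ent in zip(new_logp, old_logp, advantages, entropies):
        ratio = math.exp(lp_new - lp_old)  # pi_new(a|s) / pi_old(a|s)
        clipped = max(1.0 - CLIP_PARAM, min(1.0 + CLIP_PARAM, ratio))
        surrogate = min(ratio * adv, clipped * adv)  # pessimistic bound
        total += -surrogate - ENTROPY_COEF * ent
    return total / len(new_logp)
```

With a ratio of 1.5 and a positive advantage, the surrogate is capped at 1.2 times the advantage, so the update gains nothing from moving the policy further from the one that collected the data. The remaining table entries (ppo_epochs and clipnorm) would govern how many passes are made over each batch and how gradients are norm-clipped; they are not modelled in this sketch.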

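Table 4  Experience replay buffer parameters

Parameter	Symbol	Value
Capacity	$B_{\max}$	3000
Number of batches	batch_nums	5
Time-sequence length	chunk_length	8
Discount factor	$\gamma$	0.99
GAE scaling factor	$\lambda$	0.95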
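
The discount factor and GAE scaling factor above are the two inputs of generalized advantage estimation. The sketch below shows how they would typically be applied, together with a helper that cuts a trajectory into chunks of the listed time-sequence length. The function names are hypothetical, and the use of chunks for recurrent training is an assumption based on the gated recurrent unit mentioned in the abstract.

```python
GAMMA = 0.99   # discount factor from the table above
LAMBDA = 0.95  # GAE scaling factor from the table above


def gae_advantages(rewards, values, last_value, dones):
    """Generalized advantage estimation over one trajectory.

    values[t] is V(s_t); last_value is V(s_T), used to bootstrap the final step.
    """
    advantages = [0.0] * len(rewards)
    gae = 0.0
    next_value = last_value
    for t in reversed(range(len(rewards))):
        not_done = 0.0 if dones[t] else 1.0
        # One-step temporal-difference error.
        delta = rewards[t] + GAMMA * next_value * not_done - values[t]
        # Exponentially weighted sum of future TD errors.
        gae = delta + GAMMA * LAMBDA * not_done * gae
        advantages[t] = gae
        next_value = values[t]
    return advantages


def split_into_chunks(sequence, chunk_length=8):
    """Cut a trajectory into fixed-length chunks (the last one may be shorter)."""
    return [sequence[i:i + chunk_length] for i in range(0, len(sequence), chunk_length)]
```

For a two-step episode with unit rewards and zero value estimates, the first-step advantage is 1 + 0.99 × 0.95 × 1 = 1.9405, which shows how the product of the two factors sets the weight given to later rewards.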

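Network type	Layers/Dimension	Features
Cross-attention	1/32
Multilayer perceptron	2/128	ReLU
Gated recurrent unit	1/128
Multilayer perceptron	2/128	ReLU
Action output layer	Four-dimensional discrete action	Orthogonal initialization, gain 0.01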
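
The first row of the table above is a cross-attention layer. As a rough illustration of what such a layer computes, the sketch below applies scaled dot-product attention from one query vector (for example, an agent's own features) to a set of entity features (for example, enemy and friendly units). The learned query, key, and value projections are omitted for brevity, and the function name and argument layout are hypothetical rather than the paper's design.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(query, keys, values):
    """Scaled dot-product attention of one query over a set of entities.

    Returns the fused feature vector and the attention weights.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    fused = [sum(w * v[d] for w, v in zip(weights, values))
             for d in range(len(values[0]))]
    return fused, weights
```

The returned weights sum to one across entities, so they can be read as how much each unit contributes to the fused feature. Inspecting weights of this kind is one plausible route to the interpretability the abstract describes, although the paper's actual analysis may differ.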

Network type	Layers/Dimension	Features
Graph attention	1/39
Multilayer perceptron	2/128	ReLU
Value output layer	1/1
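
The graph attention row above corresponds to the spatial fusion step that aggregates agents' local observations. The sketch below is a minimal single-head graph attention layer in the style of the original graph attention network formulation, followed by mean pooling into one global feature. The output nonlinearity is omitted, and the mean pooling, function names, and argument layout are illustrative assumptions, not the paper's architecture.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def gat_aggregate(obs, W, a, adj):
    """Single-head graph attention over agent observations.

    obs: list of N observation vectors; W: projection matrix (F' rows, F columns);
    a: attention vector of length 2F'; adj[i]: neighbour indices of node i
    (including i itself).
    """
    h = [matvec(W, o) for o in obs]
    out = []
    for i, nbrs in enumerate(adj):
        # Attention logit for each neighbour: LeakyReLU(a . [h_i || h_j]).
        scores = [leaky_relu(sum(ak * v for ak, v in zip(a, h[i] + h[j])))
                  for j in nbrs]
        alpha = softmax(scores)
        out.append([sum(al * h[j][k] for al, j in zip(alpha, nbrs))
                    for k in range(len(h[i]))])
    return out


def mean_pool(nodes):
    """Average node features into one global vector (assumed pooling choice)."""
    return [sum(col) / len(nodes) for col in zip(*nodes)]
```

With a zero attention vector every neighbour receives equal weight, so the layer reduces to neighbourhood averaging; training the attention vector is what lets some agents' observations count for more than others in the global assessment.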

Parameter	Initial value
Longitude/(°)	119.9~120.1
Latitude/(°)	59.9~60.1
Altitude/km	6.1~9.0
Heading/(°)	0~360
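
The ranges above can be turned into randomized episode starts. The sketch below draws each quantity independently and uniformly from its range; the uniform distribution, the dictionary keys, and the function names are assumptions made for illustration, since the table only specifies the ranges.

```python
import random

# Initial-state ranges copied from the table above.
INIT_RANGES = {
    "longitude_deg": (119.9, 120.1),
    "latitude_deg": (59.9, 60.1),
    "altitude_km": (6.1, 9.0),
    "heading_deg": (0.0, 360.0),
}


def sample_initial_state(rng):
    """Draw one aircraft's initial state uniformly from the listed ranges."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in INIT_RANGES.items()}


def sample_formation(num_aircraft, seed=None):
    """Sample initial states for a whole formation with a reproducible seed."""
    rng = random.Random(seed)
    return [sample_initial_state(rng) for _ in range(num_aircraft)]
```

Calling `sample_formation(4, seed=0)` would give four reproducible initial states, one per aircraft in a four-ship formation.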

Algorithm	Wins	Losses	Kill rate/%	Rate of being killed/%
ST-MAPPO vs R-MAPPO	99	1	96.3	14.7
ST-MAPPO vs GAT-MAPPO	98	2	94	19

Algorithm	Match scale	Wins	Losses	Kill rate/%	Rate of being killed/%
ST-MAPPO vs T-MAPPO	3V3	100	0	95.3	9.7
ST-MAPPO vs T-MAPPO	4V4	99	1	95.8	18
ST-MAPPO vs H-MAPPO	3V3	94	6	94.3	29.7
ST-MAPPO vs H-MAPPO	4V4	91	9	94.3	42.8
