Electronics, Electrical Engineering and Control


Supported by

National Natural Science Foundation of China (52372398, 62003272, 52302405)

A multi-UAV cooperative air combat decision-making method based on spatial-temporal information fusion

  • Yunxiao LIAN ,
  • Ni LI ,
  • Feng XIE ,
  • Pan ZHOU ,
  • Changyin DONG
  • 1. School of Aeronautics, Northwestern Polytechnical University, Xi’an 710072, China
    2. National Key Laboratory of Aircraft Configuration Design, Xi’an 710072, China
    3. AVIC Chengdu Aircraft Design and Research Institute, Chengdu 610041, China
    4. Institute of Space Technology, China Aerodynamics Research and Development Center, Mianyang 621000, China
E-mail: lini@nwpu.edu.cn

Received date: 2025-07-28

  Revised date: 2025-08-29

  Accepted date: 2025-10-09

  Online published: 2025-10-17



Cite this article

LIAN Y X, LI N, XIE F, ZHOU P, DONG C Y. A multi-UAV cooperative air combat decision-making method based on spatial-temporal information fusion[J]. Acta Aeronautica et Astronautica Sinica, 2026, 47(6): 332633. DOI: 10.7527/S1000-6893.2025.32633

Abstract

Multi-agent reinforcement learning is currently one of the most promising approaches to autonomous cooperative air combat among multiple aircraft. However, existing methods are constrained by end-to-end network architectures and face critical issues in air combat, such as poor multi-UAV coordination and difficulty in reflecting decision-making motivation. To address these issues, this paper proposes a multi-UAV cooperative air combat decision-making method based on spatial-temporal information fusion to improve the cooperation and interpretability of multi-aircraft air combat. First, a spatial information fusion method based on a graph attention mechanism is designed to aggregate the local observations of agents into a global situation assessment, which improves the information fusion efficiency and training efficiency of the fully connected evaluation network. Second, a spatial-temporal information fusion method combining cross-attention and a gated recurrent unit is developed to aggregate the spatial and temporal information of enemy and friendly units, fusing coordination features for the policy network. Finally, a spatial-temporal information fusion-based multi-UAV cooperative air combat decision-making algorithm is constructed by integrating reinforcement learning, and it is validated in a high-fidelity air combat environment. Experimental results show that the proposed method exhibits strong coordination and interpretability of decision-making motivation.
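The spatial information fusion step described in the abstract, where a graph attention mechanism aggregates agents' local observations into a shared situation feature, can be illustrated with a minimal single-head GAT-style aggregation. This is a hedged sketch in NumPy, not the authors' implementation; the function name `graph_attention_aggregate` and the parameters `W`, `a`, and `adj` are hypothetical stand-ins for the learned projection, attention vector, and visibility graph.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def graph_attention_aggregate(obs, W, a, adj):
    """Single-head graph-attention aggregation (GAT-style sketch).

    obs : (N, d)  local observation vector of each of N agents
    W   : (d, h)  shared linear projection (learned in practice)
    a   : (2h,)   attention parameter vector
    adj : (N, N)  1 where unit j is visible to agent i, else 0
    Returns an (N, h) array of fused features, one row per agent.
    """
    h = obs @ W                                    # project observations
    N = h.shape[0]
    # Pairwise attention logits e_ij = LeakyReLU(a^T [h_i || h_j])
    e = np.empty((N, N))
    for i in range(N):
        for j in range(N):
            z = a @ np.concatenate([h[i], h[j]])
            e[i, j] = z if z > 0 else 0.2 * z      # LeakyReLU, slope 0.2
    e = np.where(adj > 0, e, -1e9)                 # mask invisible units
    alpha = softmax(e, axis=1)                     # attention weights per agent
    return np.tanh(alpha @ h)                      # weighted feature aggregation
```

Masking the logits with a large negative value before the softmax restricts each agent's attention to units it can actually observe, which is what lets a centralized evaluation network build a global assessment from purely local observations.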
