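Outstanding Papers of the 2nd Aerospace Frontiers Conference / 27th Annual Meeting of the China Association for Science and Technology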

Autonomous decision-making in close-range game under imperfect information for unmanned aerial vehicles

  • Pan ZHOU,
  • Ni LI,
  • Jiangtao HUANG,
  • Qinglin YANG,
  • Yunxiao LIAN
  • 1. School of Aeronautics, Northwestern Polytechnical University, Xi’an 710072, China
    2. Institute of Space Technology, China Aerodynamics Research and Development Center, Mianyang 621000, China
    3. School of Aeronautic Science and Engineering, Beihang University, Beijing 100191, China
E-mail: hjtcyfx@163.com

Received date: 2025-05-09

  Revised date: 2025-05-12

  Accepted date: 2025-05-18

  Online published: 2025-06-10

Supported by

National Natural Science Foundation of China (52372398)

Cite this article

Pan ZHOU, Ni LI, Jiangtao HUANG, Qinglin YANG, Yunxiao LIAN. Autonomous decision-making in close-range game under imperfect information for unmanned aerial vehicles[J]. Acta Aeronautica et Astronautica Sinica, 2025, 46(S1): 732215-732215. DOI: 10.7527/S1000-6893.2025.32215

Abstract

With the convergence of computer science, automatic control theory, aircraft design, and related disciplines, autonomous decision-making for Unmanned Aerial Vehicles (UAVs) in close-range games has become one of the key technical challenges in the UAV field. To address autonomous decision-making for UAVs in close-range games under incomplete information, this paper proposes a decision-making method based on a pre-trained EfficientZero algorithm. First, a method for solving the three-degree-of-freedom UAV dynamic model based on quaternion theory is implemented, and a three-degree-of-freedom UAV close-range game environment model is built on this method. Second, a deep-neural-network-based autonomous decision-making model is established that takes multi-dimensional continuous states as input and produces multi-dimensional discrete actions as output. On this basis, a method for optimizing the close-range game decision model with the pre-trained EfficientZero algorithm is proposed. Then, a model for predicting the target's maneuvering trajectory under incomplete information is established. Finally, UAV close-range game simulation experiments are carried out.
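The abstract does not give the equations of the quaternion-based three-degree-of-freedom model, so the following is only an illustrative sketch of the general idea, not the authors' formulation. It assumes a point-mass vehicle whose velocity-frame orientation is stored as a unit quaternion `[w, x, y, z]` and advanced with an explicit Euler step. One common motivation for this representation is that it avoids the singularity that Euler-angle kinematics have when the flight-path angle reaches ±90°. The function names (`quat_multiply`, `rotate`, `step`) and the inputs `omega` and `accel` are hypothetical.

```python
import numpy as np


def quat_multiply(p, q):
    """Hamilton product of two quaternions stored as [w, x, y, z]."""
    pw, px, py, pz = p
    qw, qx, qy, qz = q
    return np.array([
        pw * qw - px * qx - py * qy - pz * qz,
        pw * qx + px * qw + py * qz - pz * qy,
        pw * qy - px * qz + py * qw + pz * qx,
        pw * qz + px * qy - py * qx + pz * qw,
    ])


def rotate(q, v):
    """Rotate vector v from the velocity frame into the inertial frame."""
    qv = np.concatenate(([0.0], v))
    q_conj = q * np.array([1.0, -1.0, -1.0, -1.0])
    return quat_multiply(quat_multiply(q, qv), q_conj)[1:]


def step(pos, speed, q, omega, accel, dt):
    """One explicit-Euler step of an assumed 3-DOF point-mass model.

    pos   : inertial position (m), shape (3,)
    speed : airspeed along the velocity-frame x axis (m/s)
    q     : unit quaternion, velocity frame -> inertial frame
    omega : angular rate of the velocity frame, in that frame (rad/s)
    accel : longitudinal acceleration (m/s^2)
    """
    # Quaternion kinematics: q_dot = 0.5 * q (x) [0, omega]
    q_dot = 0.5 * quat_multiply(q, np.concatenate(([0.0], omega)))
    q_new = q + q_dot * dt
    q_new /= np.linalg.norm(q_new)  # re-normalize to stay a unit quaternion

    speed_new = speed + accel * dt
    # Velocity points along the velocity-frame x axis.
    pos_new = pos + rotate(q, np.array([speed, 0.0, 0.0])) * dt
    return pos_new, speed_new, q_new
```

With an identity quaternion, zero angular rate, and a speed of 100 m/s, one step of 0.1 s moves the vehicle 10 m along the inertial x axis. A real environment model would likely use a higher-order integrator and derive `omega` and `accel` from the load-factor and thrust commands.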
