Electronics, Electrical Engineering and Control


Decision-making in close-range air combat based on reinforcement learning and population game

  • Baolai WANG,
  • Xianzhong GAO,
  • Tao XIE,
  • Zhongxi HOU
  • 1. College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, China
    2. College of Aerospace Science and Engineering, National University of Defense Technology, Changsha 410073, China
E-mail: hamishxie@vip.sina.com

Received date: 2023-08-15

Revised date: 2023-09-06

Accepted date: 2023-10-31

Online published: 2023-11-01

Supported by

National Natural Science Foundation of China (61903369); Natural Science Foundation of Hunan Province (2018JJ3587)


Cite this article

WANG B L, GAO X Z, XIE T, HOU Z X. Decision-making in close-range air combat based on reinforcement learning and population game[J]. Acta Aeronautica et Astronautica Sinica, 2024, 45(12): 329446. DOI: 10.7527/S1000-6893.2023.29446

Abstract

With the development of artificial intelligence and Unmanned Aerial Vehicle (UAV) technologies, intelligent decision-making in close-range air combat has attracted extensive attention worldwide. To address the overfitting and strategy-cycling problems that arise when traditional reinforcement learning is applied to close-range air combat decision-making, a training paradigm for air-combat decision models based on population games is proposed. A population of multiple UAV agents is constructed, and each agent is assigned a different reward weight coefficient, giving the agents diversified risk preferences. Training agents with different risk preferences against one another effectively avoids overfitting and strategy cycling. During training, each UAV agent in the population adaptively optimizes its reward weight coefficient according to the results of its confrontations with different opponent strategies. In numerical simulation experiments, Agent 5 and Agent 3 from the population game training defeated the decision models obtained by expert-system adversarial training and by self-play training with win rates of 88% and 85%, respectively, verifying the effectiveness of the algorithm. Further experiments demonstrate the necessity of dynamically adjusting the weight coefficients in the population game training paradigm and verify the generality of the proposed paradigm on heterogeneous aircraft types.
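
The abstract describes the training paradigm only at a high level. The following is a minimal Python sketch of the loop it outlines: a population of agents with distinct reward weight coefficients, pairwise adversarial training, and adaptive adjustment of each weight from confrontation results. Every name here (Agent, train_policy, simulate_combat), the population size, and the perturb-and-evaluate adaptation rule are illustrative assumptions, not the authors' implementation.

import random

POP_SIZE = 8   # number of UAV agents in the population (assumed)
STEP = 0.05    # perturbation size for weight adaptation (assumed)

class Agent:
    def __init__(self, agent_id, risk_weight):
        self.agent_id = agent_id
        # Reward weight coefficient: trades off aggressive reward terms
        # (e.g., attack advantage) against conservative ones (e.g.,
        # survival), giving each agent its own risk preference.
        self.risk_weight = risk_weight

    def shaped_reward(self, attack_reward, survival_reward):
        # The weighted combination realizes the agent's risk preference.
        return (self.risk_weight * attack_reward
                + (1.0 - self.risk_weight) * survival_reward)

def simulate_combat(agent, opponent):
    # Placeholder: returns 1 if `agent` wins the engagement, else 0.
    # A real implementation would roll out both policies in a
    # flight-dynamics simulator.
    return random.randint(0, 1)

def train_policy(agent, opponent):
    # Placeholder for reinforcement-learning policy updates driven by
    # agent.shaped_reward while flying against `opponent`.
    pass

# Population with diversified initial risk preferences.
population = [Agent(i, (i + 1) / (POP_SIZE + 1)) for i in range(POP_SIZE)]

for generation in range(100):
    for agent in population:
        opponents = [a for a in population if a is not agent]
        # Pairwise adversarial training against every other member of the
        # population counters overfitting to a single opponent style.
        for opponent in opponents:
            train_policy(agent, opponent)
        # Adaptive weight optimization: try a perturbed weight and keep it
        # only if the win rate over the population does not drop.
        base = sum(simulate_combat(agent, o) for o in opponents) / len(opponents)
        old_w = agent.risk_weight
        agent.risk_weight = min(1.0, max(0.0, old_w + random.uniform(-STEP, STEP)))
        trial = sum(simulate_combat(agent, o) for o in opponents) / len(opponents)
        if trial < base:
            agent.risk_weight = old_w  # revert an unhelpful perturbation

The perturb-and-evaluate rule above is only a stand-in for the paper's adaptive optimization of the weight coefficients; the sketch shows where that update sits in the population-game loop, not how the authors compute it.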
