Acta Aeronautica et Astronautica Sinica ›› 2024, Vol. 45 ›› Issue (12): 329446-329446. doi: 10.7527/S1000-6893.2023.29446

• Electronics and Electrical Engineering and Control •

Decision-making in close-range air combat based on reinforcement learning and population game

Baolai WANG1, Xianzhong GAO2, Tao XIE1, Zhongxi HOU2

  1. College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, China
  2. College of Aerospace Science and Engineering, National University of Defense Technology, Changsha 410073, China
  • Received: 2023-08-15  Revised: 2023-09-06  Accepted: 2023-10-31  Online: 2023-11-02  Published: 2023-11-01
  • Contact: Tao XIE  E-mail: hamishxie@vip.sina.com
  • Supported by:
    National Natural Science Foundation of China (61903369); Natural Science Foundation of Hunan Province (2018JJ3587)

Abstract:

With the development of artificial intelligence and Unmanned Aerial Vehicle (UAV) technologies, intelligent decision-making in close-range air combat has attracted extensive attention worldwide. To address the problems of overfitting and strategy cycling that arise when traditional reinforcement learning is applied to intelligent decision-making in close-range air combat, a training paradigm for air combat decision models based on population game is proposed. A population of multiple UAV agents is constructed, and each agent is assigned a different reward weight coefficient, giving the agents diversified risk preferences. Training agents with different risk preferences against one another effectively avoids overfitting and strategy cycling. During training, each UAV agent in the population adaptively optimizes its reward weight coefficient according to the results of confrontations with different opponent strategies. In numerical simulation experiments, Agent 5 and Agent 3 from population game training defeated the intelligent decision models obtained by expert-system adversarial training and by self-play training with win rates of 88% and 85%, respectively, verifying the effectiveness of the algorithm. Further experiments demonstrate the necessity of dynamically adjusting the weight coefficients in the population game training paradigm, and verify the generality of the proposed training paradigm on heterogeneous models.
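To make the paradigm concrete, below is a minimal Python sketch of the population-game training loop the abstract describes: a population of agents, each carrying its own reward weight coefficient that encodes a risk preference, plays round-robin matches and adaptively adjusts its coefficient from the match results. All names (PopulationAgent, simulate_combat, train_population) and the specific weight-update rule are illustrative assumptions, not the paper's implementation; in the actual system each match would be a full SAC training rollout in an air-combat simulator.

import random

class PopulationAgent:
    """One UAV agent in the population; illustrative, not the paper's code."""
    def __init__(self, agent_id, reward_weight):
        self.agent_id = agent_id
        self.reward_weight = reward_weight   # risk-preference coefficient in [0, 1]
        self.win_rates = {}                  # opponent id -> latest win rate

    def shaped_reward(self, aggressive_term, conservative_term):
        # Blend an aggressive reward term (e.g. attack advantage) with a
        # conservative one (e.g. survival) using the weight coefficient.
        w = self.reward_weight
        return w * aggressive_term + (1.0 - w) * conservative_term

def simulate_combat(agent_a, agent_b):
    # Stand-in for one air-combat episode; returns True if agent_a wins.
    # The real system would run the SAC policies in a flight-dynamics simulator.
    return random.random() < 0.5

def train_population(agents, generations=100, matches=20, lr=0.05):
    for _ in range(generations):
        for a in agents:
            for b in agents:
                if a is b:
                    continue
                wins = sum(simulate_combat(a, b) for _ in range(matches))
                a.win_rates[b.agent_id] = wins / matches
            # Assumed update rule: nudge the weight coefficient when the agent
            # underperforms against the rest of the population, clamped to [0, 1].
            avg_win = sum(a.win_rates.values()) / len(a.win_rates)
            a.reward_weight = min(1.0, max(0.0, a.reward_weight + lr * (0.5 - avg_win)))

# Five agents with diversified initial risk preferences, mirroring the abstract's
# population of agents with different reward weight coefficients.
population = [PopulationAgent(i, w) for i, w in enumerate([0.1, 0.3, 0.5, 0.7, 0.9])]
train_population(population, generations=10)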

Key words: close-range air combat, intelligent decision-making, reinforcement learning, population game, SAC algorithm
