Acta Aeronautica et Astronautica Sinica ›› 2024, Vol. 45 ›› Issue (17): 529460-529460.doi: 10.7527/S1000-6893.2023.29460

• Articles •    

Intelligent maneuvering decision-making in two-UCAV cooperative air combat based on improved MADDPG with hybrid hyper network

Wentao LI1, Feng FANG1(), Zhenya WANG2, Yichao ZHU1, Dongliang PENG1   

  1. School of Automation, Hangzhou Dianzi University, Hangzhou 310018, China
  2. China Academy of Aerospace Science and Innovation, Beijing 100076, China
  • Received: 2023-08-18 Revised: 2023-09-26 Accepted: 2023-10-24 Online: 2023-11-02 Published: 2023-11-01
  • Contact: Feng FANG E-mail: fangf@hdu.edu.cn
  • Supported by:
    the Fundamental Research Funds for the Provincial Universities of Zhejiang(GK209907299001-021)

Abstract:

In cooperative air combat between two Unmanned Combat Aerial Vehicles (UCAVs) with only local observations, problems arise such as hard-to-design collaborative rewards, low collaboration efficiency, and poor decision-making performance. To solve these problems, an intelligent maneuvering decision-making method is proposed based on an improved Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm with a hybrid hyper network. A Centralized Training with Decentralized Execution (CTDE) architecture is adopted to meet the training requirements of globally coordinated maneuvering decisions when each agent observes only locally. A reward function is designed for each UCAV agent that combines a local reward for fast guidance toward an attack advantage with a global reward for winning the air combat. A hybrid hyper network is then introduced to mix the estimated Q values of the agents monotonically and nonlinearly into a global policy value function. Using this global value function, each decentralized Actor network updates its parameters, solving the credit assignment problem in multi-agent deep reinforcement learning. Simulation results show that, compared with the traditional MADDPG method, the proposed method produces optimal global cooperative maneuver commands, achieves better coordination performance, and obtains a higher winning rate against the same agent opponent.
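The monotonic, nonlinear mixing of per-agent Q values via a state-conditioned hyper network can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the dimensions, random initialization, ReLU nonlinearity, and the QMIX-style use of absolute values to enforce monotonicity are all assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

N_AGENTS = 2    # two cooperating UCAVs
STATE_DIM = 6   # global state feature size (assumed)
EMBED_DIM = 8   # mixing-network hidden width (assumed)

# Hypernetwork parameters: linear maps from the global state
# to the weights and biases of the mixing network.
W1_hyper = rng.normal(scale=0.1, size=(STATE_DIM, N_AGENTS * EMBED_DIM))
b1_hyper = rng.normal(scale=0.1, size=(STATE_DIM, EMBED_DIM))
W2_hyper = rng.normal(scale=0.1, size=(STATE_DIM, EMBED_DIM))
b2_hyper = rng.normal(scale=0.1, size=(STATE_DIM, 1))

def mix(q_values, state):
    """Mix per-agent Q values into a global Q_tot.

    Taking absolute values of the generated weights guarantees
    dQ_tot/dQ_i >= 0, so an agent improving its own Q value never
    decreases the team value (the monotonicity constraint).
    """
    w1 = np.abs(state @ W1_hyper).reshape(N_AGENTS, EMBED_DIM)
    b1 = state @ b1_hyper
    w2 = np.abs(state @ W2_hyper)                  # shape (EMBED_DIM,)
    b2 = state @ b2_hyper                          # shape (1,)
    hidden = np.maximum(q_values @ w1 + b1, 0.0)   # nonlinear mixing layer
    return float(hidden @ w2 + b2)

state = rng.normal(size=STATE_DIM)
q = np.array([1.0, 2.0])
q_better = np.array([1.5, 2.0])  # agent 0 improves its local Q value
assert mix(q_better, state) >= mix(q, state)  # monotonic in each agent's Q
```

In a full training loop, the gradient of this global value would flow back through the mixer to each decentralized Actor, which is what allows credit to be assigned to individual agents from the team-level signal.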

Key words: unmanned combat aerial vehicle, air combat maneuvering decision, Multi-Agent Deep Deterministic Policy Gradient (MADDPG), hybrid hyper network, centralized training with decentralized execution
