Acta Aeronautica et Astronautica Sinica > 2021, Vol. 42, Issue (4): 524009   doi: 10.7527/S1000-6893.2020.24009

Coordination control method for fixed-wing UAV formation through deep reinforcement learning

XIANG Xiaojia, YAN Chao, WANG Chang, YIN Dong

  1. College of Intelligence Science and Technology, National University of Defense Technology, Changsha 410073, China
  • Received: 2020-03-24  Revised: 2020-05-18  Published: 2020-07-06
  • Corresponding author: YAN Chao  E-mail: yanchao17@nudt.edu.cn
  • Supported by:
    National Natural Science Foundation of China (61906203); The Foundation of National Key Laboratory of Science and Technology on UAV, Northwestern Polytechnical University (614230110080817)

Abstract: Owing to the complexity of UAV kinematics and the dynamics of the environment, controlling a group of fixed-wing Unmanned Aerial Vehicles (UAVs) remains a challenging problem. Taking into account the randomness and uncertainty of complex, dynamic environments, this paper addresses the coordination control problem of fixed-wing UAV formations with a model-free deep reinforcement learning method. To balance exploration and exploitation, a new action selection strategy, the ε-imitation strategy, is proposed by combining the ε-greedy strategy with an imitation strategy. Building on this strategy, the Deep Q-Network (DQN) algorithm is extended with double Q-learning and the dueling architecture, yielding the ID3QN (Imitative Dueling Double Deep Q-Network) algorithm, which improves learning efficiency. Finally, Hardware-In-the-Loop (HIL) simulation flight experiments conducted in a high-fidelity semi-physical simulation system demonstrate the adaptability and practicality of the proposed ID3QN coordination control algorithm.

Key words: fixed-wing UAVs, UAV formation, coordination control, deep reinforcement learning, neural networks
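The abstract names the three ingredients of ID3QN (dueling architecture, double Q-learning, ε-imitation action selection) without giving their exact form. The sketch below shows the standard dueling aggregation and double-DQN target, plus one plausible reading of ε-imitation, assuming that on exploration steps (probability ε) the agent takes the action proposed by an imitation policy instead of a uniformly random action. All function names and the `imitation_action` callable are hypothetical illustrations, not the authors' code, and Q-values are plain lists standing in for network outputs.

```python
import random
from typing import Callable, List, Sequence


def argmax(xs: Sequence[float]) -> int:
    """Index of the largest element (first one on ties)."""
    return max(range(len(xs)), key=lambda i: xs[i])


def dueling_q(value: float, advantages: Sequence[float]) -> List[float]:
    """Dueling aggregation: Q(s,a) = V(s) + A(s,a) - mean_a' A(s,a')."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]


def double_dqn_target(reward: float, done: bool, gamma: float,
                      q_online_next: Sequence[float],
                      q_target_next: Sequence[float]) -> float:
    """Double DQN target: the online net selects a*, the target net evaluates it."""
    if done:
        return reward  # no bootstrapping from a terminal state
    a_star = argmax(q_online_next)
    return reward + gamma * q_target_next[a_star]


def epsilon_imitation(q_values: Sequence[float], epsilon: float,
                      imitation_action: Callable[[], int],
                      rng=random) -> int:
    """Assumed form of epsilon-imitation: with probability epsilon, follow a
    (hypothetical) imitation policy instead of acting randomly; otherwise
    act greedily with respect to Q."""
    if rng.random() < epsilon:
        return imitation_action()
    return argmax(q_values)


# Usage with toy numbers (three discrete actions).
q_on = dueling_q(1.0, [0.5, 2.0, -0.5])   # online net at s'
q_tg = dueling_q(0.8, [1.0, 0.0, 3.0])    # target net at s'
y = double_dqn_target(1.0, False, 0.9, q_on, q_tg)
a = epsilon_imitation(q_on, 0.1, lambda: 2, random.Random(0))
```

Here the online network prefers action 1, so the target is 1.0 + 0.9 × q_tg[1] ≈ 0.52, whereas a vanilla DQN target taking max(q_tg) would give about 3.22; decoupling action selection from evaluation in this way is what double Q-learning uses to curb overestimation. The imitation policy itself (for example, a rule-based follower controller) is left abstract here because the abstract does not specify it.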