航空学报 > 2025, Vol. 46 Issue (15): 331354-331354   doi: 10.7527/S1000-6893.2024.31354

虚拟结构引领强化学习分布式无人机编队控制

王昱, 谢志鹏, 田永健, 孟光磊

  1. 沈阳航空航天大学 自动化学院,沈阳 110136
  • 收稿日期:2024-10-08 修回日期:2025-01-13 接受日期:2025-02-21 出版日期:2025-03-11 发布日期:2025-03-06
  • 通讯作者: 王昱 E-mail:wangyu@sau.edu.cn
  • 基金资助:
    国家自然科学基金(61906125);国家自然科学基金(62373261);辽宁省属本科高校基本科研业务费专项基金(LJ232410143020);辽宁省属本科高校基本科研业务费专项基金(LJ212410143047)

Distributed UAV formation control with virtual structure guided reinforcement learning

Yu WANG, Zhipeng XIE, Yongjian TIAN, Guanglei MENG

  1. School of Automation,Shenyang Aerospace University,Shenyang 110136,China
  • Received:2024-10-08 Revised:2025-01-13 Accepted:2025-02-21 Online:2025-03-11 Published:2025-03-06
  • Contact: Yu WANG E-mail:wangyu@sau.edu.cn
  • Supported by:
    National Natural Science Foundation of China(61906125, 62373261);Basic Research Funds of Liaoning Provincial Universities(LJ232410143020, LJ212410143047)

摘要:

基于强化学习算法的单一决策模型在面对复杂无人机(UAV)编队控制任务时,往往由于自主决策能力有限导致适应性不足。针对该问题,提出了一种以虚拟结构法引领深度强化学习算法的分布式决策方法。首先,为降低强化学习算法在多样性任务环境中进行策略寻优的难度,对总体任务进行功能分解,分别针对静态障碍、随机障碍及通讯干扰等单一作业场景实施局部任务规划,构建多个决策子模型,并设计模型间自主调用流程;然后,以增强引导作用为出发点,将虚拟结构法与软演员-评论家(SAC)强化学习算法结合,构建分布式决策框架,通过对各子模型的分散训练充分提高任务执行的成功率和灵活性;最后,采用集中执行的方式,以环境变化作为触发条件进行子模型的动态选择与无缝切换,使无人机编队能够根据任务环境的变化自主灵活调整队形,在达成任务目标的同时显著提升机群整体对环境的适应性与生存能力,并通过多场景下的仿真实验验证了方法的有效性。

关键词: 无人机编队控制, 复杂任务环境, 深度强化学习, 虚拟结构法, 分布式决策

Abstract:

Single decision-making models based on reinforcement learning algorithms often show insufficient adaptability in complex Unmanned Aerial Vehicle (UAV) formation control tasks due to their limited autonomous decision-making capability. To address this problem, this paper proposes a distributed decision-making method in which the virtual structure approach guides a deep reinforcement learning algorithm. First, to reduce the difficulty of policy optimization for reinforcement learning algorithms in diverse task environments, the overall task is functionally decomposed: local task planning is performed for individual operating scenarios such as static obstacles, random obstacles, and communication interference; multiple decision sub-models are constructed; and an autonomous invocation procedure between these sub-models is designed. Next, to strengthen the guidance effect, the virtual structure method is integrated with the Soft Actor-Critic (SAC) reinforcement learning algorithm to build a distributed decision-making framework, and decentralized training of each sub-model substantially improves the success rate and flexibility of task execution. Finally, a centralized execution scheme is adopted in which environmental changes serve as the triggering condition for dynamic selection of, and seamless switching between, sub-models. This allows the UAV formation to autonomously adjust its shape as the task environment changes, achieving the mission objectives while significantly enhancing the overall adaptability and survivability of the swarm. The effectiveness of the method is validated through simulation experiments in multiple scenarios.
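To make the two mechanisms described in the abstract more concrete, the following minimal Python sketch illustrates (a) the virtual structure method, where per-UAV reference positions are obtained by rotating body-frame formation offsets by a virtual leader's heading, and a dense distance-based reward term can then guide SAC training, and (b) event-triggered sub-model switching, where environment flags select which separately trained policy executes. All names, offset values, and trigger flags here are illustrative assumptions, not details taken from the paper.

```python
import math

# Hypothetical formation offsets in the virtual leader's body frame:
# a three-UAV wedge. Values are illustrative only.
WEDGE = [(0.0, 0.0), (-5.0, 4.0), (-5.0, -4.0)]

def reference_positions(leader_xy, leader_heading, offsets):
    """Core of the virtual structure method: rotate body-frame offsets
    by the virtual leader's heading and translate them to the leader's
    position, yielding one reference point per UAV."""
    lx, ly = leader_xy
    c, s = math.cos(leader_heading), math.sin(leader_heading)
    return [(lx + c * ox - s * oy, ly + s * ox + c * oy) for ox, oy in offsets]

def guidance_reward(uav_xy, ref_xy):
    """Dense shaping term, assumed form: negative Euclidean distance to
    the virtual-structure reference point, added to the SAC reward so the
    geometric structure 'guides' policy learning."""
    return -math.hypot(uav_xy[0] - ref_xy[0], uav_xy[1] - ref_xy[1])

def select_submodel(obs):
    """Event-triggered sub-model switching: environment flags act as the
    trigger condition and pick which separately trained policy runs.
    Flag names and the priority order are hypothetical."""
    if obs.get("comm_jammed"):
        return "policy_comm_interference"
    if obs.get("moving_obstacle"):
        return "policy_random_obstacle"
    if obs.get("static_obstacle"):
        return "policy_static_obstacle"
    return "policy_cruise"
```

In this sketch the virtual structure supplies only reference geometry and a reward signal; the distributed SAC sub-policies remain free to deviate locally (e.g., for obstacle avoidance), which is one way the guidance-plus-learning combination described in the abstract can be realized.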

Key words: UAV formation control, complex task environment, deep reinforcement learning, virtual structure method, distributed decision-making
