航空学报 > 2025, Vol. 46 Issue (15): 331354-331354   doi: 10.7527/S1000-6893.2024.31354

虚拟结构引领强化学习分布式无人机编队控制

王昱, 谢志鹏, 田永健, 孟光磊

  1. 沈阳航空航天大学 自动化学院,沈阳 110136
  • 收稿日期:2024-10-08 修回日期:2025-01-13 接受日期:2025-02-21 出版日期:2025-03-11 发布日期:2025-03-06
  • 通讯作者: 王昱 E-mail:wangyu@sau.edu.cn
  • 基金资助:
    国家自然科学基金(61906125);国家自然科学基金(62373261);辽宁省属本科高校基本科研业务费专项基金(LJ232410143020);辽宁省属本科高校基本科研业务费专项基金(LJ212410143047)

Distributed UAV formation control with virtual structure guided reinforcement learning

Yu WANG, Zhipeng XIE, Yongjian TIAN, Guanglei MENG

  1. School of Automation,Shenyang Aerospace University,Shenyang 110136,China
  • Received:2024-10-08 Revised:2025-01-13 Accepted:2025-02-21 Online:2025-03-11 Published:2025-03-06
  • Contact: Yu WANG E-mail:wangyu@sau.edu.cn
  • Supported by:
    National Natural Science Foundation of China(61906125, 62373261);Basic Research Funds of Liaoning Provincial Universities(LJ232410143020, LJ212410143047)

摘要:

基于强化学习算法的单一决策模型在面对复杂无人机(UAV)编队控制任务时,往往由于自主决策能力有限导致适应性不足。针对该问题,提出了一种以虚拟结构法引领深度强化学习算法的分布式决策方法。首先,为降低强化学习算法在多样性任务环境中进行策略寻优的难度,对总体任务进行功能分解,分别针对静态障碍、随机障碍及通讯干扰等单一作业场景实施局部任务规划,构建多个决策子模型,并设计模型间自主调用流程;然后,以增强引导作用为出发点,将虚拟结构法与软演员-评论家(SAC)强化学习算法结合,构建分布式决策框架,通过对各子模型的分散训练充分提高任务执行的成功率和灵活性;最后,采用集中执行的方式,以环境变化作为触发条件进行子模型的动态选择与无缝切换,使无人机编队能够根据任务环境的变化自主灵活调整队形,在达成任务目标的同时显著提升机群整体对环境的适应性与生存能力,并通过多场景下的仿真实验验证了方法的有效性。

关键词: 无人机编队控制, 复杂任务环境, 深度强化学习, 虚拟结构法, 分布式决策

Abstract:

Single decision-making models based on reinforcement learning algorithms often show insufficient adaptability in complex Unmanned Aerial Vehicle (UAV) formation control tasks due to their limited autonomous decision-making capability. To address this problem, this paper proposes a distributed decision-making method in which the virtual structure approach guides a deep reinforcement learning algorithm. First, to reduce the difficulty of policy optimization for reinforcement learning algorithms in diverse task environments, the overall task is functionally decomposed: local task planning is performed for individual operating scenarios such as static obstacles, random obstacles, and communication interference; multiple decision sub-models are constructed; and an autonomous invocation procedure between these sub-models is designed. Next, to strengthen the guidance effect, the virtual structure method is integrated with the Soft Actor-Critic (SAC) reinforcement learning algorithm to build a distributed decision-making framework, and decentralized training of each sub-model substantially improves the success rate and flexibility of task execution. Finally, a centralized execution scheme is adopted in which environmental changes serve as the triggering condition for dynamic selection of, and seamless switching between, sub-models. This allows the UAV formation to autonomously adjust its shape as the task environment changes, achieving the mission objectives while significantly enhancing the overall adaptability and survivability of the swarm. The effectiveness of the method is validated through simulation experiments in multiple scenarios.
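To make the two mechanisms described in the abstract more concrete, the following minimal Python sketch illustrates (a) the virtual structure method, where per-UAV reference positions are obtained by rotating body-frame formation offsets by a virtual leader's heading, and a dense distance-based reward term can then guide SAC training, and (b) event-triggered sub-model switching, where environment flags select which separately trained policy executes. All names, offset values, and trigger flags here are illustrative assumptions, not details taken from the paper.

```python
import math

# Hypothetical formation offsets in the virtual leader's body frame:
# a three-UAV wedge. Values are illustrative only.
WEDGE = [(0.0, 0.0), (-5.0, 4.0), (-5.0, -4.0)]

def reference_positions(leader_xy, leader_heading, offsets):
    """Core of the virtual structure method: rotate body-frame offsets
    by the virtual leader's heading and translate them to the leader's
    position, yielding one reference point per UAV."""
    lx, ly = leader_xy
    c, s = math.cos(leader_heading), math.sin(leader_heading)
    return [(lx + c * ox - s * oy, ly + s * ox + c * oy) for ox, oy in offsets]

def guidance_reward(uav_xy, ref_xy):
    """Dense shaping term, assumed form: negative Euclidean distance to
    the virtual-structure reference point, added to the SAC reward so the
    geometric structure 'guides' policy learning."""
    return -math.hypot(uav_xy[0] - ref_xy[0], uav_xy[1] - ref_xy[1])

def select_submodel(obs):
    """Event-triggered sub-model switching: environment flags act as the
    trigger condition and pick which separately trained policy runs.
    Flag names and the priority order are hypothetical."""
    if obs.get("comm_jammed"):
        return "policy_comm_interference"
    if obs.get("moving_obstacle"):
        return "policy_random_obstacle"
    if obs.get("static_obstacle"):
        return "policy_static_obstacle"
    return "policy_cruise"
```

In this sketch the virtual structure supplies only reference geometry and a reward signal; the distributed SAC sub-policies remain free to deviate locally (e.g., for obstacle avoidance), which is one way the guidance-plus-learning combination described in the abstract can be realized.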

Key words: UAV formation control, complex task environment, deep reinforcement learning, virtual structure method, distributed decision-making
