Collaborative planning of multi-UAV trajectories and communication strategies considering channel resource constraints

doi:10.7527/S1000-6893.2025.31837

Electronics, Electrical Engineering and Control

Chen WANG¹, Caisheng WEI¹, Zeyang YIN¹, Kai JIN², Xingchen LI³

¹ School of Automation, Central South University, Changsha 410083, China
² The 54th Research Institute of CETC, Shijiazhuang 050081, China
³ National Innovation Institute of Defense Technology, Academy of Military Science, Beijing 100071, China

Received: 2025-01-22 Revised: 2025-03-26 Accepted: 2025-04-17 Published: 2025-09-25 Online: 2025-04-25
Contact: Caisheng WEI E-mail: caisheng_wei@csu.edu.cn
Supported by:
National Natural Science Foundation of China (62373379); Hunan Provincial Natural Science Foundation (2024JJ6482); Central South University Innovation-Driven Research Program (2023CXQD066)

Abstract:

To address the joint optimization of flight trajectories and communication strategies in multi-UAV cooperative reconnaissance missions, this study proposes a collaborative planning method based on deep reinforcement learning. The method accounts for multiple costs (flight distance, communication energy consumption, and channel capacity) and multiple constraints (base station channel resource constraints, UAV performance constraints, and collision avoidance constraints). First, a collaborative planning model for multi-UAV trajectories and communication strategies is established for random, unknown reconnaissance environments. Second, an end-to-end deep reinforcement learning framework based on the multi-agent proximal policy optimization algorithm is proposed to jointly optimize coupled variables such as UAV trajectories, communication connection strategies, and communication transmit power, with flight distance, communication energy consumption, and channel capacity as the optimization objectives. To reduce the difficulty of learning and solving the multi-objective task, a trajectory planning sub-model that incorporates a virtual attractive force from base stations is designed based on the artificial potential field method; its parameters are tuned automatically through reinforcement learning, which reduces the size of the decision space and accelerates model convergence. Finally, simulation experiments verify the advantage of the proposed method in terms of the total mission cost.

Key words: UAV, trajectory planning, base station channel resource constraint, deep reinforcement learning, collaborative optimization
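The trajectory sub-model described in the abstract combines artificial potential field forces with a virtual attraction toward base stations, and leaves the coefficients to be tuned by reinforcement learning. The Python sketch below illustrates that idea only. The function name `apf_heading`, the coefficient `k_bs`, the dimensionless force shapes, and the rule that the base-station pull is active only beyond the distance threshold are all assumptions made for illustration, not the formulas used in the paper.

```python
import numpy as np


def _unit(v):
    """Return v scaled to unit length (zero vector stays zero)."""
    n = np.linalg.norm(v)
    return v / n if n > 0 else np.zeros_like(v)


def apf_heading(pos, goal, obstacles, stations,
                k_goal=1.0, k_obs=10.0, k_bs=1.0,
                d_obs=500.0, d_alpha=300.0):
    """Unit heading from a simplified potential-field force sum.

    k_goal, k_obs, k_bs are the coefficients a learned policy could output
    instead of raw headings (k_bs is a hypothetical name).
    """
    pos = np.asarray(pos, dtype=float)
    goal = np.asarray(goal, dtype=float)
    obstacles = np.asarray(obstacles, dtype=float).reshape(-1, pos.size)
    stations = np.asarray(stations, dtype=float).reshape(-1, pos.size)

    # Attraction toward the target (constant magnitude, assumed form).
    force = k_goal * _unit(goal - pos)

    # Repulsion from obstacles closer than the safety distance d_obs;
    # the magnitude grows as the distance shrinks and is zero at d_obs.
    for obs in obstacles:
        d = np.linalg.norm(pos - obs)
        if 0.0 < d < d_obs:
            force += k_obs * (d_obs / d - 1.0) * _unit(pos - obs)

    # Virtual attraction toward the nearest base station, assumed to act
    # only when the UAV is farther away than the threshold d_alpha.
    if len(stations) > 0:
        dists = np.linalg.norm(stations - pos, axis=1)
        nearest = int(np.argmin(dists))
        if dists[nearest] > d_alpha:
            force += k_bs * _unit(stations[nearest] - pos)

    return _unit(force)


if __name__ == "__main__":
    heading = apf_heading([0, 0, 0], [1000, 0, 0],
                          obstacles=[], stations=[[0, 1000, 0]])
    print(heading)
```

Letting the policy choose a few coefficients rather than a full heading at every step is one plausible reading of how the sub-model shrinks the decision space.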

CLC number: V279

Chen WANG, Caisheng WEI, Zeyang YIN, Kai JIN, Xingchen LI. Collaborative planning of multi-UAV trajectories and communication strategies considering channel resource constraints[J]. Acta Aeronautica et Astronautica Sinica, 2025, 46(18): 331837.


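Table 1 Simulation parameter settings

Parameter	Description	Value
$f$ /GHz	Carrier frequency	2
$N_0$ /(dBm·Hz^-1)	Noise power spectral density	-174
$W$ /MHz	Bandwidth	1
$p_n$ /dBm	Communication transmit power	10~20
$\lambda$	Maximum number of UAVs served by a base station	3
$\mathrm{actor\_lr}$	Learning rate of the policy network	5×10^-4
$\mathrm{critic\_lr}$	Learning rate of the value network	5×10^-4
$\mathrm{episodes}$	Total number of training episodes	10×10^4
$\mathrm{episode\_length}$	Maximum number of steps per episode	100
$\gamma$	Discount factor	0.99
$\varepsilon$	Clip coefficient	0.2
$v_{\min}$ /(m·s^-1)	Minimum UAV linear velocity	0.1
$v_{\max}$ /(m·s^-1)	Maximum UAV linear velocity	12.5
$\Delta\beta$ /rad	Maximum change in pitch angle	1.0472
$\Delta\psi$ /rad	Maximum change in yaw angle	1.0472
$k_d$	Linear velocity proportional coefficient	0.1
$k_{\mathrm{goal}}$	Target attraction coefficient	1
$k_{\mathrm{obs}}$	Obstacle repulsion coefficient	10
$d_{\mathrm{obs}}$ /m	Safety distance threshold	500
$d_\alpha$ /m	Base station attraction distance threshold	300
$w_1$	Weight factor for flight distance	0.1
$w_2$	Weight factor for communication energy consumption	8
$w_3$	Weight factor for channel capacity	1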
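To make the communication parameters in Table 1 concrete, the following Python sketch computes the thermal noise power over the 1 MHz band and a Shannon capacity estimate for a single UAV-to-base-station link at 2 GHz. The free-space path loss model and the 1000 m link distance are assumptions chosen for illustration; the channel model actually used in the paper may differ, and the function names are hypothetical.

```python
import math

C_LIGHT = 299_792_458.0  # speed of light in m/s


def noise_power_dbm(n0_dbm_per_hz, bandwidth_hz):
    """Noise power in dBm: spectral density plus 10*log10(bandwidth)."""
    return n0_dbm_per_hz + 10.0 * math.log10(bandwidth_hz)


def free_space_path_loss_db(distance_m, freq_hz):
    """Free-space path loss in dB (assumed stand-in channel model)."""
    return 20.0 * math.log10(4.0 * math.pi * distance_m * freq_hz / C_LIGHT)


def shannon_capacity_bps(tx_dbm, distance_m, freq_hz=2e9,
                         n0_dbm_per_hz=-174.0, bandwidth_hz=1e6):
    """Capacity W*log2(1+SNR) in bit/s, with defaults taken from Table 1."""
    rx_dbm = tx_dbm - free_space_path_loss_db(distance_m, freq_hz)
    snr_db = rx_dbm - noise_power_dbm(n0_dbm_per_hz, bandwidth_hz)
    snr_linear = 10.0 ** (snr_db / 10.0)
    return bandwidth_hz * math.log2(1.0 + snr_linear)


if __name__ == "__main__":
    # Compare the two ends of the 10~20 dBm transmit power range.
    for tx_dbm in (10.0, 20.0):
        rate = shannon_capacity_bps(tx_dbm, distance_m=1000.0)
        print(f"p_n = {tx_dbm:.0f} dBm -> {rate / 1e6:.2f} Mbit/s")
```

With these values the noise floor is -174 + 60 = -114 dBm, so raising the transmit power improves capacity only logarithmically while energy use grows linearly in milliwatts, which is the trade-off the energy and capacity weights in Table 1 balance.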


