Cooperative location of multiple UAVs with deep reinforcement learning in GPS-denied environment

doi:10.7527/S1000-6893.2024.31024

Electronics, Electrical Engineering and Control


Kaifang WAN, Zhilin WU, Yunhui WU, Haozhi QIANG, Yibo WU, Bo LI

School of Electronics and Information, Northwestern Polytechnical University, Xi'an 710072, China

Received: 2024-08-01 Revised: 2024-09-27 Accepted: 2024-11-21 Online: 2024-12-11 Published: 2024-12-05
Contact: Kaifang WAN E-mail: wankaifang@nwpu.edu.cn
Supported by:
National Natural Science Foundation of China (62003267); the Key Research and Development Program of Shaanxi Province (2023-GHZD-33); the Fundamental Research Funds for the Central Universities (G2022KYO602); the Key Laboratory for Electromagnetic Space Operations and Applications (2022ZX0090); the National Key Laboratory of Air-based Information Perception and Fusion (202471)

Abstract

In strongly adversarial scenarios, unmanned aerial vehicles (UAVs) can lose Global Positioning System (GPS) service under interference and are then unable to determine their own position accurately. Because UAVs usually operate in formations or swarms, this paper proposes a method in which the UAVs in a formation measure their relative spatial positions and localize one another, so that a UAV can keep updating its position in real time after its GPS signal is lost. For the GPS-denied environment, the theory of the partially observable Markov decision process (POMDP) is introduced, the elements of the POMDP model are analyzed, and a POMDP decision model for cooperative positioning scheduling is established. A belief-state update method based on the extended Kalman filter (EKF) and a Q-value estimation method based on the deep Q-network (DQN) from deep reinforcement learning are proposed to achieve accurate cooperative positioning in real time. Application tests in different scenarios show that the model efficiently manages and schedules the UAVs in the formation whose GPS is working and directs them to cooperatively localize the UAVs whose GPS has failed, which verifies the effectiveness of the model.

Key words: multiple UAVs, GPS-denied, collaborative positioning, deep reinforcement learning, Markov decision
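The EKF-based belief-state update named in the abstract can be illustrated with a minimal sketch. This page does not give the paper's state vector or measurement model, so everything specific here is an assumption: the belief is a 2-D position estimate with a 2x2 covariance, and the measurement is a single range reading taken by a UAV whose own position is known. The function name `ekf_range_update` and all parameter names are hypothetical.

```python
import math


def ekf_range_update(x, P, anchor, z, r_var):
    """One EKF measurement update for a 2-D position belief.

    Assumed setup (not taken from the paper):
      x      -- [px, py], current position estimate of the GPS-denied UAV
      P      -- 2x2 covariance of that estimate, as nested lists
      anchor -- [ax, ay], known position of the measuring UAV
      z      -- measured range between the two UAVs
      r_var  -- variance of the range measurement noise
    Returns the updated (x, P).
    """
    dx, dy = x[0] - anchor[0], x[1] - anchor[1]
    predicted_range = math.hypot(dx, dy)
    if predicted_range < 1e-9:
        raise ValueError("estimate coincides with anchor; Jacobian undefined")

    # Jacobian of h(x) = ||x - anchor|| with respect to x (a 1x2 row vector).
    H = [dx / predicted_range, dy / predicted_range]

    # Innovation covariance S = H P H^T + R (a scalar for one range reading).
    PHt = [P[0][0] * H[0] + P[0][1] * H[1],
           P[1][0] * H[0] + P[1][1] * H[1]]
    S = H[0] * PHt[0] + H[1] * PHt[1] + r_var

    # Kalman gain K = P H^T / S (a 2x1 column vector).
    K = [PHt[0] / S, PHt[1] / S]

    # State update with the range innovation.
    innovation = z - predicted_range
    x_new = [x[0] + K[0] * innovation, x[1] + K[1] * innovation]

    # Covariance update P_new = (I - K H) P.
    HP = [H[0] * P[0][0] + H[1] * P[1][0],
          H[0] * P[0][1] + H[1] * P[1][1]]
    P_new = [[P[i][j] - K[i] * HP[j] for j in range(2)] for i in range(2)]
    return x_new, P_new


def trace(P):
    """Sum of the diagonal; a common scalar summary of position uncertainty."""
    return P[0][0] + P[1][1]


x_new, P_new = ekf_range_update([3.0, 0.0], [[1.0, 0.0], [0.0, 1.0]],
                                [0.0, 0.0], 4.0, 1.0)
```

In this example the range reading only carries information along the line joining the two UAVs, so the uncertainty shrinks in that direction and stays unchanged across it. A scheduler could use a summary such as `trace(P_new)` to compare candidate measurement assignments, although whether the paper uses that quantity is not stated here.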

CLC number: V249

Kaifang WAN, Zhilin WU, Yunhui WU, Haozhi QIANG, Yibo WU, Bo LI. Cooperative location of multiple UAVs with deep reinforcement learning in GPS-denied environment[J]. Acta Aeronautica et Astronautica Sinica, 2025, 46(8): 331024.

Figures and tables: 25 (Figures 1-20, Tables 1-5)



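Parameter	Value
Learning rate α	0.10
Reward discount factor γ	0.90
Exploration parameter ε, initial value	0.8
Exploration parameter ε, final value	0.3
Batch size B	3000
Experience replay pool D	3000
Maximum steps per episode T	3000
Target network update frequency K	400
Maximum number of simulation iterations	5000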
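The hyperparameters above can be read as a DQN training configuration. The sketch below shows how such values typically enter an ε-greedy policy, a one-step Q-learning target, and a periodic target-network sync. Several points are assumptions rather than facts from the paper: ε is taken to be the probability of a random action, its decay from 0.8 to 0.3 is taken to be linear over the maximum number of iterations, and all function names are hypothetical.

```python
import random

# Values copied from the parameter table; the key names are illustrative.
HYPER = {
    "alpha": 0.10,          # learning rate
    "gamma": 0.90,          # reward discount factor
    "eps_start": 0.8,       # initial exploration parameter
    "eps_end": 0.3,         # final exploration parameter
    "target_update": 400,   # target network update frequency K
    "max_iterations": 5000, # maximum number of simulation iterations
}


def epsilon_at(step, hp=HYPER):
    """Exploration rate at a given step, assuming a linear decay schedule."""
    frac = min(max(step / hp["max_iterations"], 0.0), 1.0)
    return hp["eps_start"] + frac * (hp["eps_end"] - hp["eps_start"])


def select_action(q_values, step, rng=random):
    """Epsilon-greedy choice over a list of Q-values (epsilon = P(random))."""
    if rng.random() < epsilon_at(step):
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)


def td_target(reward, next_q_values, done, hp=HYPER):
    """One-step Q-learning target: r + gamma * max_a' Q_target(s', a')."""
    if done or not next_q_values:
        return reward
    return reward + hp["gamma"] * max(next_q_values)


def should_sync_target(step, hp=HYPER):
    """True on the steps where the target network would copy the online one."""
    return step > 0 and step % hp["target_update"] == 0
```

With these assumptions the agent still explores on 30% of its decisions at the end of training, which is high compared with many DQN setups; the table does not say how the schedule is actually shaped.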

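Group	Scheduling strategy	Number of UAVs with working GPS	Number of UAVs with failed GPS	Baseline detection distance	Sampling period
1	No scheduling	9	4	4	0.04
2	Myopic scheduling	9	4	4	0.04
3	Far-sighted scheduling	9	4	4	0.04
4	No scheduling	9	6	4	0.04
5	Myopic scheduling	9	6	4	0.04
6	Far-sighted scheduling	9	6	4	0.04
7	No scheduling	9	9	4	0.04
8	Myopic scheduling	9	9	4	0.04
9	Far-sighted scheduling	9	9	4	0.04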

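Group	Scheduling strategy	Number of UAVs with working GPS	Number of UAVs with failed GPS	Baseline detection distance	Sampling period
1	No scheduling	9	4	3	0.04
2	Myopic scheduling	9	4	3	0.04
3	Far-sighted scheduling	9	4	3	0.04
4	No scheduling	9	4	6	0.04
5	Myopic scheduling	9	4	6	0.04
6	Far-sighted scheduling	9	4	6	0.04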

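Group	Scheduling strategy	Number of UAVs with working GPS	Number of UAVs with failed GPS	Baseline detection distance	Sampling period
1	No scheduling	9	4	4	0.02
2	Myopic scheduling	9	4	4	0.02
3	Far-sighted scheduling	9	4	4	0.02
4	No scheduling	9	4	4	0.2
5	Myopic scheduling	9	4	4	0.2
6	Far-sighted scheduling	9	4	4	0.2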

Group	Total number of UAVs	Number of UAVs with working GPS	Number of UAVs with failed GPS	DQN training episodes (×10³)
1	13	9	4	2.0
2	15	9	6	3.2
3	18	9	9	6.8
4	34	25	9	22.0
5	45	36	9	88.0
6	58	49	9	236.0
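The scaling trend in the table above is easier to see as ratios relative to the smallest formation. The short calculation below uses only the table's first and last columns; the variable and function names are illustrative.

```python
# (total UAVs, DQN training episodes in thousands), copied from the table.
ROWS = [(13, 2.0), (15, 3.2), (18, 6.8), (34, 22.0), (45, 88.0), (58, 236.0)]


def growth_factors(rows):
    """Return (uav_ratio, episode_ratio) of each row relative to the first."""
    base_uavs, base_episodes = rows[0]
    return [(n / base_uavs, e / base_episodes) for n, e in rows]


factors = growth_factors(ROWS)
for (n, _), (uav_ratio, episode_ratio) in zip(ROWS, factors):
    print(f"{n} UAVs: formation x{uav_ratio:.2f}, episodes x{episode_ratio:.1f}")
```

Going from 13 to 58 UAVs enlarges the formation by a factor of about 4.5, while the listed number of training episodes grows by a factor of 118, so the training cost in this table rises much faster than the formation size.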

