Cooperative location of multiple UAVs with deep reinforcement learning in GPS-denied environment

doi:10.7527/S1000-6893.2024.31024

Abstract

In strongly adversarial scenarios, Unmanned Aerial Vehicles (UAVs) often lose GPS because of interference, making it difficult to obtain an accurate position. Since UAVs often operate in formations or clusters, this paper proposes a strategy in which UAVs within the formation measure relative spatial positions and locate each other, allowing UAVs to update their position information in real time even after GPS signal loss. First, for the GPS-denied environment, the theory of the Partially Observable Markov Decision Process (POMDP) is introduced and its elements are analyzed to establish a POMDP decision model for collaborative positioning and scheduling. A belief state update method based on the Extended Kalman Filter (EKF) and a Q-value estimation method based on the Deep Q-Network (DQN) in deep reinforcement learning are then proposed to achieve accurate collaborative real-time positioning. Application tests in different scenarios show that the proposed model can efficiently manage and schedule the UAVs in a formation and can direct GPS-normal UAVs to cooperatively locate GPS-failed UAVs, which verifies the effectiveness of the model.

Key words: multiple UAVs, GPS-denied, collaborative positioning, deep reinforcement learning, Markov decision

CLC Number: V249

Kaifang WAN, Zhilin WU, Yunhui WU, Haozhi QIANG, Yibo WU, Bo LI. Cooperative location of multiple UAVs with deep reinforcement learning in GPS-denied environment[J]. Acta Aeronautica et Astronautica Sinica, 2025, 46(8): 331024.
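
The EKF-based belief state update named in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a 2-D position state, a single range-only measurement from one GPS-normal UAV at a known position, and a scalar noise variance. The function name `ekf_range_update` and all of its parameters are hypothetical, and the prediction step is omitted.

```python
import math


def ekf_range_update(x, P, anchor, z, r_var):
    """One EKF measurement update for a 2-D position estimate.

    x      : [px, py], prior position estimate of the GPS-failed UAV
    P      : 2x2 prior covariance as a list of lists
    anchor : [ax, ay], known position of a GPS-normal UAV (assumed exact)
    z      : measured range between the two UAVs (assumed range-only sensor)
    r_var  : variance of the range measurement noise (assumed value)
    """
    dx, dy = x[0] - anchor[0], x[1] - anchor[1]
    pred = math.hypot(dx, dy)  # h(x): range predicted from the prior estimate
    if pred == 0.0:
        raise ValueError("estimate coincides with anchor; Jacobian undefined")

    # Jacobian of h(x) = ||x - anchor|| evaluated at the prior estimate (1x2).
    H = [dx / pred, dy / pred]

    # P H^T (2x1) and innovation variance S = H P H^T + R (scalar).
    PHt = [P[0][0] * H[0] + P[0][1] * H[1],
           P[1][0] * H[0] + P[1][1] * H[1]]
    S = H[0] * PHt[0] + H[1] * PHt[1] + r_var

    # Kalman gain K = P H^T / S and state correction.
    K = [PHt[0] / S, PHt[1] / S]
    innovation = z - pred
    x_new = [x[0] + K[0] * innovation, x[1] + K[1] * innovation]

    # Covariance update P_new = (I - K H) P.
    P_new = [[P[i][j] - K[i] * (H[0] * P[0][j] + H[1] * P[1][j])
              for j in range(2)] for i in range(2)]
    return x_new, P_new
```

For example, with a prior estimate of `[3.0, 0.0]`, identity covariance, the anchor at the origin, a measured range of 4.0 and unit noise variance, the estimate moves to `[3.5, 0.0]` and the variance along the line of sight drops from 1.0 to 0.5, while the cross-range variance is unchanged. A single range measurement only constrains the estimate along the line of sight, which is one reason the choice of which GPS-normal UAV observes which GPS-failed UAV matters.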

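Table 1
Parameter	Value
Learning rate α	0.10
Reward discount factor γ	0.90
Initial value of exploration parameter ε	0.8
Final value of exploration parameter ε	0.3
Batch size B	3000
Experience pool size D	3000
Maximum steps per episode T	3000
Target network update frequency K	400
Maximum number of simulation iterations	5000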
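
As a rough illustration of how the hyperparameters in the table above would enter a DQN training loop, the sketch below collects them in a configuration object and shows three places where they are used. The values come from the table; everything else is an assumption: the names `DQNConfig`, `epsilon_at`, `td_target` and `should_sync_target` are hypothetical, and the linear ε schedule is one common choice that the source does not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DQNConfig:
    # Values taken from the hyperparameter table; field names are hypothetical.
    learning_rate: float = 0.10      # alpha
    gamma: float = 0.90              # reward discount factor
    eps_start: float = 0.8           # initial exploration parameter epsilon
    eps_end: float = 0.3             # final exploration parameter epsilon
    batch_size: int = 3000           # B
    buffer_size: int = 3000          # experience pool D
    max_episode_steps: int = 3000    # T
    target_update_every: int = 400   # K
    max_iterations: int = 5000       # maximum simulation iterations


def epsilon_at(iteration: int, cfg: DQNConfig) -> float:
    """Exploration rate at a given iteration.

    Assumes linear annealing from eps_start to eps_end over max_iterations;
    the source gives only the two endpoint values.
    """
    frac = min(max(iteration / cfg.max_iterations, 0.0), 1.0)
    return cfg.eps_start + frac * (cfg.eps_end - cfg.eps_start)


def td_target(reward: float, next_q_values, done: bool, cfg: DQNConfig) -> float:
    """Standard DQN target: r + gamma * max_a' Q_target(s', a').

    next_q_values are the target network's Q-values for the next state.
    """
    if done:
        return reward
    return reward + cfg.gamma * max(next_q_values)


def should_sync_target(step: int, cfg: DQNConfig) -> bool:
    """True on every K-th step, when the target network copies the online weights."""
    return step > 0 and step % cfg.target_update_every == 0
```

Under these assumptions, exploration stays fairly high throughout training because ε never falls below 0.3, and the target network is refreshed every 400 steps.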

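Table 2
Group	Scheduling policy	Number of GPS-normal UAVs	Number of GPS-failed UAVs	Detection reference distance	Sampling period
1	No scheduling	9	4	4	0.04
2	Myopic scheduling	9	4	4	0.04
3	Far-sighted scheduling	9	4	4	0.04
4	No scheduling	9	6	4	0.04
5	Myopic scheduling	9	6	4	0.04
6	Far-sighted scheduling	9	6	4	0.04
7	No scheduling	9	9	4	0.04
8	Myopic scheduling	9	9	4	0.04
9	Far-sighted scheduling	9	9	4	0.04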

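Table 3
Group	Scheduling policy	Number of GPS-normal UAVs	Number of GPS-failed UAVs	Detection reference distance	Sampling period
1	No scheduling	9	4	3	0.04
2	Myopic scheduling	9	4	3	0.04
3	Far-sighted scheduling	9	4	3	0.04
4	No scheduling	9	4	6	0.04
5	Myopic scheduling	9	4	6	0.04
6	Far-sighted scheduling	9	4	6	0.04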

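Table 4
Group	Scheduling policy	Number of GPS-normal UAVs	Number of GPS-failed UAVs	Detection reference distance	Sampling period
1	No scheduling	9	4	4	0.02
2	Myopic scheduling	9	4	4	0.02
3	Far-sighted scheduling	9	4	4	0.02
4	No scheduling	9	4	4	0.2
5	Myopic scheduling	9	4	4	0.2
6	Far-sighted scheduling	9	4	4	0.2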

Table 5
Group	Number of UAVs	Number of GPS-normal UAVs	Number of GPS-failed UAVs	DQN training episodes/10³
1	13	9	4	2.0
2	15	9	6	3.2
3	18	9	9	6.8
4	34	25	9	22.0
5	45	36	9	88.0
6	58	49	9	236.0
