Electronics and Electrical Engineering and Control

Cooperative location of multiple UAVs with deep reinforcement learning in GPS-denied environment

  • Kaifang WAN
  • Zhilin WU
  • Yunhui WU
  • Haozhi QIANG
  • Yibo WU
  • Bo LI
  • School of Electronics and Information, Northwestern Polytechnical University, Xi’an 710072, China

Received date: 2024-08-01

Revised date: 2024-09-27

Accepted date: 2024-11-21

Online published: 2024-12-05

Supported by

National Natural Science Foundation of China (62003267); the Key Research and Development Program of Shaanxi Province (2023-GHZD-33); the Fundamental Research Funds for the Central Universities (G2022KYO602); the Key Laboratory for Electromagnetic Space Operations and Applications (2022ZX0090); the National Key Laboratory of Air-based Information Perception and Fusion (202471)

Abstract

In strongly adversarial scenarios, Unmanned Aerial Vehicles (UAVs) often lose GPS because of interference, which makes it difficult for them to obtain an accurate position. Since UAVs usually operate in formations or clusters, this paper proposes a strategy in which UAVs within the formation measure their relative spatial positions and locate one another, so that a UAV can keep updating its position in real time even after its GPS signal is lost. First, to address the GPS-denied environment, the theory of the Partially Observable Markov Decision Process (POMDP) is introduced and its elements are analyzed, and a POMDP decision model for collaborative positioning and scheduling is established. Then, a belief state update method based on the Extended Kalman Filter (EKF) and a Q-value estimation method based on the Deep Q-Network (DQN) from deep reinforcement learning are proposed to achieve accurate real-time collaborative positioning. Application tests in different scenarios show that the proposed model manages and schedules the UAVs in a formation efficiently, and that it can direct UAVs with working GPS to coordinate and locate UAVs whose GPS has failed, which verifies the effectiveness of the model.
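As a rough illustration of the EKF-based belief update mentioned in the abstract, the sketch below performs one measurement update of a 2-D position estimate from a single range reading. It is a minimal sketch under stated assumptions, not the paper's implementation: the abstract does not specify the state vector, the measurement model, or the noise parameters, so the 2-D state, the range-only measurement, and the names `ekf_range_update`, `anchor`, and `r_var` are all hypothetical. The DQN scheduling step is not shown.

```python
import math

def ekf_range_update(x, P, anchor, z, r_var):
    """One EKF measurement update from a single range reading (illustrative only).

    x      : (x, y) prior position estimate of the GPS-failed UAV (assumed 2-D state)
    P      : 2x2 prior covariance as nested lists (assumed symmetric)
    anchor : (x, y) known position of a UAV with working GPS (hypothetical name)
    z      : measured range between the two UAVs
    r_var  : variance of the range measurement noise (hypothetical parameter)
    """
    dx, dy = x[0] - anchor[0], x[1] - anchor[1]
    pred = math.hypot(dx, dy)              # predicted range h(x)
    if pred == 0.0:
        raise ValueError("estimate coincides with anchor; Jacobian undefined")
    H = (dx / pred, dy / pred)             # Jacobian of h evaluated at x (1x2)

    # P H^T (2x1), innovation variance S, and Kalman gain K (2x1)
    PHt = (P[0][0] * H[0] + P[0][1] * H[1],
           P[1][0] * H[0] + P[1][1] * H[1])
    S = H[0] * PHt[0] + H[1] * PHt[1] + r_var
    K = (PHt[0] / S, PHt[1] / S)

    innovation = z - pred
    x_new = (x[0] + K[0] * innovation, x[1] + K[1] * innovation)
    # P_new = (I - K H) P; since P is symmetric, H P equals (P H^T)^T
    P_new = [[P[i][j] - K[i] * PHt[j] for j in range(2)] for i in range(2)]
    return x_new, P_new

if __name__ == "__main__":
    x_new, P_new = ekf_range_update((0.0, 0.0), [[4.0, 0.0], [0.0, 4.0]],
                                    (10.0, 0.0), 9.0, 1.0)
    print(x_new, P_new)
```

In this example the update shrinks the uncertainty only along the line of sight to the anchor. This is one reason why the choice of which UAV takes the measurement matters, and that choice is the scheduling decision the abstract assigns to the DQN.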

Cite this article

Kaifang WAN, Zhilin WU, Yunhui WU, Haozhi QIANG, Yibo WU, Bo LI. Cooperative location of multiple UAVs with deep reinforcement learning in GPS-denied environment[J]. ACTA AERONAUTICA ET ASTRONAUTICA SINICA, 2025, 46(8): 331024-331024. DOI: 10.7527/S1000-6893.2024.31024
