Cooperative location of multiple UAVs with deep reinforcement learning in GPS-denied environment
Received date: 2024-08-01
Revised date: 2024-09-27
Accepted date: 2024-11-21
Online published: 2024-12-05
Supported by
National Natural Science Foundation of China (62003267); the Key Research and Development Program of Shaanxi Province (2023-GHZD-33); the Fundamental Research Funds for the Central Universities (G2022KYO602); the Key Laboratory for Electromagnetic Space Operations and Applications (2022ZX0090); the National Key Laboratory of Air-based Information Perception and Fusion (202471)
WAN K F, WU Z L, WU Y H, QIANG H Z, WU Y B, LI B. Cooperative location of multiple UAVs with deep reinforcement learning in GPS-denied environment[J]. Acta Aeronautica et Astronautica Sinica, 2025, 46(8): 331024 (in Chinese). DOI: 10.7527/S1000-6893.2024.31024
In strong adversarial scenarios, Unmanned Aerial Vehicles (UAVs) often suffer Global Positioning System (GPS) failure due to interference, making it difficult for them to obtain their own accurate positions. Since UAVs often operate in formations or clusters, this paper proposes a method in which the UAVs within a formation measure their relative spatial positions and localize one another, allowing a UAV to keep updating its position in real time even after GPS signal loss. For the GPS-denied environment, the theory of the Partially Observable Markov Decision Process (POMDP) is introduced, the elements of the POMDP model are analyzed, and a POMDP decision model for cooperative localization scheduling is established. A belief-state update method based on the Extended Kalman Filter (EKF) and a Q-value estimation method based on the Deep Q-Network (DQN) from deep reinforcement learning are then proposed to achieve accurate cooperative real-time localization. Application tests in different scenarios show that the proposed model enables efficient management and scheduling of the GPS-normal UAVs in a formation and can direct them to effectively localize the GPS-failed UAVs in a cooperative manner, verifying the effectiveness of the model.
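To make the pipeline described in the abstract concrete, the following is a minimal, illustrative sketch, not the authors' implementation: a 2-D EKF belief-state update driven by a noisy range measurement from one GPS-normal helper UAV, followed by one decision and learning step of a small DQN that scores which helper to task next. Every detail here (the random-walk motion model, range-only sensing, the belief featurization, the reward as the negative covariance trace, and the network sizes) is an assumption made for illustration.

import numpy as np
import torch
import torch.nn as nn

def ekf_update(mu, P, z, helper_pos, R=0.25, Q=np.eye(2) * 0.01):
    """One EKF belief update: predict with a random-walk motion model
    (an assumption), then correct with a range measurement z taken from
    a GPS-normal helper UAV at helper_pos. mu and P are the mean and
    covariance of the GPS-failed UAV's position belief."""
    P = P + Q                                   # predict step
    d = mu - helper_pos
    r = np.linalg.norm(d)                       # predicted range h(mu)
    H = (d / r).reshape(1, 2)                   # Jacobian of h at mu
    S = H @ P @ H.T + R                         # innovation covariance
    K = P @ H.T / S                             # Kalman gain (S is 1x1)
    mu = mu + (K * (z - r)).ravel()
    P = (np.eye(2) - K @ H) @ P
    return mu, P

class QNet(nn.Module):
    """DQN head: maps the belief (mean plus covariance entries) to one
    Q-value per candidate helper UAV; the scheduler tasks the argmax."""
    def __init__(self, n_helpers, belief_dim=6):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(belief_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, n_helpers))

    def forward(self, b):
        return self.net(b)

def belief_vec(mu, P):
    # Flatten the Gaussian belief into a feature vector for the Q-network.
    return torch.tensor([mu[0], mu[1], P[0, 0], P[0, 1], P[1, 0], P[1, 1]],
                        dtype=torch.float32)

# One decision step of the scheduling loop, with toy numbers throughout.
rng = np.random.default_rng(0)
helpers = np.array([[0.0, 0.0], [100.0, 0.0], [0.0, 100.0]])  # GPS-normal UAVs
true_pos = np.array([40.0, 60.0])          # GPS-failed UAV (unknown to itself)
mu, P = np.array([50.0, 50.0]), np.eye(2) * 25.0              # initial belief

qnet = QNet(n_helpers=len(helpers))
opt = torch.optim.Adam(qnet.parameters(), lr=1e-3)

b = belief_vec(mu, P)
with torch.no_grad():                      # epsilon-greedy helper selection
    a = int(torch.argmax(qnet(b))) if rng.random() > 0.1 else int(rng.integers(3))
z = np.linalg.norm(true_pos - helpers[a]) + rng.normal(0.0, 0.5)  # noisy range
mu, P = ekf_update(mu, P, z, helpers[a])

reward = float(-np.trace(P))               # tighter belief earns more reward
td_target = reward + 0.99 * qnet(belief_vec(mu, P)).max().detach()
loss = (qnet(b)[a] - td_target) ** 2       # one TD(0) update; no replay here
opt.zero_grad()
loss.backward()
opt.step()
print("belief mean:", mu, "trace(P):", np.trace(P))

A full system would add experience replay, a target network, and the measurement geometry used in the paper; the sketch only shows how an EKF-maintained belief can serve as the POMDP state that a DQN-based scheduler acts on.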