航空学报 > 2025, Vol. 46 Issue (8): 331024-331024   doi: 10.7527/S1000-6893.2024.31024

拒止环境下基于深度强化学习的多无人机协同定位

万开方, 吴志林, 武韫晖, 强皓植, 吴艺博, 李波

  1. 西北工业大学 电子信息学院,西安 710072
  • 收稿日期:2024-08-01 修回日期:2024-09-27 接受日期:2024-11-21 出版日期:2024-12-11 发布日期:2024-12-05
  • 通讯作者: 万开方 E-mail:wankaifang@nwpu.edu.cn
  • 基金资助:
    国家自然科学基金(62003267);陕西省重点研发计划(2023-GHZD-33);中央高校基本科研业务费专项资金(G2022KYO602);电磁空间作战与应用重点实验室资助(2022ZX0090);空基信息感知与融合全国重点实验室开放课题资助(202471)

Cooperative location of multiple UAVs with deep reinforcement learning in GPS-denied environment

Kaifang WAN, Zhilin WU, Yunhui WU, Haozhi QIANG, Yibo WU, Bo LI

  1. School of Electronics and Information, Northwestern Polytechnical University, Xi’an 710072, China
  • Received: 2024-08-01  Revised: 2024-09-27  Accepted: 2024-11-21  Online: 2024-12-11  Published: 2024-12-05
  • Contact: Kaifang WAN  E-mail: wankaifang@nwpu.edu.cn
  • Supported by:
    National Natural Science Foundation of China (62003267); the Key Research and Development Program of Shaanxi Province (2023-GHZD-33); the Fundamental Research Funds for the Central Universities (G2022KYO602); the Key Laboratory for Electromagnetic Space Operations and Applications (2022ZX0090); the National Key Laboratory of Air-based Information Perception and Fusion (202471)

摘要:

为解决强对抗场景下无人机因遭受干扰而导致全球定位系统(GPS)失能无法精确获取自身定位的问题,考虑到无人机经常以编队或集群形式行动,提出一种依靠编队内的无人机相互测量相对空间位置并互为定位的方法,使无人机在GPS信号丢失后依然可以实时更新自身位置。针对GPS拒止环境,引入部分可观测马尔可夫决策过程(POMDP)理论,分析了POMDP模型要素,建立起协同定位调度的POMDP决策模型。提出了基于扩展卡尔曼滤波(EKF)的信念状态更新方法和基于深度强化学习中深度Q网络(DQN)的Q值估计方法,以实现协同实时精确定位。不同场景下的应用测试表明,所建立的模型能够实现编队中GPS正常无人机的高效管理调度,能够控制GPS正常无人机对GPS失效无人机进行有效协同定位,即模型有效性得到了验证。

关键词: 多无人机, GPS拒止, 协同定位, 深度强化学习, 马尔可夫决策

Abstract:

In highly adversarial scenarios, unmanned aerial vehicles (UAVs) often lose Global Positioning System (GPS) service because of interference and can no longer determine their own position accurately. Since UAVs usually operate in formations or swarms, this paper proposes a method in which the UAVs within a formation measure their relative spatial positions and localize one another, so that a UAV can keep updating its position in real time after its GPS signal is lost. For the GPS-denied environment, the theory of the Partially Observable Markov Decision Process (POMDP) is introduced, the elements of the POMDP model are analyzed, and a POMDP decision model for collaborative positioning scheduling is established. A belief state update method based on the Extended Kalman Filter (EKF) and a Q-value estimation method based on the Deep Q-Network (DQN) from deep reinforcement learning are proposed to achieve accurate collaborative positioning in real time. Application tests in different scenarios show that the model can efficiently manage and schedule the UAVs in the formation whose GPS still works, and can direct them to collaboratively localize the UAVs whose GPS has failed, which verifies the effectiveness of the model.
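The EKF-based belief update mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's model: it assumes a 2-D position-only state, a range-only measurement taken by a UAV whose GPS still works (the "anchor"), and noise-free ranges in the demo. The function name `ekf_range_update` and all parameter names are hypothetical. In a POMDP setting, the Gaussian pair (mean, covariance) would play the role of the belief state that the scheduling policy acts on.

```python
import math

def ekf_range_update(mean, cov, anchor, z, r_var):
    """One EKF measurement update for a 2-D position belief (illustrative only).

    mean   : [x, y] estimated position of the GPS-denied UAV
    cov    : 2x2 covariance of that estimate, as nested lists
    anchor : (ax, ay) known position of a UAV whose GPS still works
    z      : measured range between the two UAVs
    r_var  : variance of the range measurement noise
    """
    dx, dy = mean[0] - anchor[0], mean[1] - anchor[1]
    pred = math.hypot(dx, dy)                 # predicted range h(mean)
    if pred < 1e-9:
        return mean, cov                      # Jacobian undefined at the anchor
    h = [dx / pred, dy / pred]                # Jacobian of the range, a 1x2 row
    ph = [cov[0][0] * h[0] + cov[0][1] * h[1],
          cov[1][0] * h[0] + cov[1][1] * h[1]]   # P H^T
    s = h[0] * ph[0] + h[1] * ph[1] + r_var   # innovation variance (scalar)
    k = [ph[0] / s, ph[1] / s]                # Kalman gain
    innov = z - pred                          # measurement residual
    new_mean = [mean[0] + k[0] * innov, mean[1] + k[1] * innov]
    # P <- (I - K H) P = P - K (H P); H P equals (P H^T)^T because P is symmetric
    new_cov = [[cov[i][j] - k[i] * ph[j] for j in range(2)] for i in range(2)]
    return new_mean, new_cov

# Demo with assumed values: two anchors take turns ranging the GPS-denied UAV.
truth = (30.0, 40.0)
anchors = [(0.0, 0.0), (100.0, 0.0)]
mean, cov = [45.0, 50.0], [[100.0, 0.0], [0.0, 100.0]]
for a in anchors:
    z = math.hypot(truth[0] - a[0], truth[1] - a[1])  # noise-free range for the demo
    mean, cov = ekf_range_update(mean, cov, a, z, r_var=1.0)
```

Each update can only shrink the trace of the covariance, which is one quantity a scheduler could use when deciding which anchor should take the next measurement; the paper instead learns that decision with a DQN.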

Key words: multiple UAVs, GPS-denied, collaborative positioning, deep reinforcement learning, Markov decision

CLC Number: