Acta Aeronautica et Astronautica Sinica ›› 2025, Vol. 46 ›› Issue (23): 632457. doi: 10.7527/S1000-6893.2025.32457

• Special Column •

Dual-branch feature aggregation for UAV visual place recognition

Qi LIU, Zhixiang PEI, Le HUI, Mingyi HE, Yuchao DAI

  1. Shaanxi Key Laboratory of Information Acquisition and Processing, School of Electronics and Information, Northwestern Polytechnical University, Xi’an 710129, China
  • Received: 2025-06-23 Revised: 2025-07-28 Accepted: 2025-08-19 Online: 2025-09-09 Published: 2025-09-05
  • Contact: Yuchao DAI, E-mail: daiyuchao@nwpu.edu.cn
  • Supported by:
    National Natural Science Foundation of China(62271410)

Abstract:

UAVs’ reliance on Global Navigation Satellite Systems (GNSS) for navigation and positioning is prone to failure under signal blockage or interference. Visual Place Recognition (VPR) achieves geographic localization by matching the visual information captured by a UAV against pre-built map data, providing reliable positioning in GNSS-denied environments, and has thus become a research hotspot in recent years. Traditional VPR methods typically rely on pre-trained networks to extract global features for matching and retrieval, but these features are sensitive to changes in visual appearance such as viewpoint, scale, and lighting, and tend to lose fine-grained information. To address these issues, this paper proposes a UAV visual geo-localization method based on a dual-branch feature aggregation network that combines a pre-trained Vision Transformer model with a state-space model to extract more robust features. Specifically, a dual-branch feature extraction network integrating the DINOv2 and VMamba models is designed, which leverages the global semantic understanding of the ViT and the local dynamic modeling capability of the visual state-space model to achieve stronger generalization and fine-grained perception. In addition, the method introduces an efficient feature fusion framework inspired by the MLP-Mixer architecture to enhance multi-channel feature representation. Experiments on the same-view ALTO dataset and the cross-view VIGOR dataset show that the proposed method achieves high accuracy on metrics such as R@1 and R@5, outperforming existing methods, and that it effectively identifies matching images across different scenarios.
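The dual-branch aggregation described above can be sketched in miniature. The snippet below is a minimal NumPy illustration, not the paper's implementation: the DINOv2 and VMamba backbones are replaced by random stand-in token maps of identical shape, and the MLP-Mixer-inspired fusion is shown as a residual token-mixing MLP followed by a residual channel-mixing MLP, pooled into an L2-normalized global descriptor for retrieval. All function names and dimensions here are illustrative assumptions.

```python
import numpy as np

def mlp(x, w1, b1, w2, b2):
    # Two-layer MLP with ReLU, applied along the last axis of x
    h = np.maximum(x @ w1 + b1, 0.0)
    return h @ w2 + b2

def mixer_fusion(feat_a, feat_b, rng):
    """MLP-Mixer-style fusion of two feature branches.

    feat_a, feat_b: (tokens, channels) token maps from the two branches
    (stand-ins for the ViT and state-space model outputs).
    Returns an L2-normalized global descriptor of size (channels,).
    """
    x = np.concatenate([feat_a, feat_b], axis=0)    # stack branch tokens: (2T, C)
    T, C = x.shape
    # Token-mixing MLP: mixes information across the token dimension
    w1 = rng.standard_normal((T, T)) * 0.02; b1 = np.zeros(T)
    w2 = rng.standard_normal((T, T)) * 0.02; b2 = np.zeros(T)
    x = x + mlp(x.T, w1, b1, w2, b2).T              # residual token mix
    # Channel-mixing MLP: mixes information across the channel dimension
    v1 = rng.standard_normal((C, C)) * 0.02; c1 = np.zeros(C)
    v2 = rng.standard_normal((C, C)) * 0.02; c2 = np.zeros(C)
    x = x + mlp(x, v1, c1, v2, c2)                  # residual channel mix
    g = x.mean(axis=0)                              # pool tokens to one descriptor
    return g / (np.linalg.norm(g) + 1e-8)           # L2-normalize for retrieval

rng = np.random.default_rng(0)
feat_vit = rng.standard_normal((16, 64))   # stand-in for DINOv2 branch tokens
feat_ssm = rng.standard_normal((16, 64))   # stand-in for VMamba branch tokens
desc = mixer_fusion(feat_vit, feat_ssm, rng)
print(desc.shape)  # (64,)
```

Because the descriptor is L2-normalized, retrieval reduces to a dot product between the query descriptor and the descriptors of the map database, with R@1 and R@5 measuring how often the true match appears in the top 1 or top 5 results.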

Key words: UAV visual place recognition, visual matching localization, state-space model, dual-branch feature extraction, image retrieval
