Acta Aeronautica et Astronautica Sinica ›› 2025, Vol. 46 ›› Issue (23): 632017. doi: 10.7527/S1000-6893.2025.32017

• Special Column •

RGB-T UAV object tracking based on feature-cooperative reconstruction

Dong GAO, Pujian LAI, Shilei WANG, Gong CHENG

  1. School of Automation, Northwestern Polytechnical University, Xi’an 710072, China
  • Received: 2025-03-25 Revised: 2025-04-16 Accepted: 2025-05-30 Online: 2025-06-30 Published: 2025-06-13
  • Contact: Gong CHENG E-mail:gcheng@nwpu.edu.cn
  • Supported by:
    National Natural Science Foundation of China (61772425); Shaanxi Province Natural Science Foundation (2021JC-16)

Abstract:

RGB-T Unmanned Aerial Vehicle (UAV) object tracking enhances tracking robustness in complex environments by fusing complementary information from the visible-light and thermal-infrared modalities. However, existing methods neglect the noise interference caused by modality gaps, which weakens the effectiveness of cross-modal feature complementarity and degrades feature representation, thereby limiting the performance of RGB-T UAV trackers. To address this issue, a feature-cooperative reconstruction-based tracker is proposed, whose core is a feature-cooperative reconstruction module consisting of a cross-modal interaction encoder and a feature reconstruction decoder. Specifically, the cross-modal interaction encoder employs an adaptive feature interaction strategy to extract critical complementary information from the auxiliary modality while effectively suppressing cross-modal noise interference. The feature reconstruction decoder then uses the query features from the encoder to guide feature reconstruction, preserving modality-specific information while incorporating cross-modal complementary details to enhance feature representation. Additionally, to improve target localization accuracy in dynamic scenes, a cross-modal location cue fusion module is proposed to integrate the search regions of the two modalities and provide more precise localization cues. Finally, extensive experiments are conducted on two RGB-T UAV object tracking benchmarks (VTUAV and HiAL) as well as the LasHeR dataset. The results demonstrate that the proposed method significantly outperforms existing methods. Notably, compared with HMFT, the proposed method improves tracking success rate and precision on the VTUAV dataset by 9.9% and 9.0%, respectively.
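The cross-modal interaction step can be illustrated with a minimal sketch. The use of scaled dot-product cross-attention, the residual form, and all shapes and names below are illustrative assumptions, not the authors' exact implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_interaction(f_rgb, f_tir):
    """Illustrative cross-modal interaction: RGB tokens query the
    thermal-infrared (TIR) tokens for complementary information.

    f_rgb, f_tir: (num_tokens, dim) feature maps flattened into token
    sequences. Returns RGB features enriched with TIR cues, added back
    residually so modality-specific information is preserved.
    """
    d_k = f_rgb.shape[-1]
    # Attention weights: how strongly each RGB token attends to each TIR token.
    attn = softmax(f_rgb @ f_tir.T / np.sqrt(d_k))  # (N_rgb, N_tir)
    complementary = attn @ f_tir                    # aggregate TIR information
    return f_rgb + complementary                    # residual reconstruction

# Toy example with random features (hypothetical sizes).
rng = np.random.default_rng(0)
f_rgb = rng.standard_normal((16, 64))   # 16 tokens, 64-dim
f_tir = rng.standard_normal((16, 64))
out = cross_modal_interaction(f_rgb, f_tir)
print(out.shape)  # (16, 64)
```

In the paper's module, this direction of attention would run inside the encoder, with the decoder using the resulting queries to guide reconstruction; the sketch shows only the attention-based aggregation of complementary information.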

Key words: UAV, object tracking, Transformer, cross-modal feature interaction, feature-cooperative reconstruction, cross-modal location cue fusion
