
RGB-T UAV object tracking based on feature-cooperative reconstruction

  • Dong GAO,
  • Pujian LAI,
  • Shilei WANG,
  • Gong CHENG
  • School of Automation, Northwestern Polytechnical University, Xi’an 710072, China
E-mail: gcheng@nwpu.edu.cn

Received date: 2025-03-25

Revised date: 2025-04-16

Accepted date: 2025-05-30

Online published: 2025-06-13

Supported by

National Natural Science Foundation of China (61772425); Shaanxi Province Natural Science Foundation (2021JC-16)

Abstract

RGB-T Unmanned Aerial Vehicle (UAV) object tracking enhances tracking robustness in complex environments by fusing complementary information from the visible and thermal infrared modalities. However, existing methods neglect the noise interference caused by modality gaps, which weakens cross-modal feature complementarity and degrades feature representation, thereby limiting the performance of RGB-T UAV trackers. To address this issue, a tracker based on feature-cooperative reconstruction is proposed. Its core is a feature-cooperative reconstruction module consisting of a cross-modal interaction encoder and a feature reconstruction decoder. Specifically, the cross-modal interaction encoder employs an adaptive feature interaction strategy to extract critical complementary information from the auxiliary modality while effectively suppressing cross-modal noise interference. The feature reconstruction decoder then uses the query features from the encoder to guide feature reconstruction, preserving modality-specific information while incorporating cross-modal complementary details to enhance feature representation. In addition, to improve target localization accuracy in dynamic scenes, a cross-modal location cue fusion module is proposed to integrate the search regions of the two modalities and provide more precise localization cues. Finally, extensive experiments are conducted on two RGB-T UAV object tracking benchmarks (VTUAV and HiAL) as well as the LasHeR dataset. The results demonstrate that the proposed method significantly outperforms existing methods; notably, compared with HMFT, it improves tracking success rate and precision on the VTUAV dataset by 9.9% and 9.0%, respectively.
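As a rough illustration of the pipeline described in the abstract, the sketch below shows one plausible PyTorch realization of a cross-modal interaction encoder and a feature reconstruction decoder. All class names, the gating design, dimensions, and the single cross-attention layout are assumptions made for illustration only; this is not the authors' implementation, and the cross-modal location cue fusion module is omitted.

# Minimal, illustrative sketch (assumed design, not the paper's code):
# an encoder that queries the auxiliary modality for complementary cues
# with a gate intended to damp noisy cross-modal responses, and a decoder
# that reconstructs the primary-modality features guided by those queries.
import torch
import torch.nn as nn


class CrossModalInteractionEncoder(nn.Module):
    """Extracts complementary information from the auxiliary modality."""

    def __init__(self, dim: int = 256, num_heads: int = 4):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        # Adaptive gate weighting the injected auxiliary information.
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, primary: torch.Tensor, auxiliary: torch.Tensor) -> torch.Tensor:
        # primary, auxiliary: (B, N, C) token features of the two modalities
        attended, _ = self.cross_attn(query=primary, key=auxiliary, value=auxiliary)
        g = self.gate(torch.cat([primary, attended], dim=-1))
        return self.norm(primary + g * attended)  # gated complementary queries


class FeatureReconstructionDecoder(nn.Module):
    """Reconstructs modality features under guidance of the encoder queries."""

    def __init__(self, dim: int = 256, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, queries: torch.Tensor, original: torch.Tensor) -> torch.Tensor:
        # Residual on `original` keeps modality-specific content while the
        # attention over `queries` injects cross-modal complementary detail.
        rec, _ = self.attn(query=original, key=queries, value=queries)
        x = self.norm1(original + rec)
        return self.norm2(x + self.ffn(x))


if __name__ == "__main__":
    rgb = torch.randn(2, 64, 256)   # RGB search-region tokens (B, N, C)
    tir = torch.randn(2, 64, 256)   # thermal-infrared tokens
    encoder = CrossModalInteractionEncoder()
    decoder = FeatureReconstructionDecoder()
    q = encoder(rgb, tir)           # complementary queries from the auxiliary modality
    rgb_rec = decoder(q, rgb)       # reconstructed RGB features
    print(rgb_rec.shape)            # torch.Size([2, 64, 256])

In this assumed layout the same encoder/decoder pair could be applied symmetrically with the thermal features as the primary modality, so that each stream is reconstructed with cues from the other.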

Cite this article

Dong GAO, Pujian LAI, Shilei WANG, Gong CHENG. RGB-T UAV object tracking based on feature-cooperative reconstruction[J]. Acta Aeronautica et Astronautica Sinica, 2025, 46(23): 632017-632017. DOI: 10.7527/S1000-6893.2025.32017

References

[1] 陈琳, 刘允刚. 面向无人机的视觉目标跟踪算法: 综述与展望[J]. 信息与控制, 2022, 51(1): 23-40.
  CHEN L, LIU Y G. UAV visual target tracking algorithms: Review and future prospect[J]. Information and Control, 2022, 51(1): 23-40 (in Chinese).
[2] 褚昭晨, 宋韬, 金忍, 等. 基于视觉图像的空对空多无人机目标跟踪[J]. 航空学报, 2024, 45(14): 629379.
  CHU Z C, SONG T, JIN R, et al. Vision-based air-to-air multi-UAVs tracking[J]. Acta Aeronautica et Astronautica Sinica, 2024, 45(14): 629379 (in Chinese).
[3] 薛远亮, 金国栋, 谭力宁, 等. 基于多尺度融合的自适应无人机目标跟踪算法[J]. 航空学报, 2023, 44(1): 326107.
  XUE Y L, JIN G D, TAN L N, et al. Adaptive UAV target tracking algorithm based on multi-scale fusion[J]. Acta Aeronautica et Astronautica Sinica, 2023, 44(1): 326107 (in Chinese).
[4] 刘贞报, 马博迪, 高红岗, 等. 基于形态自适应网络的无人机目标跟踪方法[J]. 航空学报, 2021, 42(4): 524904.
  LIU Z B, MA B D, GAO H G, et al. Adaptive morphological network based UAV target tracking algorithm[J]. Acta Aeronautica et Astronautica Sinica, 2021, 42(4): 524904 (in Chinese).
[5] BHAT G, DANELLJAN M, VAN GOOL L, et al. Learning discriminative model prediction for tracking[C]∥2019 IEEE/CVF International Conference on Computer Vision (ICCV). Piscataway: IEEE Press, 2019: 6181-6190.
[6] DANELLJAN M, BHAT G, KHAN F S, et al. ATOM: Accurate tracking by overlap maximization[C]∥2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE Press, 2019: 4655-4664.
[7] ZHU J W, LAI S M, CHEN X, et al. Visual prompt multi-modal tracking[C]∥2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE Press, 2023: 9516-9526.
[8] ZHANG P Y, ZHAO J, WANG D, et al. Visible-thermal UAV tracking: A large-scale benchmark and new baseline[C]∥2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE Press, 2022: 8876-8885.
[9] CAO B, GUO J L, ZHU P F, et al. Bi-directional adapter for multimodal tracking[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2024, 38(2): 927-935.
[10] CUI Y T, JIANG C, WANG L M, et al. MixFormer: End-to-end tracking with iterative mixed attention[C]∥2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE Press, 2022.
[11] HUI T R, XUN Z Z, PENG F G, et al. Bridging search region interaction with template for RGB-T tracking[C]∥2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE Press, 2023.
[12] WANG Y K, SUN F C, HUANG W B, et al. Channel exchanging networks for multimodal and multitask dense image prediction[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(5): 5481-5496.
[13] DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16x16 words: Transformers for image recognition at scale[DB/OL]. arXiv preprint: 2010.11929, 2020.
[14] 肖云, 曹丹, 李成龙, 等. 基于高空无人机平台的多模态跟踪数据集[J]. 中国图象图形学报, 2025, 30(2): 361-374.
  XIAO Y, CAO D, LI C L, et al. A benchmark dataset for high-altitude UAV multi-modal tracking[J]. Journal of Image and Graphics, 2025, 30(2): 361-374 (in Chinese).
[15] LAI P J, CHENG G, ZHANG M L, et al. NCSiam: Reliable matching via neighborhood consensus for Siamese-based object tracking[J]. IEEE Transactions on Image Processing, 2023, 32: 6168-6182.
[16] XU Y D, WANG Z Y, LI Z X, et al. SiamFC++: Towards robust and accurate visual tracking with target estimation guidelines[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2020, 34(7): 12549-12556.
[17] CHEN Z D, ZHONG B N, LI G R, et al. Siamese box adaptive network for visual tracking[C]∥2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE Press, 2020.
[18] ZHANG T L, LIU X R, ZHANG Q, et al. SiamCDA: Complementarity- and distractor-aware RGB-T tracking based on Siamese network[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2022, 32(3): 1403-1417.
[19] HOU X J, XING J Z, QIAN Y J, et al. SDSTrack: Self-distillation symmetric adapter learning for multi-modal visual object tracking[C]∥2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE Press, 2024.
[20] YE B T, CHANG H, MA B P, et al. Joint feature learning and relation modeling for tracking: A one-stream framework[C]∥Computer Vision-ECCV 2022. Cham: Springer, 2022: 341-357.
[21] LAW H, DENG J. CornerNet: Detecting objects as paired keypoints[C]∥Computer Vision-ECCV 2018. Cham: Springer, 2018: 765-781.
[22] REZATOFIGHI H, TSOI N, GWAK J, et al. Generalized intersection over union: A metric and a loss for bounding box regression[C]∥2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE Press, 2019: 658-666.
[23] MULLER M, BIBI A, GIANCOLA S, et al. TrackingNet: A large-scale dataset and benchmark for object tracking in the wild[C]∥Proceedings of the European Conference on Computer Vision (ECCV), 2018: 300-317.
[24] LI C L, XUE W L, JIA Y Q, et al. LasHeR: A large-scale high-diversity benchmark for RGBT tracking[J]. IEEE Transactions on Image Processing, 2021, 31: 392-404.
[25] LOSHCHILOV I, HUTTER F. Decoupled weight decay regularization[DB/OL]. arXiv preprint: 1711.05101, 2017.
[26] GAO Y, LI C, ZHU Y, et al. Deep adaptive fusion network for high performance RGBT tracking[C]∥Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops. Piscataway: IEEE Press, 2019.
[27] ZHANG P Y, WANG D, LU H C, et al. Learning adaptive attribute-driven representation for real-time RGB-T tracking[J]. International Journal of Computer Vision, 2021, 129(9): 2714-2729.
[28] KRISTAN M, MATAS J, LEONARDIS A, et al. The seventh visual object tracking VOT2019 challenge results[C]∥2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW). Piscataway: IEEE Press, 2019.
[29] ZHANG L C, DANELLJAN M, GONZALEZ-GARCIA A, et al. Multi-modal fusion for end-to-end RGB-T tracking[C]∥2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW). Piscataway: IEEE Press, 2019.
[30] FAN H J, YU Z C, WANG Q, et al. QueryTrack: Joint-modality query fusion network for RGBT tracking[J]. IEEE Transactions on Image Processing, 2024, 33: 3187-3199.
[31] XIAO Y, YANG M M, LI C L, et al. Attribute-based progressive fusion network for RGBT tracking[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2022, 36(3): 2831-2838.
[32] LIU L, LI C L, XIAO Y, et al. RGBT tracking via challenge-based appearance disentanglement and interaction[J]. IEEE Transactions on Image Processing, 2024, 33: 1753-1767.
[33] ZHANG T L, GUO H Y, JIAO Q, et al. Efficient RGB-T tracking via cross-modality distillation[C]∥2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE Press, 2023: 5404-5413.
[34] LIU Y S, ZHOU D M, CAO J D, et al. Specific and collaborative representations Siamese network for RGBT tracking[J]. IEEE Sensors Journal, 2024, 24(11): 18520-18534.
[35] WU Z W, ZHENG J L, REN X X, et al. Single-model and any-modality for video object tracking[C]∥2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE Press, 2024: 19156-19166.