RGB-T UAV Object Tracking Based on Feature-cooperative Reconstruction (Column on "UAV Multi-Source Perception in Interference Environments")

  • GAO Dong,
  • LAI Pu-Jian,
  • WANG Shi-Lei,
  • CHENG Gong
  • Northwestern Polytechnical University

Received date: 2025-03-25

  Revised date: 2025-06-09

  Online published: 2025-06-13

Funding

National Natural Science Foundation of China; Shaanxi Province Science Fund for Distinguished Young Scholars

Abstract

RGB-T Unmanned Aerial Vehicle (UAV) object tracking fuses complementary information from the visible (RGB) and thermal infrared (TIR) modalities to improve tracking robustness in complex environments. However, existing methods neglect the noise interference caused by modality differences, which weakens the effectiveness of cross-modal feature complementarity and degrades feature representation, thereby limiting the performance of RGB-T UAV trackers. To address this problem, an RGB-T UAV object tracking method based on feature-cooperative reconstruction is proposed. Its core is a feature-cooperative reconstruction module consisting of a cross-modal interaction encoder and a feature reconstruction decoder. Specifically, the cross-modal interaction encoder employs an adaptive feature interaction mechanism to extract critical complementary information from the auxiliary modality while effectively suppressing cross-modal noise interference. The feature reconstruction decoder then uses the query features output by the encoder to guide the reconstruction of modality features, preserving modality-specific features while incorporating cross-modal complementary information, thereby enhancing feature representation. In addition, to improve target localization accuracy in dynamic scenes, a cross-modal location cue fusion module is proposed to fuse the search regions of the two modalities and provide more accurate location cues. Finally, the proposed method is comprehensively evaluated on two RGB-T UAV object tracking benchmarks, VTUAV and HiAL, as well as the LasHeR dataset. The experimental results show that the proposed method significantly outperforms existing methods on the VTUAV and HiAL datasets. In particular, on the VTUAV dataset, it improves the tracking success rate and precision by 9.9% and 9.0%, respectively, compared with HMFT.
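To make the module described above concrete, the following is a minimal PyTorch sketch of how a feature-cooperative reconstruction block of this kind could be organized: a cross-modal interaction encoder in which the main modality queries the auxiliary modality for complementary cues through gated cross-attention, and a feature reconstruction decoder that re-injects those cues into the modality features. The class names, feature dimensions, the sigmoid gate used as a stand-in for adaptive noise suppression, and the use of standard multi-head attention are all illustrative assumptions rather than the implementation reported in the paper; the cross-modal location cue fusion module is not sketched.

```python
# Minimal illustrative sketch (assumptions noted in the text above);
# not the authors' implementation.
import torch
import torch.nn as nn


class CrossModalInteractionEncoder(nn.Module):
    """Lets the main modality query the auxiliary modality for complementary cues."""

    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        # A learned sigmoid gate stands in for "adaptive feature interaction"
        # that down-weights noisy cross-modal responses (assumption).
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())
        self.norm = nn.LayerNorm(dim)

    def forward(self, main: torch.Tensor, aux: torch.Tensor) -> torch.Tensor:
        # main, aux: (batch, tokens, channels) features of the two modalities
        attended, _ = self.cross_attn(query=main, key=aux, value=aux)
        g = self.gate(torch.cat([main, attended], dim=-1))
        return self.norm(main + g * attended)  # gated complementary "query features"


class FeatureReconstructionDecoder(nn.Module):
    """Reconstructs modality features under the guidance of the query features."""

    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, modal_feat: torch.Tensor, query_feat: torch.Tensor) -> torch.Tensor:
        # The residual path keeps modality-specific content; attention over the
        # query features injects cross-modal complementary details.
        attended, _ = self.cross_attn(query=modal_feat, key=query_feat, value=query_feat)
        x = self.norm1(modal_feat + attended)
        return self.norm2(x + self.ffn(x))


class FeatureCooperativeReconstruction(nn.Module):
    """Applies the encoder-decoder pair in both RGB-to-TIR and TIR-to-RGB directions."""

    def __init__(self, dim: int = 256):
        super().__init__()
        # Weights are shared across the two directions purely for brevity.
        self.encoder = CrossModalInteractionEncoder(dim)
        self.decoder = FeatureReconstructionDecoder(dim)

    def forward(self, rgb: torch.Tensor, tir: torch.Tensor):
        rgb_out = self.decoder(rgb, self.encoder(rgb, tir))
        tir_out = self.decoder(tir, self.encoder(tir, rgb))
        return rgb_out, tir_out


if __name__ == "__main__":
    rgb = torch.randn(2, 256, 256)   # (batch, tokens, channels)
    tir = torch.randn(2, 256, 256)
    rgb_out, tir_out = FeatureCooperativeReconstruction(256)(rgb, tir)
    print(rgb_out.shape, tir_out.shape)  # both torch.Size([2, 256, 256])
```

In a full tracker the reconstructed RGB and TIR tokens would be passed on to the fusion and prediction heads; the sketch only illustrates the encoder-decoder interaction.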

Cite this article

GAO Dong, LAI Pu-Jian, WANG Shi-Lei, CHENG Gong. RGB-T UAV Object Tracking Based on Feature-cooperative Reconstruction[J]. Acta Aeronautica et Astronautica Sinica. DOI: 10.7527/S1000-6893.2025.32017

References

[1]陈琳, 刘允刚.面向无人机的视觉目标跟踪算法:综述与展望[J].信息与控制, 2022, 51(1):23-40
[2]CHEN L, LIU Y G.UAV visual target tracking algorithms:Review and future prospect[J].Information and Control, 2022, 51(1):23-40
[3]褚昭晨, 宋韬, 金忍, 等.基于视觉图像的空对空多无人机目标跟踪[J].航空学报, 2024, 45(14):20-35
[4]CHU Z C, SONG T, JIN R, et al.Vision-based air-to-air multi-UAVs tracking[J].Acta Aeronautica et Astronautica Sinica, 2024, 45(14):20-35
[5]薛远亮, 金国栋, 谭力宁, 等.基于多尺度融合的自适应无人机目标跟踪算法[J].航空学报, 2023, 44(1):209-226
[6]XUE Y L, JIN G D, TAN L N, et al.Adaptive UAV target tracking algorithm based on multi-scale fusion[J].Acta Aeronautica et Astronautica Sinica, 2023, 44(1):209-226
[7]刘贞报, 马博迪, 高红岗, 等.基于形态自适应网络的无人机目标跟踪方法[J].航空学报, 2021, 42(4):487-500
[8]LIU Z B, MA B D, GAO H G, et al.Adaptive morphological network based UAV target tracking algorithm[J].Acta Aeronautica et Astronautica Sinica, 2021, 42(4):487-500
[9]BHAT G, DANELLJAN M, GOOL L V, et al.Learning Discriminative Model Prediction for Tracking[C]//Proceedings of the IEEE/CVF international conference on computer vision. 2019: 6182-6191.
[10]DANELLJAN M, BHAT G, KHAN F S, et al.Atom: Accurate tracking by overlap maximization[C]//Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2019: 4660-4669.
[11]ZHU J, LAI S, CHEN X, et al.Visual Prompt Multi-Modal Tracking[C]//Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2023: 9516-9526.
[12]ZHANG P, ZHAO J, WANG D, et al.Visible-thermal UAV tracking: A large-scale benchmark and new baseline[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022: 8886-8895.
[13]CAO B, GUO J, ZHU P, et al.Bi-directional adapter for multimodal tracking[C]//Proceedings of the AAAI Conference on Artificial Intelligence. 2024, 38(2): 927-935.
[14]FAN H, YU Z, WANG Q, et al.QueryTrack: Joint-modality Query Fusion Network for RGBT Tracking[J].IEEE Transactions on Image Processing, 2024, 33(1):3187-3199
[15]HUI T, XUN Z, PENG F, et al.Bridging search region interaction with template for rgb-t tracking[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023: 13630-13639.
[16]WANG Y, SUN F, HUANG W, et al.Channel exchanging networks for multimodal and multitask dense image prediction[J].IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 45(5):5481-5496
[17]DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al.An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale[DB/OL].arXiv preprint arXiv:2010.11929, 2020: 1-22.
[18]肖云, 曹丹, 李成龙, 等.基于高空无人机平台的多模态跟踪数据集[J].中国图象图形学报, 2025, 30(02):361-374
[19]XIAO Y, CAO D, LI C L, et al.A benchmark dataset for high-altitude UAV multi-modal tracking[J].Journal of Image and Graphics, 2025, 30(2):361-374
[20]LAI P, CHENG G, ZHANG M, et al.NCSiam: Reliable Matching via Neighborhood Consensus for Siamese-Based Object Tracking[J].IEEE Transactions on Image Processing, 2023, 32:6168-6182
[21]XU Y, WANG Z, LI Z, et al.SiamFC++: Towards robust and accurate visual tracking with target estimation guidelines[C]//Proceedings of the AAAI conference on artificial intelligence. 2020, 34(07): 12549-12556.
[22]CHEN Z, ZHONG B, LI G, et al.Siamese box adaptive network for visual tracking[C]//Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2020: 6668-6677.
[23]ZHANG T, LIU X, ZHANG Q, et al.SiamCDA: Complementarity- and distractor-aware RGB-T tracking based on Siamese network[J].IEEE Transactions on Circuits and Systems for Video Technology, 2021, 32(3):1403-1417
[24]HOU X, XING J, QIAN Y, et al.SDSTrack: Self-Distillation Symmetric Adapter Learning for Multi-Modal Visual Object Tracking[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2024: 26551-26561.
[25]YE B, CHANG H, MA B, et al.Joint feature learning and relation modeling for tracking: A one-stream framework[C]//European conference on computer vision. Cham: Springer Nature Switzerland, 2022: 341-357.
[26]LAW H, DENG J.Cornernet: Detecting objects as paired keypoints[C]//Proceedings of the European conference on computer vision. 2018: 734-750.
[27]REZATOFIGHI H, TSOI N, GWAK J, et al.Generalized intersection over union: A metric and a loss for bounding box regression[C]//Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2019: 658-666.
[28]WU Z, ZHENG J, REN X, et al.Single-model and any-modality for video object tracking[C]//Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2024: 19156-19166.
[29]LI C, XUE W, JIA Y, et al.LasHeR: A large-scale high-diversity benchmark for RGBT tracking[J].IEEE Transactions on Image Processing, 2021, 31:392-404
[30]LOSHCHILOV I, HUTTER F.Decoupled weight decay regularization[DB/OL].arXiv preprint arXiv:1711.05101, 2017: 1-19.
[31]GAO Y, LI C, ZHU Y, et al.Deep adaptive fusion network for high performance RGBT tracking[C]//Proceedings of the IEEE/CVF international conference on computer vision workshops. 2019: 1-9.
[32]ZHANG P, WANG D, LU H, et al.Learning adaptive attribute-driven representation for real-time RGB-T tracking[J].International Journal of Computer Vision, 2021, 129:2714-2729
[33]KRISTAN M, MATAS J, LEONARDIS A, et al.The seventh visual object tracking VOT2019 challenge results[C]//Proceedings of the IEEE/CVF international conference on computer vision workshops. 2019: 1-36.
[34]XIAO Y, YANG M, LI C, et al.Attribute-based progressive fusion network for rgbt tracking[C]//Proceedings of the AAAI Conference on Artificial Intelligence. 2022, 36(3): 2831-2838.
[35]LIU L, LI C, XIAO Y, et al.RGBT Tracking via Challenge-Based Appearance Disentanglement and Interaction[J].IEEE Transactions on Image Processing, 2024, 33:1753-1767
[36]ZHANG L, DANELLJAN M, GONZALEZ-GARCIA A, et al.Multi-modal fusion for end-to-end RGB-T tracking[C]//Proceedings of the IEEE/CVF international conference on computer vision workshops. 2019: 1-10.
[37]ZHANG T, GUO H, JIAO Q, et al.Efficient rgb-t tracking via cross-modality distillation[C]//Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2023: 5404-5413.
[38]LIU Y, ZHOU D, CAO J, et al.Specific and Collaborative Representations Siamese Network for RGBT Tracking[J].IEEE Sensors Journal, 2024, 24(11):18520-18534