A lightweight single object tracking algorithm for UAVs based on Siamese network
Received date: 2024-07-09
Revised date: 2024-07-24
Accepted date: 2024-08-21
Online published: 2024-09-02
Supported by
National Natural Science Foundation of China (61871258); Yichang City Science and Technology Research and Development Program (A201130225)
DING Q S, LEI B J, WU Z P. A lightweight single object tracking algorithm for UAVs based on Siamese network[J]. Acta Aeronautica et Astronautica Sinica, 2025, 46(4): 330925 (in Chinese). DOI: 10.7527/S1000-6893.2024.30925
Currently, some single object tracking algorithms have achieved leading performance, but their large models limit their application on resource-constrained platforms such as Unmanned Aerial Vehicles (UAVs). This paper designs a lightweight single object tracking algorithm for UAVs based on the Siamese network to achieve efficient tracking with lower resource consumption. Firstly, a lightweight Siamese feature extraction backbone network is designed based on MobileNetV3, significantly reducing the computation and parameter count of the network without compromising its feature extraction capability. Secondly, a dual cross-correlation module is designed: it uses pointwise cross-correlation to quickly compute the similarity between template image features and search image features, and combines it with depthwise cross-correlation to supplement missing features, effectively enhancing the accuracy and robustness of feature matching. Then, a lightweight prediction head is designed by stacking multiple depthwise separable convolution layers, obtaining accurate target representations with minimal resource consumption. Finally, a classification ranking loss is introduced on top of the traditional classification and regression losses, enhancing the network's ability to learn the target foreground, suppressing background interference, and further improving tracking performance. Comprehensive experiments show that the proposed algorithm achieves precisions of 82.1%, 81.2%, and 64.6% and success rates of 63.4%, 61.8%, and 49.6% on the UAV tracking datasets DTB70, UAV123, and UAV20L, respectively, with only 5.3×10⁵ parameters and 1.1×10⁸ floating-point operations. It achieves performance comparable to state-of-the-art tracking algorithms while using far fewer parameters and much less computation, and runs at over 100 fps, meeting the real-time requirements of UAV object tracking.
Key words: UAV; Siamese network; single object tracking; lightweight; cross-correlation
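For readers unfamiliar with the two correlation operations named in the abstract, the following minimal PyTorch sketch illustrates what depthwise cross-correlation and pointwise (pixel-level) cross-correlation between a template feature and a search feature typically look like. The function names, tensor shapes, and everything else below are illustrative assumptions rather than the authors' implementation; how the paper fuses the two response maps inside its dual cross-correlation module is not specified in the abstract.

import torch
import torch.nn.functional as F

def depthwise_xcorr(z, x):
    # Depthwise cross-correlation: each channel of the template feature z
    # acts as a convolution kernel sliding over the matching channel of the
    # search feature x (z: [B, C, Hz, Wz], x: [B, C, Hx, Wx]).
    b, c, hz, wz = z.shape
    _, _, hx, wx = x.shape
    x = x.reshape(1, b * c, hx, wx)           # fold batch into channels
    kernel = z.reshape(b * c, 1, hz, wz)      # one kernel per (batch, channel)
    out = F.conv2d(x, kernel, groups=b * c)   # grouped convolution
    return out.reshape(b, c, out.shape[-2], out.shape[-1])

def pointwise_xcorr(z, x):
    # Pointwise (pixel-level) cross-correlation: every template pixel is
    # matched against every search pixel, producing Hz*Wz response maps.
    b, c, hz, wz = z.shape
    _, _, hx, wx = x.shape
    sim = torch.bmm(z.flatten(2).transpose(1, 2),   # [B, Hz*Wz, C]
                    x.flatten(2))                   # [B, C, Hx*Wx]
    return sim.reshape(b, hz * wz, hx, wx)

# Illustrative shapes only (not taken from the paper):
z = torch.randn(2, 96, 8, 8)          # template feature
x = torch.randn(2, 96, 16, 16)        # search feature
print(depthwise_xcorr(z, x).shape)    # torch.Size([2, 96, 9, 9])
print(pointwise_xcorr(z, x).shape)    # torch.Size([2, 64, 16, 16])

Intuitively, the pointwise form gives fast, fine-grained pixel matching, while the depthwise form preserves channel-level semantics; the dual module described in the abstract combines the two so that each compensates for what the other misses.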
[1] BERTINETTO L, VALMADRE J, HENRIQUES J F, et al. Fully-convolutional Siamese networks for object tracking[C]∥European Conference on Computer Vision. Cham: Springer, 2016: 850-865.
[2] LI B, YAN J J, WU W, et al. High performance visual tracking with Siamese region proposal network[C]∥2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 8971-8980.
[3] ZHU Z, WANG Q, LI B, et al. Distractor-aware Siamese networks for visual object tracking[C]∥Proceedings of the European Conference on Computer Vision (ECCV). Cham: Springer, 2018: 101-117.
[4] LI B, WU W, WANG Q, et al. SiamRPN++: Evolution of Siamese visual tracking with very deep networks[C]∥Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2019: 4282-4291.
[5] HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition[C]∥Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2016: 770-778.
[6] HU W, WANG Q, ZHANG L, et al. SiamMask: A framework for fast online object tracking and segmentation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(3): 3072-3089.
[7] LEI B J, DING Q S, MOU Q X, et al. Visual tracking algorithm based on template updating and dual feature enhancement[J/OL]. Journal of Beijing University of Aeronautics and Astronautics, (2024-02-27)[2024-07-01] (in Chinese).
[8] CHEN Z D, ZHONG B N, LI G R, et al. Siamese box adaptive network for visual tracking[C]∥2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE Press, 2020: 6667-6676.
[9] GUO D Y, WANG J, CUI Y, et al. SiamCAR: Siamese fully convolutional classification and regression for visual tracking[C]∥2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE Press, 2020: 6268-6276.
[10] LIN L, FAN H, ZHANG Z, et al. SwinTrack: A simple and strong baseline for transformer tracking[DB/OL]. arXiv preprint: 2112.00995, 2021.
[11] CHEN X, YAN B, ZHU J W, et al. Transformer tracking[C]∥2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE Press, 2021: 8122-8131.
[12] CHEN X, PENG H W, WANG D, et al. SeqTrack: Sequence to sequence learning for visual object tracking[C]∥2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE Press, 2023: 14572-14581.
[13] YAN B, PENG H W, WU K, et al. LightTrack: Finding lightweight neural networks for object tracking via one-shot architecture search[C]∥2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE Press, 2021: 15175-15184.
[14] CAO Z A, HUANG Z Y, PAN L, et al. TCTrack: Temporal contexts for aerial tracking[C]∥2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE Press, 2022: 14778-14788.
[15] CAO Z A, HUANG Z Y, PAN L, et al. Towards real-world visual tracking with temporal contexts[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023, 45(12): 15834-15849.
[16] FU C H, CAO Z A, LI Y M, et al. Onboard real-time aerial tracking with efficient Siamese anchor proposal network[J]. IEEE Transactions on Geoscience and Remote Sensing, 2022, 60: 5606913.
[17] CAO Z A, FU C H, YE J J, et al. SiamAPN: Siamese attentional aggregation network for real-time UAV tracking[C]∥2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). Piscataway: IEEE Press, 2021: 3086-3092.
[18] XING D T, EVANGELIOU N, TSOUKALAS A, et al. Siamese transformer pyramid networks for real-time UAV tracking[C]∥2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV). Piscataway: IEEE Press, 2022: 1898-1907.
[19] LIU F, SUN Y N. UAV target tracking algorithm based on adaptive fusion network[J]. Acta Aeronautica et Astronautica Sinica, 2022, 43(7): 359-369 (in Chinese).
[20] XU X Y, CHEN J. UAV object tracking for air-ground targets based on status detection and Kalman filter[J]. Acta Aeronautica et Astronautica Sinica, 2024, 45(16): 329834 (in Chinese).
[21] XUE Y L, JIN G D, TAN L N, et al. Adaptive UAV target tracking algorithm based on multi-scale fusion[J]. Acta Aeronautica et Astronautica Sinica, 2023, 44(1): 326107 (in Chinese).
[22] HOWARD A, SANDLER M, CHEN B, et al. Searching for MobileNetV3[C]∥2019 IEEE/CVF International Conference on Computer Vision (ICCV). Piscataway: IEEE Press, 2019: 1314-1324.
[23] TANG F, LING Q. Ranking-based Siamese visual tracking[C]∥2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE Press, 2022: 8731-8740.
[24] LI S Y, YEUNG D Y. Visual object tracking for unmanned aerial vehicles: A benchmark and new motion models[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2017, 31(1): 4140-4146.
[25] MUELLER M, SMITH N, GHANEM B. A benchmark and simulator for UAV tracking[M]∥Computer Vision-ECCV 2016. Cham: Springer, 2016: 445-461.
[26] HOWARD A G, ZHU M L, CHEN B, et al. MobileNets: Efficient convolutional neural networks for mobile vision applications[DB/OL]. arXiv preprint: 1704.04861, 2017.
[27] SANDLER M, HOWARD A, ZHU M L, et al. MobileNetV2: Inverted residuals and linear bottlenecks[C]∥2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 4510-4520.
[28] HU J, SHEN L, SUN G. Squeeze-and-excitation networks[C]∥2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 7132-7141.
[29] GUO D Y, SHAO Y Y, CUI Y, et al. Graph attention tracking[C]∥2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE Press, 2021: 9538-9547.
[30] LIAO B Y, WANG C Y, WANG Y Y, et al. PG-Net: Pixel to global matching network for visual tracking[C]∥European Conference on Computer Vision. Cham: Springer, 2020: 429-444.
[31] LIN T Y, MAIRE M, BELONGIE S, et al. Microsoft COCO: Common objects in context[M]∥Lecture Notes in Computer Science. Cham: Springer International Publishing, 2014: 740-755.
[32] REAL E, SHLENS J, MAZZOCCHI S, et al. YouTube-BoundingBoxes: A large high-precision human-annotated data set for object detection in video[C]∥Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2017: 5296-5305.
[33] RUSSAKOVSKY O, DENG J, SU H, et al. ImageNet large scale visual recognition challenge[J]. International Journal of Computer Vision, 2015, 115: 211-252.
[34] HUANG L H, ZHAO X, HUANG K Q. GOT-10k: A large high-diversity benchmark for generic object tracking in the wild[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 43(5): 1562-1577.
[35] ZHANG Z, PENG H, FU J, et al. Ocean: Object-aware anchor-free tracking[C]∥Computer Vision-ECCV 2020. Cham: Springer, 2020: 771-787.
[36] ZHANG Z P, PENG H W. Deeper and wider Siamese networks for real-time visual tracking[C]∥2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE Press, 2019: 4586-4595.
[37] GUO Q, FENG W, ZHOU C, et al. Learning dynamic Siamese network for visual object tracking[C]∥2017 IEEE International Conference on Computer Vision (ICCV). Piscataway: IEEE Press, 2017: 1781-1789.
[38] VALMADRE J, BERTINETTO L, HENRIQUES J, et al. End-to-end representation learning for correlation filter based tracking[C]∥2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE Press, 2017: 5000-5008.
[39] DANELLJAN M, BHAT G, KHAN F S, et al. ECO: Efficient convolution operators for tracking[C]∥2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE Press, 2017: 6931-6939.
[40] LI X, MA C, WU B Y, et al. Target-aware deep tracking[C]∥2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE Press, 2019: 1369-1378.