ACTA AERONAUTICA ET ASTRONAUTICA SINICA
Sparse point matching-based collaborative category-agnostic object tracking method
Received date: 2025-06-17
Revised date: 2025-07-22
Accepted date: 2025-09-17
Online published: 2025-09-24
Supported by: Open Foundation of Shaanxi Key Laboratory of Integrated and Intelligent Navigation (SXKLIIN202401003)
Real-time perception and continuous tracking of unknown objects are critical for autonomous intelligent systems. However, the absence of prior category knowledge and the scarcity of training samples make perceiving and tracking unknown targets highly challenging. To address this issue, we propose a category-agnostic object tracking method based on the Segment Anything Model (SAM) and sparse feature point matching. The approach first guides SAM to segment unknown objects using prompt points, then extracts sparse keypoints with a network-based feature extraction model, and matches them across frames with an attention-based network to propagate object information. An Iterative SAM with Point Consensus (ISPC) scheme is introduced to maintain segmentation quality and achieve stable tracking over time. Because the target descriptors built from sparse feature points are lightweight, they can be efficiently shared among multiple agents, enabling the construction of a collaborative target tracking system. Experiments on the DAVIS 2017 dataset and a self-constructed near-infrared video dataset demonstrate strong robustness and accuracy in collaborative perception and tracking of unknown-category objects.
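The per-frame propagation loop described above (segment with prompt points, extract sparse keypoints inside the mask, match them into the next frame, and reuse the matched points as new prompts) can be sketched as follows. This is a toy illustration, not the authors' implementation: `segment_with_prompts`, `extract_keypoints`, and `match_keypoints` are hypothetical stand-ins for SAM, the SuperPoint-style extractor, and the attention-based matcher, respectively.

```python
import numpy as np


def segment_with_prompts(frame, prompts):
    """Stand-in for SAM: the mask is just the bounding box of the prompt points."""
    mask = np.zeros(frame.shape[:2], dtype=bool)
    xs, ys = prompts[:, 0].astype(int), prompts[:, 1].astype(int)
    mask[ys.min():ys.max() + 1, xs.min():xs.max() + 1] = True
    return mask


def extract_keypoints(mask, n=6, seed=0):
    """Stand-in for a SuperPoint-style extractor: sample pixels inside the mask."""
    ys, xs = np.nonzero(mask)
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(xs), size=min(n, len(xs)), replace=False)
    return np.stack([xs[idx], ys[idx]], axis=1).astype(float)


def match_keypoints(kps_prev, kps_cand):
    """Stand-in for attention-based matching: nearest neighbour in position.
    An ISPC-like step would additionally reject outlier matches by consensus."""
    dists = np.linalg.norm(kps_prev[:, None] - kps_cand[None], axis=-1)
    return kps_cand[dists.argmin(axis=1)]


def track(frames, init_prompts, n_kp=6):
    """Propagate an unknown object across frames via sparse-point prompts."""
    prompts, masks = init_prompts, []
    for t, frame in enumerate(frames):
        mask = segment_with_prompts(frame, prompts)  # SAM segmentation
        masks.append(mask)
        kps = extract_keypoints(mask, n=n_kp, seed=t)  # keypoints on the object
        if t + 1 < len(frames):
            # Candidate keypoints from the next frame (whole image here);
            # the matched subset becomes the prompt set for frame t+1.
            cand = extract_keypoints(
                np.ones(frames[t + 1].shape[:2], dtype=bool), n=20, seed=t + 1)
            prompts = match_keypoints(kps, cand)
    return masks
```

The matched keypoint set doubles as the lightweight object descriptor that can be transmitted to other agents, which then run the same segment-and-match loop on their own video streams.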
Rongling LANG, Cailun WEI, Ya FAN, Fei GAO. Sparse point matching-based collaborative category-agnostic object tracking method[J]. ACTA AERONAUTICA ET ASTRONAUTICA SINICA, 2026, 47(3): 632425. DOI: 10.7527/S1000-6893.2025.32425