Real-time perception and continuous tracking of unknown objects are an important prerequisite for autonomous decision-making in intelligent systems. In practice, the absence of prior category information and the scarcity of training samples make perceiving and tracking unknown objects especially challenging. To address this problem, we propose a category-agnostic object tracking method that combines the Segment Anything Model (SAM) with sparse feature point matching. The method first guides SAM with prompt points to perceive and segment the unknown object in an image, then extracts sparse keypoints of the object with a CNN-based keypoint extraction model as a lightweight object representation, and matches these keypoints in subsequent frames with an attention-based matching network to propagate the object information. On this basis, an Iterative SAM with Point Consensus (ISPC) module is designed, in which the matched keypoints continually prompt SAM to segment the object in subsequent frames, yielding stable tracking of unknown objects. Because the sparse-keypoint representation is lightweight, it can be shared efficiently among multiple agents, enabling a collaborative object tracking system. Tracking performance and zero-shot generalization to unseen objects are evaluated on the DAVIS 2017 dataset and a self-constructed near-infrared video dataset. Experimental results show that the proposed method achieves good robustness and accuracy in collaborative perception and tracking of unknown-category objects.
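To make the pipeline concrete, the following is a minimal sketch of one ISPC tracking step. It assumes the publicly released segment_anything package (SamPredictor); the keypoint extractor and attention-based matcher are abstracted as injected callables extract_fn and match_fn, which are hypothetical stand-ins and not the authors' implementation. The sketch only illustrates the loop structure: match keypoints into the new frame, prompt SAM with the matched points, then keep the keypoints that fall inside the new mask as the target representation for the next frame.

```python
# Illustrative sketch of one ISPC tracking step (not the paper's code).
# Assumes the public `segment_anything` package; `extract_fn` / `match_fn` are
# hypothetical stand-ins for the CNN keypoint extractor and attention-based matcher.
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

def ispc_step(predictor, prev_kpts, prev_desc, frame, extract_fn, match_fn):
    """Propagate the target keypoints to `frame` and re-segment the target.

    prev_kpts : (N, 2) float array, keypoints inside the previous target mask (x, y)
    prev_desc : (N, D) float array, descriptors of those keypoints
    frame     : (H, W, 3) uint8 RGB image of the current frame
    Returns the new mask and the keypoints/descriptors retained for the next step.
    """
    # 1. Extract sparse keypoints and descriptors from the new frame.
    cur_kpts, cur_desc = extract_fn(frame)                          # (M, 2), (M, D)

    # 2. Match the previous target keypoints against the new frame.
    matches = match_fn(prev_kpts, prev_desc, cur_kpts, cur_desc)    # list of (i, j) index pairs
    matched_pts = np.array([cur_kpts[j] for _, j in matches], dtype=np.float32)
    if len(matched_pts) == 0:
        return None, prev_kpts, prev_desc                           # target lost in this frame

    # 3. Use the matched points as positive point prompts for SAM on the new frame.
    predictor.set_image(frame)
    masks, scores, _ = predictor.predict(
        point_coords=matched_pts,
        point_labels=np.ones(len(matched_pts), dtype=np.int32),
        multimask_output=False,
    )
    mask = masks[0]

    # 4. Point consensus: keep only keypoints that fall inside the new mask and
    #    carry them forward as the target representation for the next frame.
    inside = mask[cur_kpts[:, 1].astype(int), cur_kpts[:, 0].astype(int)]
    return mask, cur_kpts[inside], cur_desc[inside]

# Usage sketch (checkpoint path is a placeholder):
# sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b.pth")
# predictor = SamPredictor(sam)
```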