Sparse point matching-based collaborative category-agnostic object tracking method

doi:10.7527/S1000-6893.2025.32425

Special Column on Target State Collaboration and Intelligent Perception

Rongling LANG, Cailun WEI, Ya FAN, Fei GAO

School of Electronic Information Engineering, Beihang University, Beijing 100191, China

Received: 2025-06-17 Revised: 2025-07-22 Accepted: 2025-09-17 Online: 2025-09-25 Published: 2025-09-24
Contact: Ya FAN E-mail: fanya1502@buaa.edu.cn
Supported by:
Open Foundation of Shaanxi Key Laboratory of Integrated and Intelligent Navigation (SXKLIIN202401003)

Abstract:

Real-time perception and continuous tracking of unknown objects are an important prerequisite for autonomous decision-making in intelligent systems. In practice, however, prior knowledge of the object category is absent and training samples are scarce, which makes perceiving and tracking unknown objects challenging. To address this issue, we propose a category-agnostic object tracking method based on the Segment Anything Model (SAM) and sparse feature point matching. The method first uses prompt points to guide SAM to perceive and segment the unknown object in the image. A feature point extraction model based on a convolutional neural network then obtains sparse feature points of the object image as the object information, and an attention-based matching network matches these points in subsequent frames to propagate that information. On this basis, an Iterative SAM with Point Consensus (ISPC) module is designed, which uses the matched feature points to keep guiding SAM to segment the object in subsequent frames, thereby achieving stable tracking of the unknown object. In addition, because the object information based on sparse feature points is lightweight, it can be shared efficiently among multiple agents, and a collaborative object tracking system is built on it. The tracking performance of the system and its generalization to objects with zero training samples are evaluated on the DAVIS 2017 dataset and a self-constructed near-infrared video dataset. The experimental results show that the method achieves good robustness and accuracy in collaborative perception and tracking of unknown-category objects.

Key words: object tracking, object segmentation, feature extraction, feature matching, collaborative perception
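
The abstract does not spell out how ISPC decides which matched points to trust, so the following is only a minimal sketch of one plausible point-consensus step, not the authors' implementation. It assumes matched keypoint pairs are already available as NumPy arrays (from any extractor and matcher), keeps the pairs whose displacement agrees with the median motion, and returns the surviving points as candidate positive prompts for the current frame. The function name `consensus_prompts` and the parameters `tol` and `max_points` are hypothetical.

```python
import numpy as np


def consensus_prompts(prev_pts, curr_pts, tol=3.0, max_points=8):
    """Pick prompt points in the current frame from matched keypoints.

    prev_pts, curr_pts: (N, 2) arrays of matched (x, y) coordinates, where
    prev_pts lie on the object in the previous frame.
    Hypothetical helper: the consensus rule used here (median displacement
    plus a pixel tolerance) is an assumption made for illustration.
    """
    prev_pts = np.asarray(prev_pts, dtype=float)
    curr_pts = np.asarray(curr_pts, dtype=float)
    if len(prev_pts) == 0:
        return np.empty((0, 2))
    disp = curr_pts - prev_pts                   # per-match motion vector
    median = np.median(disp, axis=0)             # dominant object motion
    err = np.linalg.norm(disp - median, axis=1)  # deviation from that motion
    keep = err <= tol                            # matches that agree with it
    order = np.argsort(err[keep], kind="stable") # most consistent first
    return curr_pts[keep][order][:max_points]


# Toy data: the object moves by (5, 2) pixels and one match is wrong.
prev = np.array([[10, 10], [20, 15], [30, 12], [25, 30]])
curr = prev + np.array([5, 2])
curr[3] = [90, 90]                               # outlier match
prompts = consensus_prompts(prev, curr)
print(prompts)
```

In a tracking loop of this kind, the returned points would be passed to the segmentation model as positive point prompts for the current frame, and the keypoints that fall inside the new mask would serve as `prev_pts` for the next iteration.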

CLC number: V243

Rongling LANG, Cailun WEI, Ya FAN, Fei GAO. Sparse point matching-based collaborative category-agnostic object tracking method[J]. Acta Aeronautica et Astronautica Sinica, 2026, 47(3): 632425.

Table 1

Quantitative comparison of object tracking and segmentation methods on the DAVIS 2017 validation set

Method	$J\&F$	$J$	$F$
Painter	34.6	28.5	40.8
DINO	71.4	67.9	74.9
SegGPT	75.6	72.5	78.6
SAM-PT	76.6	74.4	78.9
Ours	75.8	73.2	78.4
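
For reference, $J$ in the DAVIS benchmark is the region similarity (intersection over union of the predicted and ground-truth masks), $F$ is the boundary F-measure, and $J\&F$ is their mean. The sketch below computes $J$ for a pair of toy binary masks and checks the mean against the Table 1 row of the proposed method. The helper names are illustrative, and the boundary extraction needed for $F$ is omitted.

```python
import numpy as np


def region_similarity(pred, gt):
    """Jaccard index (IoU) between two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:  # both masks empty: treated here as perfect agreement
        return 1.0
    return float(np.logical_and(pred, gt).sum() / union)


def j_and_f(j, f):
    """Mean of region similarity J and boundary accuracy F."""
    return (j + f) / 2.0


gt = np.zeros((4, 4), dtype=bool)
gt[1:3, 1:3] = True    # 4 ground-truth pixels
pred = np.zeros((4, 4), dtype=bool)
pred[1:3, 1:4] = True  # 6 predicted pixels, 4 of them overlap the ground truth

print(region_similarity(pred, gt))       # intersection 4 / union 6
print(round(j_and_f(73.2, 78.4), 1))     # J and F of the proposed method
```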

Table 2

Quantitative comparison on DAVIS 2017 validation sequences captured by fixed surveillance cameras or stable platforms

Sequence	Ours		SAM-PT		SegGPT
	$J$	$F$	$J$	$F$	$J$	$F$
blackswan	95.4	97.3	95.2	97.1	93.4	98.2
camel	96.8	99.0	97.7	99.1	80.1	86.9
cows	94.8	96.0	95.3	95.0	90.4	93.0
dogs-jump	87.8	85.6	92.4	96.6	78.1	77.8
drift-chicane	94.2	98.4	93.1	98.4	87.1	97.5
drift-straight	95.6	96.8	87.5	88.1	92.4	95.4
gold-fish	86.4	89.3	88.7	89.9	75.5	72.9
judo	83.0	85.4	78.6	82.0	76.5	80.6
loading	93.1	96.2	80.0	82.1	87.2	89.7
mbike-trick	87.3	89.8	81.7	83.7	80.3	77.6
pigs	87.9	86.8	82.4	81.4	83.0	81.6
Average	91.1	92.7	88.4	90.3	84.0	86.4
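
The last row of Table 2 is the arithmetic mean over the 11 listed sequences. As a quick check, the snippet below recomputes the $J$ averages from the per-sequence values in the table. The variable names are illustrative, and "Ours" denotes the proposed method.

```python
# Per-sequence J scores copied from Table 2, in the same order as the rows.
j_scores = {
    "Ours":   [95.4, 96.8, 94.8, 87.8, 94.2, 95.6, 86.4, 83.0, 93.1, 87.3, 87.9],
    "SAM-PT": [95.2, 97.7, 95.3, 92.4, 93.1, 87.5, 88.7, 78.6, 80.0, 81.7, 82.4],
    "SegGPT": [93.4, 80.1, 90.4, 78.1, 87.1, 92.4, 75.5, 76.5, 87.2, 80.3, 83.0],
}

# Mean over sequences, rounded to one decimal as in the table.
averages = {name: round(sum(vals) / len(vals), 1) for name, vals in j_scores.items()}
for name, avg in averages.items():
    print(f"{name}: {avg}")
```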
