Multi-object feature association in UAV videos: Recent progress and perspectives

doi:10.7527/S1000-6893.2025.31967

Electronics, Electrical Engineering and Control

Han WU¹,², Hao SUN¹,², Kui LIU¹,², Kefeng JI¹,², Gangyao KUANG¹,²

1. College of Electronic Science and Technology, National University of Defense Technology, Changsha 410073, China
2. State Key Laboratory of Complex Electromagnetic Environment Effects on Electronics and Information System, National University of Defense Technology, Changsha 410073, China

Received: 2025-03-12 Revised: 2025-03-29 Accepted: 2025-05-28 Online: 2025-06-10 Published: 2025-06-06
Contact: Kefeng JI E-mail: jikefeng@nudt.edu.cn
Supported by:
National Natural Science Foundation of China (61971426)

Abstract:

Unmanned aerial vehicle (UAV) videos have become an essential source of information in both civilian and military domains, including intelligent surveillance, smart cities, situational awareness, the low-altitude economy, and military reconnaissance. Multi-object feature association in UAV videos aims to continuously predict the position of each target and maintain its identity, and it is the core of tasks such as multi-object tracking. Existing reviews, however, focus predominantly on UAV object detection and tracking and lack a systematic treatment of multi-object feature association. This paper provides the first systematic review of research progress on multi-object feature association in UAV videos. First, existing methods are summarized and categorized by application scenario and data-source characteristics; the categorization covers multi-view and multi-spectral feature association approaches for the first time. Second, the representative algorithms of each category are analyzed in depth, including their strengths, limitations, and applicable scenarios. Third, the mainstream public datasets in this field are summarized, including single-view, multi-view, and multi-spectral UAV video datasets, and representative methods are compared on three typical datasets, VisDrone, MDMT, and VT-Tiny-MOT, to analyze the root causes of their performance differences and lay a foundation for subsequent studies. Finally, the paper discusses the key challenges that remain in UAV multi-object feature association and future research directions, particularly foundation model development and deep multi-modal fusion. This review aims to provide a reference for further research in this field.

Key words: UAV videos, feature association, multi-object tracking, multi-view videos, multi-spectral videos

CLC number: V279

Han WU, Hao SUN, Kui LIU, Kefeng JI, Gangyao KUANG. Multi-object feature association in UAV videos: Recent progress and perspectives[J]. Acta Aeronautica et Astronautica Sinica, 2026, 47(4): 331967.

Figures/Tables: 26

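Table 6  Comparison of main algorithms for UAV multi-object spatio-temporal feature association

Category	Network	Improvement	Strengths	Limitations
Trajectory-prediction-based methods	HMTT[42]	Associates features across frames by computing the IoU between detection boxes and the boxes predicted by a Kalman filter	Simple and easy to implement; computationally efficient	Weak at modeling nonlinear motion
	UAVMOT[67]	Uses background modeling to separate target motion from background motion, reducing trajectory-prediction errors caused by UAV motion	Performs well when the UAV flies at high speed or the background changes rapidly	Illumination changes and dynamic backgrounds make background modeling difficult
	VS-MM[74]	Incorporates domain knowledge such as road information and traffic rules into the JPDA algorithm as state constraints	Obtains richer cues to improve association accuracy	Ignores abnormal target behavior in real applications
	AMIR[76]	Encodes with an RNN the long-term temporal dependencies of multiple cues, such as target motion and interactions	Can correct predictions when targets are occluded	Performs poorly in scenes with dense interactions
	FOLT[78]	Estimates motion between adjacent frames with an optical-flow network and aggregates features over a time window accordingly	Improves performance under motion blur and occlusion	Optical-flow extraction is computationally expensive
	DroneMOT[40]	Models the motion states of targets and background with a Transformer	Effectively mines the consistency of target features across time	Favors global modeling and overlooks local target details
	STDFormer[82]	Jointly models the spatial relations among targets and their temporal motion trajectories with a Transformer	Effectively captures long-term dependencies among targets	Prediction errors accumulate severely over time
Appearance-based methods	SFTrack[83]	Extracts color-histogram features of targets and measures their similarity with the Bhattacharyya distance	Performs well on small targets	Sensitive to illumination changes
	SCTrack[84]	Introduces a deep re-identification network to extract features for each target	Learns more discriminative feature representations	Involves heavy redundant computation; hard to run in real time
	AsyUAV[88]	Integrates object detection and re-identification into one network under a multi-task learning paradigm	Avoids heavy redundant computation; computationally efficient	Optimization conflicts exist between tasks
	GCEVT[90]	Combines global and local information and fuses multi-scale features to capture scale-adaptive target features	Recognizes targets at different scales well	High computational cost
	MMTrack[91]	Designs a feature-decoupling strategy that converts shared features into task-specific features via self-attention	Eliminates optimization conflicts inside the network	Introduces extra computational overhead
	PID-MOT[42]	Aggregates target and background information across time	Robust to occlusion	Risk of feature confusion
Single-object-tracking-assisted methods	SOTMOT[98]	Refines each target individually with single-object tracking	Recovers targets well after they disappear	Lacks integration of information from other targets or the background
	STAM[43]	Uses detection results as candidate regions for single-object tracking	Can capture interactions among targets	Depends on detection quality
	SAA[99]	Performs short-term modeling between adjacent frames while using a re-identification network to build long-term target appearance	Exploits the complementarity of short-term dynamics and long-term appearance	May lose targets when target features are sparse
	DMAN[101]	Uses ECO to search locally for targets whose association failed	Localizes targets well	Relatively unstable system
Joint detection and association methods	AirTrack[103]	Integrates a Siamese network into the multi-object association framework	Unifies the detection and feature association tasks	Struggles to mine deep temporal relations in complex scenes
	MOFTrack[105]	A dense similarity-learning framework that performs contrastive learning on candidate regions from different video frames	Fully exploits fine-grained differences between background and candidate regions	Erroneous candidate regions may introduce noise
	UGT[45]	Models targets in adjacent frames and their similarities as graph nodes and edges, then applies graph convolution	Effectively mines spatio-temporal relations among targets	Long-term target dependencies are hard to model effectively
	TransCenter[109]	Proposes a Transformer-based dense center-point representation that uses dense center heatmaps to represent targets	Finer granularity; localizes targets accurately	Generating and matching dense heatmaps requires extra computation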
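
The simplest association strategy in the comparison above, matching detections to motion-predicted boxes by IoU, can be illustrated with a minimal sketch. It is not the implementation of any listed method: a constant-velocity shift stands in for the Kalman filter's prediction step, greedy matching stands in for an optimal assignment solver, and the names `predict`, `associate`, and `iou_threshold` are hypothetical.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def predict(box: Box, velocity: Tuple[float, float]) -> Box:
    """Shift a box by a per-frame velocity (assumed stand-in for a Kalman predict step)."""
    vx, vy = velocity
    return (box[0] + vx, box[1] + vy, box[2] + vx, box[3] + vy)


def associate(
    predicted: Dict[int, Box], detections: List[Box], iou_threshold: float = 0.3
) -> Dict[int, int]:
    """Greedily match track IDs to detection indices in order of decreasing IoU."""
    pairs = sorted(
        (
            (iou(p, d), tid, j)
            for tid, p in predicted.items()
            for j, d in enumerate(detections)
        ),
        reverse=True,
    )
    matches: Dict[int, int] = {}
    used = set()
    for score, tid, j in pairs:
        if score < iou_threshold:
            break  # remaining pairs overlap too little to be the same target
        if tid in matches or j in used:
            continue  # each track and each detection is matched at most once
        matches[tid] = j
        used.add(j)
    return matches


# Illustrative values only: two tracks from the previous frame and two new detections.
tracks = {1: (10.0, 10.0, 30.0, 30.0), 2: (100.0, 50.0, 120.0, 70.0)}
velocities = {1: (5.0, 0.0), 2: (0.0, -5.0)}
detections = [(101.0, 46.0, 121.0, 66.0), (14.0, 10.0, 34.0, 30.0)]

predicted = {tid: predict(box, velocities[tid]) for tid, box in tracks.items()}
matches = associate(predicted, detections)
print(matches)
```

With these sample values, track 1 is matched to detection index 1 and track 2 to detection index 0. Tracks left unmatched would normally be kept alive for a few frames, which is where the weakness on nonlinear motion noted in the table shows up: a poor prediction drives the IoU below the threshold.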

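Table 9  Comparison of main algorithms for UAV multi-object multispectral feature association

Category	Network	Improvement	Strengths	Limitations
Multispectral feature fusion methods	MBNet[128]	Proposes a network that perceives visible-light illumination intensity and uses the illumination information to weight the fusion of multispectral features	More adaptive in scenes with drastic illumination changes	Illumination intensity is hard to estimate accurately in low light
	PIAFusion[52]	Estimates the illumination distribution and computes an illumination probability, which forms an illumination-aware loss that guides network training	Reduces matching errors caused by illumination changes	Relies on highly accurate illumination-probability estimates
	U2Fusion[129]	Estimates fusion weights from the information richness of features, computed with an image-quality assessment model and information entropy	Adjusts weights dynamically according to the actual information content of the images	Relies on global information entropy and ignores local feature differences
	RFNet[131]	Uses channel attention to adjust the channel contributions of multispectral features and absolute gradients to characterize feature richness	Effectively captures details such as edges and textures	Absolute gradients may be affected by noise in complex backgrounds
	VIF-Net[132]	Measures image information by pixel magnitude and, on that basis, increases how much of the infrared features the fusion model retains	Enhances contrast and detail in target regions of the fused image	High-magnitude pixel regions may include non-target areas
Multispectral feature matching methods	HAMNet[136]	Extracts image features from each sensor separately and aligns the representations of the same target using spectrum-specific features	Captures the unique information of each sensor comprehensively	Large gaps between multispectral feature spaces make alignment difficult
	CMTR[53]	Proposes a Transformer-based unified multispectral representation-learning network that explicitly mines information common to the spectra	A unified architecture strengthens understanding of multi-source data	Training requires large amounts of annotated data
	DSCSN[138]	Embeds multispectral images into a 3D representation space and extracts contrastive features from them	Preserves the spatial structure of targets and strengthens discriminative ability	3D representation and feature contrast are computationally expensive
	MSCLNet[140]	Clusters the representations of one target into several sub-clusters and uses contrastive learning to distinguish the clusters belonging to different targets	Adapts to the diversity of target representations across viewpoints, poses, and illumination	Data noise or imbalanced distributions can make clustering inaccurate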
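
The illumination-weighted fusion idea in the first rows of the comparison above can be sketched in a few lines. This is an illustration under loud assumptions, not the MBNet or PIAFusion method: those networks learn the illumination estimate, whereas here a logistic function of mean visible-light intensity serves as a hypothetical proxy, and `illumination_weight`, `fuse`, `midpoint`, and `steepness` are invented names.

```python
import math
from typing import List


def illumination_weight(
    visible_pixels: List[float], midpoint: float = 0.5, steepness: float = 10.0
) -> float:
    """Map mean visible-light intensity in [0, 1] to a weight in (0, 1).

    Hypothetical proxy: a logistic curve replaces the learned illumination estimator.
    """
    mean_intensity = sum(visible_pixels) / len(visible_pixels)
    return 1.0 / (1.0 + math.exp(-steepness * (mean_intensity - midpoint)))


def fuse(
    visible_feat: List[float], infrared_feat: List[float], w_visible: float
) -> List[float]:
    """Convex combination of two equally sized feature vectors."""
    if len(visible_feat) != len(infrared_feat):
        raise ValueError("feature vectors must have the same length")
    return [
        w_visible * v + (1.0 - w_visible) * t
        for v, t in zip(visible_feat, infrared_feat)
    ]


# Illustrative values only: a bright daytime frame and a dark nighttime frame.
day_w = illumination_weight([0.8, 0.9, 0.7])
night_w = illumination_weight([0.05, 0.1, 0.02])

# Toy one-hot features: index 0 is "visible-only", index 1 is "infrared-only".
fused_night = fuse([1.0, 0.0], [0.0, 1.0], night_w)
print(day_w, night_w, fused_night)
```

In the dark frame the weight on the visible branch falls well below 0.5, so the fused vector is dominated by the infrared feature. The sketch also makes the limitation listed in the table concrete: if the illumination estimate is wrong, the fusion weight is wrong with it.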

References (160)

[1]	ZHOU M L， XING R， HAN D L， et al. PDT： UAV target detection dataset for Pests and Diseases tree［C］∥Computer Vision-ECCV 2024. Cham： Springer， 2025： 56-72.
[2]	KAUFMANN E， BAUERSFELD L， LOQUERCIO A， et al. Champion-level drone racing using deep reinforcement learning［J］. Nature， 2023， 620（7976）： 982-987.
[3]	WU Y Q， TONG K. Research advances on deep learning-based small object detection in UAV aerial images［J］. Acta Aeronautica et Astronautica Sinica， 2025， 46（3）： 030848 （in Chinese）.
[4]	LIN J L， LUO Z M， LIN D Z， et al. A self-adaptive feature extraction method for aerial-view geo-localization［J］. IEEE Transactions on Image Processing， 2025， 34： 126-139.
[5]	WANG H F. Development of high performance collaborative combat UAVs［J］. Acta Aeronautica et Astronautica Sinica， 2024， 45（17）： 530304 （in Chinese）.
[6]	CAO X Y， ZHENG Y Y， YAO Y， et al. TOPIC： A parallel association paradigm for multi-object tracking under complex motions and diverse scenes［J］. IEEE Transactions on Image Processing， 2025， 34： 743-758.
[7]	DING J G， LI W， YANG M， et al. SeaTrack： Rethinking observation-centric SORT for robust nearshore multiple object tracking［J］. Pattern Recognition， 2025， 159： 111091.
[8]	LI Z P， ZHANG D X， WU S， et al. Sampling-resilient multi-object tracking［J］. Proceedings of the AAAI Conference on Artificial Intelligence， 2024， 38（4）： 3297-3305.
[9]	FENG M Z， SU J B. RGBT tracking： A comprehensive review［J］. Information Fusion， 2024， 110： 102492.
[10]	HE Y， LIU Y， LI Y W， et al. Development and prospects of multisource information fusion［J］. Acta Aeronautica et Astronautica Sinica， 2025， 46（6）： 531672 （in Chinese）.
[11]	WU Z W， ZHENG J L， REN X X， et al. Single-model and any-modality for video object tracking［C］∥2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition （CVPR）. Piscataway： IEEE Press， 2024： 19156-19166.
[12]	WANG C Y， SU Y， WANG L L， et al. Multi-object continuous robust tracking algorithm for anti-UAV swarm［J］. Acta Aeronautica et Astronautica Sinica， 2024， 45（7）： 329017 （in Chinese）.
[13]	LUO W H， XING J L， MILAN A， et al. Multiple object tracking： A literature review［J］. Artificial Intelligence， 2021， 293： 103448.
[14]	PAL S K， PRAMANIK A， MAITI J， et al. Deep learning in multi-object detection and tracking： State of the art［J］. Applied Intelligence， 2021， 51（9）： 6400-6429.
[15]	NGUYEN P， QUACH K G， DUONG C N， et al. Multi-camera multi-object tracking on the move via single stage global association approach［J］. Pattern Recognition， 2024， 152： 110457.
[16]	LUO R， SONG Z K， MA L T， et al. DiffusionTrack： Diffusion model for multi-object tracking［J］. Proceedings of the AAAI Conference on Artificial Intelligence， 2024， 38（5）： 3991-3999.
[17]	ZHANG Y J， LIANG Y Q， LENG J X， et al. SCGTracker： Spatio-temporal correlation and graph neural networks for multiple object tracking［J］. Pattern Recognition， 2024， 149： 110249.
[18]	YUAN X Y， XU T F， LIU X C， et al. Multi-step temporal modeling for UAV tracking［J］. IEEE Transactions on Circuits and Systems for Video Technology， 2024， 34（8）： 7216-7230.
[19]	XUE Y L， JIN G D， TAN L N， et al. Adaptive UAV target tracking algorithm based on multi-scale fusion［J］. Acta Aeronautica et Astronautica Sinica， 2023， 44（1）： 326107 （in Chinese）.
[20]	YANG Y G， JIANG W T， GAO Z Y. Real-time target detection algorithm for low altitude UAVs［J］. Acta Aeronautica et Astronautica Sinica， 2025， 46（16）： 331619 （in Chinese）.
[21]	YIN N Z， LIU C X， TIAN R H， et al. SDPDet： Learning scale-separated dynamic proposals for end-to-end drone-view detection［J］. IEEE Transactions on Multimedia， 2024， 26： 7812-7822.
[22]	HUANG B， LI J N， CHEN J J， et al. Anti-UAV410： A thermal infrared benchmark and customized scheme for tracking drones in the wild［J］. IEEE Transactions on Pattern Analysis and Machine Intelligence， 2024， 46（5）： 2852-2865.
[23]	YE N Y， ZENG Z Y， ZHOU J D， et al. OoD-control： Generalizing control in unseen environments［J］. IEEE Transactions on Pattern Analysis and Machine Intelligence， 2024， 46（11）： 7421-7433.
[24]	DAI M， ZHENG E H， FENG Z H， et al. Vision-based UAV self-positioning in low-altitude urban environments［J］. IEEE Transactions on Image Processing， 2023， 33： 493-508.
[25]	JIMÉNEZ-BRAVO D M， LOZANO MURCIEGO Á， SALES MENDES A， et al. Multi-object tracking in traffic environments： A systematic literature review［J］. Neurocomputing， 2022， 494： 43-55.
[26]	TANG G Y， NI J J， ZHAO Y H， et al. A survey of object detection for UAVs based on deep learning［J］. Remote Sensing， 2024， 16（1）： 149.
[27]	YUAN Y B， WU Y Q， ZHAO L Y， et al. Research progress of UAV aerial video multi-object detection and tracking based on deep learning［J］. Acta Aeronautica et Astronautica Sinica， 2023， 44（18）： 028334 （in Chinese）.
[28]	FU C H， LU K H， ZHENG G Z， et al. Siamese object tracking for unmanned aerial vehicle： A review and comprehensive analysis［J］. Artificial Intelligence Review， 2023， 56（1）： 1417-1477.
[29]	SUN N Y， ZHAO J， SHI Q， et al. Moving target tracking by unmanned aerial vehicle： A survey and taxonomy［J］. IEEE Transactions on Industrial Informatics， 2024， 20（5）： 7056-7068.
[30]	WANG J K， WU Z X， CHEN D D， et al. OmniTracker： Unifying visual object tracking by tracking-with-detection［J］. IEEE Transactions on Pattern Analysis and Machine Intelligence， 2025， 47（4）： 3159-3174.
[31]	FUNG A， BENHABIB B， NEJAT G. LDTrack： Dynamic people tracking by service robots using diffusion models［J］. International Journal of Computer Vision， 2025， 133（6）： 3392-3412.
[32]	GAO Y， XU H J， LI J， et al. BPMTrack： Multi-object tracking with detection box application pattern mining［J］. IEEE Transactions on Image Processing， 2024， 33： 1508-1521.
[33]	ZHAO X， HU S Y， WANG Y P， et al. BioDrone： A bionic drone-based single object tracking benchmark for robust vision［J］. International Journal of Computer Vision， 2024， 132（5）： 1659-1684.
[34]	WANG Y， HUANG Z R， LAGANIÈRE R， et al. A UAV to UAV tracking benchmark［J］. Knowledge-Based Systems， 2023， 261： 110197.
[35]	TRAN T M， BUI D C， NGUYEN T V， et al. Transformer based spatio-temporal unsupervised traffic anomaly detection in aerial videos［J］. IEEE Transactions on Circuits and Systems for Video Technology， 2024， 34（9）： 8292-8309.
[36]	WANG J， LI X Q， ZHOU L H， et al. Adaptive receptive field enhancement network based on attention mechanism for detecting the small target in the aerial image［J］. IEEE Transactions on Geoscience and Remote Sensing， 2023， 62： 5600118.
[37]	KOUZEGHAR M， SONG Y， MEGHJANI M， et al. Multi-target pursuit by a decentralized heterogeneous UAV swarm using deep multi-agent reinforcement learning［C］∥2023 IEEE International Conference on Robotics and Automation （ICRA）. Piscataway： IEEE Press， 2023： 3289-3295.
[38]	KHAN M U， DIL M， ALAM M Z， et al. SafeSpace MFNet： Precise and efficient multifeature drone detection network［J］. IEEE Transactions on Vehicular Technology， 2024， 73（3）： 3106-3118.
[39]	BEWLEY A， GE Z Y， OTT L， et al. Simple online and realtime tracking［C］∥2016 IEEE International Conference on Image Processing （ICIP）. Piscataway： IEEE Press， 2016： 3464-3468.
[40]	WANG P， WANG Y C， LI D Y. DroneMOT： Drone-based multi-object tracking considering detection difficulties and simultaneous moving of drones and objects［C］∥ 2024 IEEE International Conference on Robotics and Automation （ICRA）. Piscataway： IEEE Press， 2024： 7397-7404.
[41]	WU H， NIE J， HE Z， et al. One-shot multiple object tracking in UAV video using task-specific fine-grained features［J］. Remote Sensing， 2022， 14（16）： 3853.
[42]	LV W Y， ZHANG N， ZHANG J J， et al. One-shot multiple object tracking with robust ID preservation［J］. IEEE Transactions on Circuits and Systems for Video Technology， 2024， 34（6）： 4473-4488.
[43]	CHU Q， OUYANG W L， LI H S， et al. Online multi-object tracking using CNN-based single object tracker with spatial-temporal attention mechanism［C］∥2017 IEEE International Conference on Computer Vision （ICCV）. Piscataway： IEEE Press， 2017： 4846-4855.
[44]	DANG Z Y， SUN X Y， SUN B， et al. OMCTrack： Integrating occlusion perception and motion compensation for UAV multi-object tracking［J］. Drones， 2024， 8（9）： 480.
[45]	DENG C W， WU J P， HAN Y Q， et al. Learning a robust topological relationship for online multiobject tracking in UAV scenarios［J］. IEEE Transactions on Geoscience and Remote Sensing， 2024， 62： 5628615.
[46]	ZENG F G， DONG B， ZHANG Y A， et al. MOTR： End-to-end multiple-object tracking with transformer［C］∥Computer Vision-ECCV 2022. Cham： Springer， 2022： 659-675.
[47]	ZHU P F， ZHENG J Y， DU D W， et al. Multi-drone-based single object tracking with agent sharing network［J］. IEEE Transactions on Circuits and Systems for Video Technology， 2021， 31（10）： 4058-4070.
[48]	XUE Y L， JIN G D， SHEN T， et al. Consistent representation mining for multi-drone single object tracking［J］. IEEE Transactions on Circuits and Systems for Video Technology， 2024， 34（11）： 10845-10859.
[49]	WU H， SUN H， JI K F， et al. Temporal-spatial feature interaction network for multi-drone multi-object tracking［J］. IEEE Transactions on Circuits and Systems for Video Technology， 2025， 35（2）： 1165-1179.
[50]	LIU Z H， SHANG Y Y， LI T M， et al. Robust multi-drone multi-target tracking to resolve target occlusion： A benchmark［J］. IEEE Transactions on Multimedia， 2023， 25： 1462-1476.
[51]	GUAN D Y， CAO Y P， YANG J X， et al. Fusion of multispectral data through illumination-aware deep neural networks for pedestrian detection［J］. Information Fusion， 2019， 50： 148-157.
[52]	TANG L F， YUAN J T， ZHANG H， et al. PIAFusion： A progressive infrared and visible image fusion network based on illumination aware［J］. Information Fusion， 2022， 83-84： 79-92.
[53]	LIANG T F， JIN Y， LIU W， et al. Cross-modality transformer with modality mining for visible-infrared person re-identification［J］. IEEE Transactions on Multimedia， 2023， 25： 8432-8444.
[54]	ZHU Y B， WANG Q W， LI C L， et al. Visible-thermal multiple object tracking： Large-scale video dataset and progressive fusion approach［J］. Pattern Recognition， 2025， 161： 111330.
[55]	ZHU P F， WEN L Y， DU D W， et al. Detection and tracking meet drones challenge［J］. IEEE Transactions on Pattern Analysis and Machine Intelligence， 2022， 44（11）： 7380-7399.
[56]	DU D W， QI Y K， YU H Y， et al. The unmanned aerial vehicle benchmark： Object detection and tracking［C］∥Computer Vision-ECCV 2018. Cham： Springer， 2018： 375-391.
[57]	MANDAL M， KUMAR L K， VIPPARTHI S K. MOR-UAV： A benchmark dataset and baselines for moving object recognition in UAV videos［C］∥Proceedings of the 28th ACM International Conference on Multimedia. New York： ACM， 2020： 2626-2635.
[58]	YE H， SUNDERRAMAN R， JI S H. UAV3D： A large-scale 3D perception benchmark for unmanned aerial vehicles［C］∥Advances in Neural Information Processing Systems 37 （NeurIPS 2024）， 2024： 55425-55442.
[59]	YING X Y， XIAO C， AN W， et al. Visible-thermal tiny object detection： A benchmark dataset and baselines［J］. IEEE Transactions on Pattern Analysis and Machine Intelligence， 2025， 47（7）： 6088-6096.
[60]	YANG M Z， HAN G X， YAN B， et al. Hybrid-SORT： Weak cues matter for online multi-object tracking［J］. Proceedings of the AAAI Conference on Artificial Intelligence， 2024， 38（7）： 6504-6512.
[61]	CAO J K， PANG J M， WENG X S， et al. Observation-centric SORT： Rethinking SORT for robust multi-object tracking［C］∥2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition （CVPR）. Piscataway： IEEE Press， 2023： 9686-9696.
[62]	LI J， YE D H， CHUNG T， et al. Multi-target detection and tracking from a single camera in Unmanned Aerial Vehicles （UAVs）［C］∥2016 IEEE/RSJ International Conference on Intelligent Robots and Systems （IROS）. Piscataway： IEEE Press， 2016： 4992-4997.
[63]	PAN S Y， TONG Z H， ZHAO Y Y， et al. Multi-object tracking hierarchically in visual data taken from drones［C］∥2019 IEEE/CVF International Conference on Computer Vision Workshop （ICCVW）. Piscataway： IEEE Press， 2019： 135-143.
[64]	DUAN K W， BAI S， XIE L X， et al. CenterNet： Keypoint triplets for object detection［C］∥2019 IEEE/CVF International Conference on Computer Vision （ICCV）. Piscataway： IEEE Press， 2019： 6568-6577.
[65]	SHI L K， ZHANG Q R， PAN B， et al. Global-local and occlusion awareness network for object tracking in UAVs［J］. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing， 2023， 16： 8834-8844.
[66]	BARBARY M， ELAZEEM M H A. Drones tracking based on robust cubature Kalman-TBD-Multi-Bernoulli filter［J］. ISA Transactions， 2021， 114： 277-290.
[67]	LIU S， LI X， LU H C， et al. Multi-object tracking meets moving UAV［C］∥2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition （CVPR）. Piscataway： IEEE Press， 2022： 8866-8875.
[68]	CHENG S， YAO M B， XIAO X M. DC-MOT： Motion deblurring and compensation for multi-object tracking in UAV videos［C］∥2023 IEEE International Conference on Robotics and Automation （ICRA）. Piscataway： IEEE Press， 2023： 789-795.
[69]	QIU B Y， GUO Y F， XUE A K， et al. Improved Gaussian processes linear JPDA filter for multiple extended targets tracking in dense clutter［J］. Digital Signal Processing， 2024， 153： 104600.
[70]	XU S Y， SAVVARIS A， HE S M， et al. Real-time implementation of YOLO+JPDA for small scale UAV multiple object tracking［C］∥2018 International Conference on Unmanned Aircraft Systems （ICUAS）. Piscataway： IEEE Press， 2018： 1336-1341.
[71]	REDMON J， FARHADI A. YOLO9000： Better， faster， stronger［C］∥2017 IEEE Conference on Computer Vision and Pattern Recognition （CVPR）. Piscataway： IEEE Press， 2017： 6517-6525.
[72]	MEMON S A， ULLAH I. Detection and tracking of the trajectories of dynamic UAVs in restricted and cluttered environment［J］. Expert Systems with Applications， 2021， 183： 115309.
[73]	WANG D J， LIAN B W， LIU Y Y， et al. A cooperative UAV swarm localization algorithm based on probabilistic data association for visual measurement［J］. IEEE Sensors Journal， 2022， 22（20）： 19635-19644.
[74]	CHAI J D， HE S M， SHIN H S， et al. Domain-knowledge-aided airborne ground moving targets tracking［J］. Aerospace Science and Technology， 2024， 144： 108807.
[75]	MILAN A， REZATOFIGHI S H， DICK A， et al. Online multi-target tracking using recurrent neural networks［C］∥Proceedings of the AAAI Conference on Artificial Intelligence. Reston： AIAA， 2017： 4225-4232.
[76]	SADEGHIAN A， ALAHI A， SAVARESE S. Tracking the untrackable： Learning to track multiple cues with long-term dependencies［C］∥2017 IEEE International Conference on Computer Vision （ICCV）. Piscataway： IEEE Press， 2017： 300-311.
[77]	XIAO F Y， LEE Y J. Video object detection with an aligned spatial-temporal memory［C］∥Computer Vision-ECCV 2018. Cham： Springer， 2018： 494-510.
[78]	YAO M F， WANG J Q， PENG J L， et al. FOLT： Fast multiple object tracking from UAV-captured videos based on optical flow［C］∥Proceedings of the 31st ACM International Conference on Multimedia. New York： ACM， 2023： 3375-3383.
[79]	YU H Y， LI G R， SU L， et al. Conditional GAN based individual and global motion fusion for multiple object tracking in UAV videos［J］. Pattern Recognition Letters， 2020， 131： 219-226.
[80]	LIU Z， LIN Y T， CAO Y， et al. Swin transformer： Hierarchical vision transformer using shifted windows［C］∥2021 IEEE/CVF International Conference on Computer Vision （ICCV）. Piscataway： IEEE Press， 2021： 9992-10002.
[81]	YAO T， LI Y H， PAN Y W， et al. HIRI-ViT： Scaling vision transformer with high resolution inputs［J］. IEEE Transactions on Pattern Analysis and Machine Intelligence， 2024， 46（9）： 6431-6442.
[82]	HU M J， ZHU X T， WANG H T， et al. STDFormer： Spatial-temporal motion transformer for multiple object tracking［J］. IEEE Transactions on Circuits and Systems for Video Technology， 2023， 33（11）： 6571-6594.
[83]	SONG I， LEE J. SFTrack： A robust scale and motion adaptive algorithm for tracking small and fast moving objects［C］∥2024 IEEE/RSJ International Conference on Intelligent Robots and Systems （IROS）. Piscataway： IEEE Press， 2024： 10870-10877.
[84]	KAPANIA S， SAINI D， GOYAL S， et al. Multi object tracking with UAVs using deep SORT and YOLOv3 RetinaNet detection framework［C］∥Proceedings of the 1st ACM Workshop on Autonomous and Intelligent Mobile Systems. New York： ACM， 2020： 1-6.
[85]	WOJKE N， BEWLEY A， PAULUS D. Simple online and realtime tracking with a deep association metric［C］∥2017 IEEE International Conference on Image Processing （ICIP）. Piscataway： IEEE Press， 2017： 3645-3649.
[86]	ZHANG Y F， SUN P Z， JIANG Y， et al. ByteTrack： Multi-object tracking by associating every detection box［C］∥Computer Vision-ECCV 2022. Cham： Springer， 2022： 1-21.
[87]	ZHANG W， LI J M， XIA M， et al. OffsetNet： Towards efficient multiple object tracking， detection， and segmentation［J］. IEEE Transactions on Pattern Analysis and Machine Intelligence， 2025， 47（2）： 949-960.
[88]	MA J B， LIU D X， QIN S L， et al. An asymmetric feature enhancement network for multiple object tracking of unmanned aerial vehicle［J］. Remote Sensing， 2024， 16（1）： 70.
[89]	BERGMANN P， MEINHARDT T， LEAL-TAIXE L. Tracking without bells and whistles［C］∥2019 IEEE/CVF International Conference on Computer Vision （ICCV）. Piscataway： IEEE Press， 2019： 941-951.
[90]	WU H， HE Z W， GAO M Y. GCEVT： Learning global context embedding for vehicle tracking in unmanned aerial vehicle videos［J］. IEEE Geoscience and Remote Sensing Letters， 2022， 20： 6000705.
[91]	XU L B， HUANG Y P. Rethinking joint detection and embedding for multiobject tracking in multiscenario［J］. IEEE Transactions on Industrial Informatics， 2024， 20（6）： 8079-8088.
[92]	LI W Q， MU J T， LIU G Z. Multiple object tracking with motion and appearance cues［C］∥2019 IEEE/CVF International Conference on Computer Vision Workshop （ICCVW）. Piscataway： IEEE Press， 2019： 161-169.
[93]	ZHANG Y F， WANG C Y， WANG X G， et al. FairMOT： On the fairness of detection and re-identification in multiple object tracking［J］. International Journal of Computer Vision， 2021， 129（11）： 3069-3087.
[94]	SHEN Z Q， CAI K Q， ZHAO P， et al. An interactively motion-assisted network for multiple object tracking in complex traffic scenes［J］. IEEE Transactions on Intelligent Transportation Systems， 2024， 25（2）： 1992-2004.
[95]	KIM C， LI F X， REHG J M. Multi-object tracking with neural gating using bilinear LSTM［C］∥Computer Vision-ECCV 2018. Cham： Springer， 2018： 208-224.
[96]	YU Q J， MA Y C， HE J F， et al. A unified transformer-based tracker for anti-UAV tracking［C］∥2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops （CVPRW）. Piscataway： IEEE Press， 2023： 3036-3046.
[97]	NIE J H， WU H， HE Z W， et al. Spreading fine-grained prior knowledge for accurate tracking［J］. IEEE Transactions on Circuits and Systems for Video Technology， 2022， 32（9）： 6186-6199.
[98]	ZHENG L Y， TANG M， CHEN Y Y， et al. Improving multiple object tracking with single object tracking［C］∥2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition （CVPR）. Piscataway： IEEE Press， 2021： 2453-2462.
[99]	FENG W T， LI B P， OUYANG W L. Multi-object tracking with multiple cues and switcher-aware classification［C］∥2022 International Conference on Digital Image Computing： Techniques and Applications （DICTA）. Piscataway： IEEE Press， 2022： 1-10.
[100]	LI B， WU W， WANG Q， et al. SiamRPN++： Evolution of Siamese visual tracking with very deep networks［C］∥2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition （CVPR）. Piscataway： IEEE Press， 2019： 4277-4286.
[101]	ZHU J， YANG H， LIU N， et al. Online multi-object tracking with dual matching attention networks［C］∥Computer Vision-ECCV 2018. Cham： Springer， 2018： 379-396.
[102]	DANELLJAN M， BHAT G， KHAN F S， et al. ECO： Efficient convolution operators for tracking［C］∥2017 IEEE Conference on Computer Vision and Pattern Recognition （CVPR）. Piscataway： IEEE Press， 2017： 6931-6939.
[103]	GHOSH S， PATRIKAR J， MOON B， et al. AirTrack： Onboard deep learning framework for long-range aircraft detection and tracking［C］∥2023 IEEE International Conference on Robotics and Automation （ICRA）. Piscataway： IEEE Press， 2023： 1277-1283.
[104]	SHUAI B， BERNESHAWI A， LI X Y， et al. SiamMOT： Siamese multi-object tracking［C］∥2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition （CVPR）. Piscataway： IEEE Press， 2021： 12367-12377.
[105]	YANG L， WANG H Q， SUN H J， et al. MOFTrack： Multi object formation tracking in remote sensing videos［C］∥Pattern Recognition and Computer Vision. Singapore： Springer， 2025： 551-565.
[106]	REN S Q， HE K M， GIRSHICK R， et al. Faster R-CNN： Towards real-time object detection with region proposal networks［J］. IEEE Transactions on Pattern Analysis and Machine Intelligence， 2017， 39（6）： 1137-1149.
[107]	WANG Y X， KITANI K， WENG X S. Joint object detection and multi-object tracking with graph neural networks［C］∥2021 IEEE International Conference on Robotics and Automation （ICRA）. New York： ACM， 2021： 13708-13715.
[108]	HE X J， JIN J， CHEN D， et al. RoMATer： An end-to-end robust multiaircraft tracker with transformer［C］∥ 2024 International Joint Conference on Neural Networks （IJCNN）. Piscataway： IEEE Press， 2024： 1-8.
[109]	XU Y H， BAN Y T， DELORME G， et al. TransCenter： Transformers with dense representations for multiple object tracking［J］. IEEE Transactions on Pattern Analysis and Machine Intelligence， 2023， 45（6）： 7820-7835.
[110]	MEINHARDT T， KIRILLOV A， LEAL-TAIXÉ L， et al. TrackFormer： Multi-object tracking with transformers［C］∥2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition （CVPR）. Piscataway： IEEE Press， 2022： 8834-8844.
[111]	LEI D， XU M， WANG S A. A deep multimodal network for multi-task trajectory prediction［J］. Information Fusion， 2025， 113： 102597.
[112]	WANG Z C， CHENG P R， CHEN M X， et al. Drones help drones： A collaborative framework for multi-drone object trajectory prediction and beyond［C］∥Advances in Neural Information Processing Systems 37 （NeurIPS 2024）， 2024： 64604-64628.
[113]	CHEN G L， ZHU P F， CAO B， et al. Cross-drone transformer network for robust single object tracking［J］. IEEE Transactions on Circuits and Systems for Video Technology， 2023， 33（9）： 4552-4563.
[114]	FU Z H， FU Z H， LIU Q J， et al. SparseTT： Visual tracking with sparse transformers［DB/OL］. arXiv preprint：2205.03776， 2022.
[115]	WU H， SUN H， JI K F， et al. Temporal-guided cross-view feature fusion network for multi-drone multi-object tracking［J］. Acta Electronica Sinica， 2025， 53（3）： 728-743 （in Chinese）.
[116]	JAVED S， HASSAN A， AHMAD R， et al. State-of-the-art and future research challenges in UAV swarms［J］. IEEE Internet of Things Journal， 2024， 11（11）： 19023-19045.
[117]	SUN J M， SHEN Z H， WANG Y A， et al. LoFTR： Detector-free local feature matching with transformers［C］∥2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition （CVPR）. Piscataway： IEEE Press， 2021： 8918-8927.
[118]	LINDENBERGER P， SARLIN P E， POLLEFEYS M. LightGlue： Local feature matching at light speed［C］∥ 2023 IEEE/CVF International Conference on Computer Vision （ICCV）. Piscataway： IEEE Press， 2023： 17581-17592.
[119]	AMOSA T I， SEBASTIAN P， IZHAR L I， et al. Multi-camera multi-object tracking： A review of current trends and future advances［J］. Neurocomputing， 2023， 552： 126558.
[120]	QIAN Y J， YU L J， LIU W H， et al. ELECTRICITY： An efficient multi-camera vehicle tracking system for intelligent city［C］∥2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops （CVPRW）. Piscataway： IEEE Press， 2020： 2511-2519.
[121]	HE K M， ZHANG X Y， REN S Q， et al. Deep residual learning for image recognition［C］∥2016 IEEE Conference on Computer Vision and Pattern Recognition （CVPR）. Piscataway： IEEE Press， 2016： 770-778.
[122]	ZUO G B， ZHOU K， WANG Q. UAV-to-UAV small target detection method based on deep learning in complex scenes［J］. IEEE Sensors Journal， 2025， 25（2）： 3806-3820.
[123]	GUO Y D， LIU Z Y， LUO H， et al. Multi-person multi-camera tracking for live stream videos based on improved motion model and matching cascade［J］. Neurocomputing， 2022， 492： 561-571.
[124]	ZHOU H Q， FANG D X， ZHANG N B， et al. Multi-UAV multi-object tracking based on deep learning［J］. Computer Engineering， 2025， 51（4）： 57-65 （in Chinese）.
[125]	BELLAVIA F. SIFT matching by context exposed［J］. IEEE Transactions on Pattern Analysis and Machine Intelligence， 2023， 45（2）： 2445-2457.
[126]	POURFARD M， HOSSEINIAN T， SAEIDI R， et al. KAZE-SAR： SAR image registration using KAZE detector and modified SURF descriptor for tackling speckle noise［J］. IEEE Transactions on Geoscience and Remote Sensing， 2021， 60： 5207612.
[127]	QIN Z， ZHOU S P， WANG L， et al. MotionTrack： Learning robust short-term and long-term motions for multi-object tracking［C］∥2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition （CVPR）. Piscataway： IEEE Press， 2023： 17939-17948.
[128]	ZHOU K L， CHEN L S， CAO X. Improving multispectral pedestrian detection by addressing modality imbalance problems［C］∥Computer Vision-ECCV 2020. Cham： Springer， 2020： 787-803.
[129]	XU H， MA J Y， JIANG J J， et al. U2Fusion： A unified unsupervised image fusion network［J］. IEEE Transactions on Pattern Analysis and Machine Intelligence， 2022， 44（1）： 502-518.
[130]	ZHANG F， CONG W， TIAN R C， et al. Heterogeneous data fusion and reliability analysis based on two-layer variable weights［J］. Acta Aeronautica et Astronautica Sinica， 2024， 45（22）： 230297 （in Chinese）.
[131]	XU H， MA J Y， YUAN J T， et al. RFNet： Unsupervised network for mutually reinforcing multi-modal image registration and fusion［C］∥2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition （CVPR）. Piscataway： IEEE Press， 2022： 19647-19656.
[132]	HOU R C， ZHOU D M， NIE R C， et al. VIF-net： An unsupervised framework for infrared and visible image fusion［J］. IEEE Transactions on Computational Imaging， 2020， 6： 640-651.
[133]	SUN Y M， CAO B， ZHU P F， et al. Drone-based RGB-infrared cross-modality vehicle detection via uncertainty-aware learning［J］. IEEE Transactions on Circuits and Systems for Video Technology， 2022， 32（10）： 6700-6713.
[134]	DU X X， ZARE A. Multiresolution multimodal sensor fusion for remote sensing data with label uncertainty［J］. IEEE Transactions on Geoscience and Remote Sensing， 2020， 58（4）： 2755-2769.
[135]	YE M， SHEN J B， LIN G J， et al. Deep learning for person re-identification： A survey and outlook［J］. IEEE Transactions on Pattern Analysis and Machine Intelligence， 2022， 44（6）： 2872-2893.
[136]	LI H C， LI C L， ZHU X P， et al. Multi-spectral vehicle re-identification： A challenge［J］. Proceedings of the AAAI Conference on Artificial Intelligence， 2020， 34（7）： 11345-11353.
[137]	CHEN S G， XU L Z， LI X Y， et al. Frequency-space enhanced and temporal adaptative RGBT object tracking［J］. Neurocomputing， 2025， 640： 130240.
[138]	ZHANG S Z， YANG Y F， WANG P， et al. Attend to the difference： Cross-modality person re-identification via contrastive correlation［J］. IEEE Transactions on Image Processing， 2021， 30： 8861-8872.
[139]	YE M， WANG Z， LAN X Y， et al. Visible thermal person re-identification via dual-constrained top-ranking［C］∥Proceedings of the 27th International Joint Conference on Artificial Intelligence. New York： ACM， 2018： 1092-1099.
[140]	ZHANG Y Y， ZHAO S Y， KANG Y H， et al. Modality synergy complement learning with cascaded aggregation for visible-infrared person re-identification［C］∥Computer Vision-ECCV 2022. Cham： Springer， 2022： 462-479.
[141]	ZHU P F， PENG T， DU D W， et al. Graph regularized flow attention network for video animal counting from drones［J］. IEEE Transactions on Image Processing， 2021， 30： 5339-5351.
[142]	VARGA L A， KIEFER B， MESSMER M， et al. SeaDronesSee： A maritime benchmark for detecting humans in open water［C］∥2022 IEEE/CVF Winter Conference on Applications of Computer Vision （WACV）. Piscataway： IEEE Press， 2022： 3686-3696.
[143]	DOSOVITSKIY A， ROS G， CODEVILLA F， et al. CARLA： An open urban driving simulator［C］∥Proceedings of the 1st Annual Conference on Robot Learning. New York： PMLR Press， 2017： 1-16.
[144]	XU Q Y， WANG L G， SHENG W D， et al. Heterogeneous graph transformer for multiple tiny object tracking in RGB-T videos［J］. IEEE Transactions on Multimedia， 2024， 26： 9383-9397.
[145]	DENDORFER P， OŠEP A， MILAN A， et al. MOTChallenge： A benchmark for single-camera multiple target tracking［J］. International Journal of Computer Vision， 2021， 129（4）： 845-881.
[146]	LUITEN J， OŠEP A， DENDORFER P， et al. HOTA： A higher order metric for evaluating multi-object tracking［J］. International Journal of Computer Vision， 2021， 129（2）： 548-578.
[147]	ZHU B， WANG J， JIANG Z， et al. AutoAssign： Differentiable label assignment for dense object detection［DB/OL］. arXiv preprint： 2007.03496， 2020.
[148]	GE Z， LIU S T， WANG F， et al. YOLOX： Exceeding YOLO series in 2021［DB/OL］. arXiv preprint： 2107.08430， 2021.
[149]	FENG C J， ZHONG Y J， GAO Y， et al. TOOD： Task-aligned one-stage object detection［C］∥2021 IEEE/CVF International Conference on Computer Vision （ICCV）. Piscataway： IEEE Press， 2021： 3490-3499.
[150]	WANG J Q， CHEN K， XU R， et al. CARAFE： Content-aware reassembly of features［C］∥2019 IEEE/CVF International Conference on Computer Vision （ICCV）. Piscataway： IEEE Press， 2019： 3007-3016.
[151]	PANG J M， QIU L L， LI X， et al. Quasi-dense similarity learning for multiple object tracking［C］∥2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition （CVPR）. Piscataway： IEEE Press， 2021： 164-173.
[152]	VU T， JANG H， PHAM T X， et al. Cascade RPN： Delving into high-quality region proposal network with adaptive convolution［DB/OL］. arXiv preprint： 1909.06720， 2019.
[153]	HE L X， LIAO X Y， LIU W， et al. FastReID： A pytorch toolbox for general instance re-identification［C］∥Proceedings of the 31st ACM International Conference on Multimedia. New York： ACM， 2023： 9664-9667.
[154]	CHEN Y T， SHI J H， YE Z L， et al. Multimodal object detection via probabilistic ensembling［C］∥Computer Vision-ECCV 2022. Cham： Springer， 2022： 139-158.
[155]	ZHOU X Y， KOLTUN V， KRÄHENBÜHL P. Tracking objects as points［C］∥Computer Vision-ECCV 2020. Cham： Springer International Publishing， 2020： 474-490.
[156]	WU J L， CAO J L， SONG L C， et al. Track to detect and segment： An online multi-object tracker［C］∥2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition （CVPR）. Piscataway： IEEE Press， 2021： 12347-12356.
[157]	SUN Y M， CAO B， ZHU P F， et al. Drone-based RGB-infrared cross-modality vehicle detection via uncertainty-aware learning［J］. IEEE Transactions on Circuits and Systems for Video Technology， 2022， 32（10）： 6700-6713.
[158]	LIU Y F， SHE J Y， YUAN Q F， et al. Real-time small target detection networks for UAV remote sensing［J］. Acta Aeronautica et Astronautica Sinica， 2024， 45（14）： 630119 （in Chinese）.
[159]	YU Z W， SUN Z， CHENG Y， et al. A review of intelligent UAV swarm collaborative perception and computation［J］. Acta Aeronautica et Astronautica Sinica， 2024， 45（20）： 630912 （in Chinese）.
[160]	LI S Y， CHEN S L， LI X X， et al. Accurate and automatic spatiotemporal calibration for multi-modal sensor system based on continuous-time optimization［J］. Information Fusion， 2025， 120： 103071.

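Supervised by: China Association for Science and Technology. Sponsored by: Chinese Society of Aeronautics and Astronautics; Beihang University.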

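Review	Camera type	Research content	Feature association type	Years of cited literature	Number of references
Jiménez et al.^［25］	Traffic surveillance camera	Multi-object tracking in traffic environments	Spatiotemporal feature association	2021 and earlier	71
Tang et al.^［26］	UAV camera	Deep learning-based object detection and tracking	Spatiotemporal feature association	2023 and earlier	116
Yuan et al.^［27］	UAV camera	Deep learning-based object detection and tracking	Spatiotemporal feature association	2023 and earlier	126
Fu et al.^［28］	UAV camera	Siamese network-based object tracking	Spatiotemporal feature association	2023 and earlier	157
Sun et al.^［29］	UAV camera	UAV moving-target tracking	Spatiotemporal feature association	2022 and earlier	88
This paper	UAV camera	Multi-object feature association in UAV videos	Spatiotemporal, multi-view, and multi-spectral feature association	2025 and earlier	161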

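Method category	Advantages	Limitations	Applicable scenarios
Trajectory prediction-based methods	High computational efficiency; robust to short-term occlusion	Struggle in dense-target scenes; sensitive to target deformation	Sparsely distributed targets; targets with fixed shapes
Appearance-based methods	Robust to target interactions; strong ability to recover lost targets	High computational cost; sensitive to occlusion and environmental changes	Low-frame-rate videos; targets frequently entering and leaving the field of view
Single-object-tracking-assisted association methods	Fine-grained association; robust to small targets and cluttered backgrounds	High computational cost; difficulty handling changes in the number of targets	Few targets; blurred targets; small targets
Joint detection and association methods	End-to-end optimization; efficient use of contextual information	High model complexity; large data requirements	High-accuracy association on large-scale data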
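
To make the first row of the table above concrete, the following is a minimal sketch of trajectory-prediction-based association: each track is propagated one frame ahead and matched to the detections by box overlap. It is an illustration under stated assumptions, not the procedure of any specific tracker surveyed here. The constant-velocity motion model, the greedy matching step (trackers in this family typically use a Kalman filter and the Hungarian algorithm), and all names (`predict_box`, `associate`, `iou_threshold`) are hypothetical choices made for brevity.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Velocity = Tuple[float, float]           # (vx, vy) in pixels per frame


def predict_box(box: Box, velocity: Velocity) -> Box:
    """Shift a box by one frame under an assumed constant velocity."""
    x1, y1, x2, y2 = box
    vx, vy = velocity
    return (x1 + vx, y1 + vy, x2 + vx, y2 + vy)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def associate(tracks: Dict[int, Tuple[Box, Velocity]],
              detections: List[Box],
              iou_threshold: float = 0.3) -> Dict[int, int]:
    """Greedily match predicted track boxes to detections by descending IoU.

    Returns {track_id: detection_index}. Greedy matching is a simplification
    of the optimal assignment that a Hungarian solver would produce.
    """
    candidates = []
    for track_id, (box, velocity) in tracks.items():
        predicted = predict_box(box, velocity)
        for det_idx, det in enumerate(detections):
            score = iou(predicted, det)
            if score >= iou_threshold:
                candidates.append((score, track_id, det_idx))
    candidates.sort(reverse=True)

    matches: Dict[int, int] = {}
    used_detections = set()
    for _, track_id, det_idx in candidates:
        if track_id in matches or det_idx in used_detections:
            continue
        matches[track_id] = det_idx
        used_detections.add(det_idx)
    return matches


# Toy example: two tracks with known velocities and two detections.
tracks = {
    1: ((0.0, 0.0, 10.0, 10.0), (5.0, 0.0)),
    2: ((50.0, 50.0, 60.0, 60.0), (0.0, -5.0)),
}
detections = [(50.0, 44.0, 60.0, 54.0), (5.0, 0.0, 15.0, 10.0)]
matches = associate(tracks, detections)
```

Because the cost depends only on predicted geometry, the step is cheap, which is consistent with the efficiency advantage listed in the table. When many targets overlap, several candidates exceed the threshold at once, which is one way to see why dense scenes are listed as a limitation.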

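Method category	Network	Improvement	Advantages	Limitations
Multi-view feature fusion methods	ASNet^［47］	Computes the target appearance transformation and the background feature transformation between UAVs to fuse features	Effectively combines target details and scene context to enhance features	Fused features are unreliable when target appearance differs too much
	DHD^［112］	Projects multi-view features into BEV and uses a 3D neural network to improve the model's perception of multi-view information	Directly reflects spatial relationships such as the absolute position, orientation, size, and distance of objects in the scene	Feature projection and 3D feature extraction are computationally very expensive
	CRM^［48］	Models coarse-grained semantic correlations between regions, then applies multi-head sparse self-attention to high-confidence regions for global perception	Region partitioning and sparse attention are highly robust to background interference	Sparse attention ignores some regions, impairing the completeness of the target representation
	TCFNet^［115］	Uses target trajectory positions as prior feature points to predict matching feature points between frames captured by different UAVs	Reduces computational cost and offers more reliable fusion	Erroneous target trajectories reduce fusion accuracy
Multi-view feature matching methods	MTMC^［120］	Designs a target re-identification network for feature extraction and estimates inter-target similarity by Euclidean distance	Adapts to appearance differences to some extent	Cannot express complex nonlinear or contextual relationships
	TransMDOT^［113］	Predicts the similarity between multi-view target features with a Transformer encoder	Learns the multi-view appearance variation patterns of targets autonomously	Ignores background information and interactions between targets
	TSMMT^［49］	Captures target features by combining target and background information, and mines multi-view consistent target features through collaborative learning	Models target appearance from multiple aspects; strong ability to distinguish similar targets	Irrelevant background information may be introduced into the feature representation
	MIA-Net^［50］	Uses SIFT to compute the transformation matrix between multi-view images and computes the Euclidean distance between mapped coordinates and detected targets	Flexibly handles geometric transformations between views	Produces false matches in low-texture or dynamic backgrounds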
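
The geometric matching idea in the last row of the table above (map coordinates from one view into the other, then compare Euclidean distances) can be sketched as follows. This is a simplified illustration, not MIA-Net's actual implementation: the 3×3 homography `h` is assumed to be given (in practice it would be estimated from matched keypoints such as SIFT), the nearest-neighbour matching rule is an assumption, and the names `apply_homography`, `match_across_views`, and `max_dist` are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]
Matrix3 = Sequence[Sequence[float]]  # 3x3 homography, row-major


def apply_homography(h: Matrix3, point: Point) -> Point:
    """Map a pixel from view A into view B using homogeneous coordinates."""
    x, y = point
    u = h[0][0] * x + h[0][1] * y + h[0][2]
    v = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return (u / w, v / w)


def match_across_views(h: Matrix3,
                       centers_a: List[Point],
                       centers_b: List[Point],
                       max_dist: float = 20.0) -> List[Optional[int]]:
    """For each target center in view A, return the index of the nearest
    unused center in view B within max_dist pixels, or None if none exists."""
    result: List[Optional[int]] = []
    used = set()
    for center in centers_a:
        mapped = apply_homography(h, center)
        best_idx, best_dist = None, max_dist
        for j, candidate in enumerate(centers_b):
            if j in used:
                continue
            d = math.dist(mapped, candidate)
            if d <= best_dist:
                best_idx, best_dist = j, d
        if best_idx is not None:
            used.add(best_idx)
        result.append(best_idx)
    return result


# Toy example: view B is view A scaled by 2 and shifted by (100, -20).
h = [[2.0, 0.0, 100.0],
     [0.0, 2.0, -20.0],
     [0.0, 0.0, 1.0]]
centers_a = [(10.0, 10.0), (50.0, 40.0), (200.0, 200.0)]
centers_b = [(198.0, 63.0), (121.0, 2.0)]
cross_view_matches = match_across_views(h, centers_a, centers_b)
```

The third target in view A has no counterpart in view B and stays unmatched. The sketch also shows why the limitation in the table matters: if the estimated homography is wrong, as can happen with few reliable keypoints, every mapped coordinate is displaced and the distance test selects the wrong detections.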

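Data type	Dataset	Year	Video frames	Number of targets	Target categories	Main challenges
Single-view UAV video datasets	VisDrone	2018	>40 000	>1.83 million	Pedestrian, bicycle, car, bus, etc. (10 classes)	① Densely distributed targets; ② variable weather and illumination conditions; ③ large variation in target size
	UAVDT	2018	>80 000	>840 000	Car, truck, and bus	① Large changes in target viewpoint; ② fast target motion and complex background interference
	MOR-UAV	2020	10 948	89 783	Car and heavy vehicle	① Difficulty distinguishing moving from stationary targets; ② frequent changes in UAV flight altitude
	AnimalDrone	2021	53 644	>4 million	Sheep, horse, wolf, and yak	① High appearance similarity among animal targets; ② variable target motion; ③ complex backgrounds in the wild
	SeaDronesSee	2022	54 000	>400 000	Person floating in water, life jacket, boat, etc. (6 classes)	① Low contrast between targets and background in marine environments; ② water-surface reflections and waves affect target features
Multi-view UAV video datasets	MDMT	2023		>2.2 million	Pedestrian, bicycle, and car	① Frequent background occlusion; ② large differences in target features across views
	Air-Co-Pred	2024	>32 000		Pedestrian, ambulance, police car, truck, etc. (7 classes)	① Frequent background occlusion; ② more complex target motion in virtual scenes
	UAV3D	2025		>3 million	Audi e-tron, Tesla Model 3, etc. (17 classes)	① Frequent background occlusion; ② small inter-class differences
Multi-spectral UAV video datasets	VT-Tiny-MOT	2024	>93 000	>1.2 million	Ship, pedestrian, aircraft, etc. (7 classes)	① High proportion of small targets; ② many target categories and scene types
	RGB-Tiny	2025	>93 000	>1.2 million	Car, UAV, bus, etc. (7 classes)	① High proportion of small targets; ② many target categories and scene types
	VT-MOT	2025	>400 000	>3.99 million	Vehicle and pedestrian	① Frequent changes in UAV flight altitude and viewpoint; ② long-tailed distribution of target categories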

Method	Model	MOTA↑	MOTP↑	IDF↑	FP↓	FN↓	IDS↓
Trajectory prediction-based methods	SORT^［39］	18.1	65.1	32.2	104 453	78 467	3 342
	DeepSORT^［85］	32.4	75.9	45.1	12 829	65 797	1 153
	UAVMOT^［67］	36.1	74.2	51.0	27 983	115 925	2 775
	DC-MOT^［68］	33.5	76.1	45.4	12 594	64 856	1 139
	GLOA^［65］	39.1	76.1	46.2	18 715	158 043	4 426
	FOLT^［78］	42.1	77.6	56.9	24 105	107 630	800
	DroneMOT^［40］	43.7	71.4	58.6	41 998	86 177	1 112
	STDFormer^［82］	45.9	77.9	57.1	21 288	101 506	1 440
Appearance-based methods	SFTrack^［83］	47.2		62.1	27 159	94 910	557
	SCTrack^［84］	35.8	75.6	45.1		85 623	798
	ByteTrack^［86］	25.1	72.4	40.8	34 044	194 984	1 590
	AsyUAV^［88］	38.3		51.7	46 392	93 681	3 954
	GCEVT^［90］	34.5	73.8	50.6			841
	MMTrack^［91］	36.7		54.7	23 849	120 839	545
	FPUAV^［41］	34.3	74.2	45.0			2 138
	FlowTracker^［92］	32.1	78.7	50.1		39 423	112
	FairMOT^［93］	30.8	74.3	41.9			3 007
	PID-MOT^［42］	33.0	74.1	50.2	53 691	96 541	3 529
Single-object-tracking-assisted association methods	OMCTrack^［44］	34.5		50.6	47 892	151 623	1 980
Joint detection and association methods	SiamMOT^［104］	31.9	73.5	48.3	24 123	142 303	862
	UGT^［45］	41.7		57.7	15 174	101 074	618
	MOTR^［46］	22.8	72.8	41.4	28 407	147 937	959
	TrackFormer^［110］	25.0	73.9	30.5	25 856	141 526	4 840

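Method category	Advantages	Limitations
LSTM	Well suited to short-term motion; computationally efficient; small parameter count	Insufficient modeling of long-term dependencies; difficulty modeling dynamic interactions between targets
GAN	Generates diverse trajectories that cover multiple possibilities; robust to noise	Unstable training; high computational resource consumption
Transformer	Strong modeling of global spatiotemporal relationships; supports multi-target interaction reasoning	High computational complexity on long sequences; weaker real-time performance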

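Method category	Advantages	Limitations
Siamese network	Efficient feature matching; lightweight design; suitable for real-time processing	Sensitive to target occlusion and deformation; local features are easily lost
Transformer	Global context helps resolve occlusion; end-to-end optimization reduces error accumulation	High computational cost; reliance on large-scale data; prone to overfitting in few-sample scenarios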

Type	Model	HOTA↑	MOTA↑	MOTP↑	MT↑	ML↓	FP↓	FN↓	IDF↑	IDS↓	FPS↑
Visible	DeepSORT^［85］	23.4	12.4	68.8	15	623	7 043	148 453	20.7	2 062	28.8
	Tracktor^［89］	23.3	3.0	64.2	152	605	28 494	144 647	22.5	1 283	10.6
	ByteTrack^［86］	26.3	13.3	68.7	152	625	6 670	148 259	26.3	844	38.4
	OCSORT^［61］	25.8	10.6	68.0	156	612	11 940	146 690	25.7	2 034	24.3
	CenterTrack^［155］	11.8	10.3	71.6	16	680	1 859	154 977	8.8	3 933	41.0
	FairMOT^［93］	22.7	12.1	67.6	76	688	2 721	153 421	22.4	1 960	21.7
	TraDes^［156］	20.0	8.8	66.0	82	691	9 938	153 532	20.4	428	30.3
	GSDT^［107］	22.7	8.9	71.3	76	688	2 722	153 415	22.4	1 950	1.3
	TransCenter^［109］	5.0	-4.0	46.6	11	856	10 522	176 028	2.1	349	22.3
	ProbEn^［154］+SORT^［39］	26.3	24.7	67.4	211	450	12 108	111 365	26.1	11 930
	UA-CMDet^［157］+SORT^［39］	24.1	8.6	65.1	190	464	36 612	121 063	23.2	6 696
	HGT-Track^［144］	29.1	32.0	55.6	226	444	11 594	109 500	43.2	1 223	13.2
Infrared	DeepSORT^［85］	21.9	8.4	73.7	142	689	9 589	160 768	17.2	2 158	28.7
	Tracktor^［89］	24.5	5.8	69.5	153	661	22 465	154 425	2.7	881	11.6
	ByteTrack^［86］	25.5	9.2	73.9	150	692	9 835	160 458	22.6	915	34.1
	OCSORT^［61］	24.4	0.9	71.6	153	655	27 143	156 803	21.4	2 921	17.5
	CenterTrack^［155］	15.0	18.8	72.2	53	591	3 089	143 279	13.3	6 194	39.9
	FairMOT^［93］	15.3	8.0	62.2	45	785	1 676	170 889	14.9	928	19.8
	TraDes^［156］	26.0	15.7	67.0	153	609	15 271	143 153	30.5	532	28.3
	GSDT^［107］	15.3	8.9	62.3	45	758	1 678	170 891	14.9	927	0.9
	TransCenter^［109］	4.8	0.2	62.3	2	909	1 628	186 443	1.6	129	22.5
	ProbEn^［154］+SORT^［39］	20.7	18.3	70.4	250	428	9 649	110 327	16.3	34 042
	UA-CMDet^［157］+SORT^［39］	19.8	10.5	66.6	247	466	24 452	120 005	15.6	24 390
	HGT-Track^［144］	23.1	21.3	51.5	155	588	12 516	135 197	35.3	844	13.2
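
For readers interpreting the tables above, the sketch below computes MOTA from the error counts using the standard CLEAR MOT definition, MOTA = 1 − (FN + FP + IDS) / GT, and the standard IDF1 score, 2·IDTP / (2·IDTP + IDFP + IDFN). It is assumed here that the IDF column corresponds to IDF1. The counts in the example are made-up toy values, not taken from any table in this paper, and the function names are hypothetical.

```python
def mota(fn: int, fp: int, ids: int, gt: int) -> float:
    """CLEAR MOT accuracy: 1 - (FN + FP + IDS) / GT.

    gt is the total number of ground-truth boxes over all frames.
    The value becomes negative when the errors outnumber the ground truth.
    """
    if gt <= 0:
        raise ValueError("gt must be positive")
    return 1.0 - (fn + fp + ids) / gt


def idf1(idtp: int, idfp: int, idfn: int) -> float:
    """Identity F1 score: 2*IDTP / (2*IDTP + IDFP + IDFN)."""
    denominator = 2 * idtp + idfp + idfn
    return 2 * idtp / denominator if denominator else 0.0


# Toy counts (hypothetical): 1 000 ground-truth boxes, most of them missed.
negative_example = mota(fn=700, fp=300, ids=50, gt=1000)   # about -0.05
typical_example = mota(fn=300, fp=100, ids=20, gt=1000)    # about 0.58
identity_example = idf1(idtp=600, idfp=200, idfn=400)      # about 0.667
```

Since MOTA sums three error types against a single ground-truth count, a tracker that misses most small targets and also emits false positives can score below zero, which is how a negative MOTA entry should be read.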

Model	UAV 1		UAV 2		Overall
Model	MOTA↑	IDF↑	MOTA↑	IDF↑	MOTA↑	IDF↑
Faster R-CNN^［106］+ByteTrack^［86］	53.88	67.71	47.98	64.14	50.92	65.93
TOOD^［149］+ByteTrack^［86］	50.95	66.42	48.02	63.92	49.49	65.18
AutoAssign^［147］+ByteTrack^［86］	49.52	67.38	44.52	63.75	47.01	65.56
Carafe^［150］+ByteTrack^［86］	54.13	68.22	48.42	64.93	51.38	66.58
YOLOX^［148］+ByteTrack^［86］	56.79	72.38	49.50	65.94	53.15	69.16
AutoAssign^［147］+MIA-Net^［50］	51.90	69.67	47.46	66.81	49.68	68.24
Carafe^［150］+MIA-Net^［50］	54.92	68.82	48.23	65.12	51.58	66.97
Faster R-CNN^［106］+DeepSORT^［85］	50.20	60.48	41.52	52.44	48.86	56.46
Faster R-CNN^［106］+QDTrack^［151］	52.12	66.23	43.02	57.68	47.57	61.96
Carafe^［150］+QDTrack^［151］	53.20	66.28	43.06	57.46	48.13	61.87
Cascade RPN^［152］+QDTrack^［151］	51.05	65.00	45.75	58.66	48.40	61.82
AutoAssign^［147］+SBS-50^［153］	44.49	55.89	40.65	54.69	42.57	55.29
Carafe^［150］+SBS-50^［153］	50.46	57.04	45.33	55.86	47.89	56.44
Faster R-CNN^［106］+TCFNet^［115］	56.12	70.27	51.62	67.01	53.81	68.64
Faster R-CNN^［106］+TSMMT^［49］	56.54	71.75	52.12	68.58	54.34	70.30
YOLOX^［148］+MUMTTrack^［124］	58.36	73.21	51.72	69.01	55.04	71.11