Acta Aeronautica et Astronautica Sinica > 2026, Vol. 47 Issue (8): 332791-332791   doi: 10.7527/S1000-6893.2025.32791

基于频域特征和Transformer的无人机目标跟踪算法

刘芳1, 崔静虎1, 卢晨阳1, 王鑫2, 浦昭辉3

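  1. School of Information Science and Technology, Beijing University of Technology, Beijing 100124
  2. Fengtai Power Supply Bureau of Beijing Power Supply Bureau, Beijing 100161
  3. Information and Communication Branch of State Grid Beijing Electric Power Company, Beijing 100761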
  • Received:2025-09-17 Revised:2025-10-23 Accepted:2025-11-20 Online:2025-12-01 Published:2025-11-28
  • Corresponding author: Jinghu CUI E-mail:S202487019@emails.bjut.edu.cn

A UAV target tracking algorithm based on frequency-domain feature and transformer

Fang LIU1, Jinghu CUI1, Chenyang LU1, Xin WANG2, Zhaohui PU3

  1. School of Information Science and Technology, Beijing University of Technology, Beijing 100124, China
  2. Fengtai Power Supply Bureau of Beijing Power Supply Bureau, Beijing 100161, China
  3. Information and Communication Branch of State Grid Beijing Electric Power Company, Beijing 100761, China
  • Received:2025-09-17 Revised:2025-10-23 Accepted:2025-11-20 Online:2025-12-01 Published:2025-11-28
  • Contact: Jinghu CUI E-mail:S202487019@emails.bjut.edu.cn

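Abstract:

As UAV technology continues to develop, target tracking has become one of the key technologies in UAV applications. To address occlusion, deformation, scale variation, and multi-view changes of the target in UAV target tracking, a UAV target tracking algorithm based on frequency-domain features and Transformer is proposed. First, a distilled Transformer deep network extracts global spatial features of the image, and an adaptive frequency-domain perception network then extracts detailed frequency-domain features. In addition, a learning image is added at the input as a supplement to capture the correlation between the target template and the search region, which is used to update the initial target template and strengthen the representation of the target. Second, a multi-view invariant feature learning strategy based on mutual information maximization is proposed: a new loss function is designed by maximizing the mutual information between the target template and the search template, which improves the tracking network's ability to handle target changes. Finally, the target position is determined from the feature response of the learning image. Simulation results show that the algorithm effectively improves the accuracy of UAV target tracking and has good robustness.

Key words: machine vision, unmanned aerial vehicle, target tracking, frequency-domain feature, deep network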

Abstract:

With the rapid development of Unmanned Aerial Vehicle (UAV) technology, target tracking has become one of the key techniques in UAV applications. To address challenges such as occlusion, deformation, scale variation, and multi-view changes in UAV target tracking, this paper proposes a UAV target tracking algorithm based on frequency-domain features and a Transformer architecture. First, a distilled Transformer network is employed to extract global spatial features from images, and an adaptive frequency-domain deep network is employed to capture detailed frequency-domain features. Meanwhile, a learning image is introduced at the input stage to capture the correlation between the target template and the search region, which is used to update the initial target template and enhance the target representation. Second, a multi-view invariant feature learning strategy based on mutual information maximization is proposed. By maximizing the mutual information between the target template and the search template, a novel loss function is designed to improve the network’s robustness against target appearance variations. Finally, the target position is determined according to the feature responses of the learning image. Simulation results demonstrate that the proposed algorithm effectively improves UAV target tracking accuracy and exhibits strong robustness under complex scenarios.
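
The abstract does not state which mutual information estimator the loss function uses. As an illustration only, the sketch below assumes the widely used InfoNCE lower bound, in which each template feature is paired with its own search feature as the positive and with the other search features in the batch as negatives. The function names `info_nce_loss` and `mi_lower_bound`, the cosine similarity, and the `temperature` parameter are hypothetical choices for this sketch and are not taken from the paper.

```python
import math


def _normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def info_nce_loss(template_feats, search_feats, temperature=0.1):
    """InfoNCE loss over a batch of paired feature vectors.

    Assumption for this sketch: template_feats[i] and search_feats[i]
    describe the same target, so pair (i, i) is the positive and every
    pair (i, j != i) is a negative.
    """
    n = len(template_feats)
    t = [_normalize(v) for v in template_feats]
    s = [_normalize(v) for v in search_feats]
    total = 0.0
    for i in range(n):
        # Cosine similarities of template i against every search feature.
        logits = [_dot(t[i], s[j]) / temperature for j in range(n)]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        # Negative log-probability of the positive pair.
        total += log_denom - logits[i]
    return total / n


def mi_lower_bound(loss, batch_size):
    """InfoNCE bound: I(template; search) >= log(N) - loss."""
    return math.log(batch_size) - loss


if __name__ == "__main__":
    templates = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
    matched = [[0.9, 0.1], [0.1, 0.9], [-0.9, 0.1]]
    loss = info_nce_loss(templates, matched)
    print("loss:", loss, "MI lower bound:", mi_lower_bound(loss, len(templates)))
```

Minimizing this loss raises the lower bound on the mutual information between the two feature sets, which is one plausible way to encourage template and search features of the same target to agree across viewpoints.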

Key words: machine vision, unmanned aerial vehicle, target tracking, frequency-domain feature, deep network

CLC number: