
Spatio-Temporal Feature Joint Enhancement for Air-to-Air Infrared UAV Small Target Detection

Xingyu YIN, Ren JIN, Zhaochen CHU, Mingdong JIA, Xinfu LIU

  1. Beijing Institute of Technology
  • Received: 2026-01-07 Revised: 2026-04-07 Online: 2026-04-14 Published: 2026-04-14
  • Corresponding author: Ren JIN
  • Funding:
    National Natural Science Foundation of China; National Key R&D Program of China, "Sci-Tech Support for Social Governance and Smart Society (Safe China)" key special project "Research and Development of Key Technologies and Equipment for UAV Prevention and Control Based on High-Resolution Detection and Intelligent Low-Damage Countermeasures"

Abstract: Vision-based air-to-air infrared unmanned aerial vehicle (UAV) detection is a key technology for aerial surveillance and swarm-confrontation applications. To address the blurred target boundaries and low target-to-background contrast at any single time instant in infrared video sequences, this paper proposes a spatio-temporal joint feature enhancement method for air-to-air infrared UAV small target detection. The approach embeds two dedicated modules into an improved generic U-shaped network. First, a multi-level semantic-guided spatial attention module is constructed: high-level semantic information extracted from the down-sampled deep feature maps in the encoder adaptively sharpens the corresponding textured regions of the preceding, shallower feature map, producing spatially enhanced features focused on target regions. Second, a spatio-temporally guided multi-head attention module is designed: within a temporal window, it performs cross-attention in the decoder, taking the current frame's spatially enhanced features as the query and the high-dimensional decoded features of the consecutive frames as the key and value. This exploits spatial consistency across frames to reinforce the target representation, so that spatial details and temporal context are jointly enhanced. Extensive experiments are conducted on the Drone subset of the Drone-detection-dataset and on the Anti-UAV dataset. Compared with state-of-the-art infrared small target detection methods, the proposed method achieves the best performance: on the Drone-detection-dataset it improves mIoU by 2.01% and reduces the false alarm rate by 0.72% relative to the strong baseline MFE-Net, and on the Anti-UAV dataset it improves mIoU by 1.10% and reduces the false alarm rate by 1.21% over MFE-Net. These results indicate that the proposed method improves both detection accuracy and stability for air-to-air infrared UAV detection.
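The first module described in the abstract can be illustrated with a minimal, self-contained sketch. This is not the authors' implementation: the channel-mean saliency, the nearest-neighbour upsampling, and the sigmoid gate are simplifying assumptions made here purely to show the gating idea (deep semantics reweighting shallow textures).

```python
import numpy as np

def semantic_guided_spatial_attention(shallow, deep):
    """Sketch of a cross-level semantic-guided spatial attention step.

    shallow: (C, H, W) encoder feature map at the finer level.
    deep:    (C, H//2, W//2) next-deeper feature map produced by
             downsampling in the encoder.

    The deep map is collapsed into a one-channel semantic saliency map
    (channel mean -- an assumption of this sketch), upsampled back to the
    shallow resolution by nearest-neighbour repetition, passed through a
    sigmoid gate, and used to reweight the shallow textures so that
    features in likely target regions are emphasised.
    """
    saliency = deep.mean(axis=0)                        # (H//2, W//2)
    up = saliency.repeat(2, axis=0).repeat(2, axis=1)   # (H, W) upsample
    gate = 1.0 / (1.0 + np.exp(-up))                    # values in (0, 1)
    return shallow * gate                               # (C, H, W), enhanced
```

In the real network the saliency and gate would be learned (e.g. via convolutions) rather than fixed channel means, but the data flow, deep-to-shallow guidance followed by element-wise reweighting, is the same.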

Key words: UAV, Deep Learning, Infrared Target Detection, Video-based Small Target Detection, Spatio-Temporal Features, Cross-Level Semantic Guidance
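The spatio-temporally guided multi-head cross-attention can likewise be sketched. Again this is an illustration, not the paper's code: learned Q/K/V projections are replaced by the identity, and the token and head sizes are arbitrary; only the pattern (current-frame enhanced features as query, temporal-window decoded features as key/value, residual fusion) follows the abstract.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def temporal_cross_attention(query_feat, frame_feats, num_heads=4):
    """Sketch of spatio-temporally guided multi-head cross-attention.

    query_feat:  (N, D) spatially enhanced features of the current frame,
                 flattened to N = H*W tokens of dimension D.
    frame_feats: (T, N, D) decoded features of the T consecutive frames in
                 the temporal window; they serve as keys and values.

    D must be divisible by num_heads; projection matrices are omitted
    (identity) to keep the sketch short.
    """
    T, N, D = frame_feats.shape
    kv = frame_feats.reshape(T * N, D)        # stack all window tokens
    dh = D // num_heads
    out = np.empty_like(query_feat)
    for h in range(num_heads):
        q = query_feat[:, h * dh:(h + 1) * dh]
        k = v = kv[:, h * dh:(h + 1) * dh]
        attn = softmax(q @ k.T / np.sqrt(dh), axis=-1)  # (N, T*N) weights
        out[:, h * dh:(h + 1) * dh] = attn @ v
    return query_feat + out                   # residual multi-frame context
```

The residual connection means each current-frame token is augmented with a weighted mixture of spatially consistent features from neighbouring frames, which is how the temporal context compensates for low single-frame contrast.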
