• YANG Xi-Yang,
• LIN Jia-Quan

Received date: 2026-01-22

Revised date: 2026-05-18

Online published: 2026-05-19

Abstract

Unmanned Aerial Vehicles (UAVs) require precise perception in complex environments, and dual-modal fusion detection that combines visible-light and infrared imagery has attracted significant attention because the two modalities are complementary. However, existing methods struggle to balance accuracy and efficiency in aerial photography scenarios, where modal features are heterogeneous, backgrounds are cluttered, and small objects carry weak features. To tackle these issues, this paper proposes an object detection method based on cross-modal common-mode interaction and differential perception. First, to address cross-modal feature alignment, a Bidirectional Cross-Modal Common-Mode Fusion (BCMF) module is designed. This module uses a bidirectional attention mechanism to enable deep interaction between the visible-light and infrared modalities and to extract their common features. Second, to suppress complex background noise and enhance target saliency, a Context-Gated Differential Block (CGDB) module is constructed. This module uses context information from a large receptive field to perform adaptive gated feature selection. Furthermore, to strengthen the discriminative power of multi-scale features, a dual feature pyramid network (FPN) architecture is adopted, which maintains the two modal feature streams independently before fusing them and thereby prevents feature confusion. Experiments on the DroneVehicle and VEDAI datasets show that the proposed method achieves high average accuracy while keeping the model lightweight, and that its overall performance improves significantly on existing mainstream fusion methods.
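The bidirectional attention idea mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' BCMF module: the abstract does not give its layers, projections, or fusion rule. The function names (`cross_attend`, `bidirectional_fusion`), the use of plain scaled dot-product attention without learned projections, the residual connections, and the averaging step used to form a "common" feature are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attend(query_feats, context_feats):
    """Scaled dot-product attention with queries from one modality
    and keys/values from the other (no learned projections; assumption)."""
    d = query_feats.shape[-1]
    scores = query_feats @ context_feats.T / np.sqrt(d)  # (N, M) similarity
    weights = softmax(scores, axis=-1)                   # each row sums to 1
    return weights @ context_feats, weights


def bidirectional_fusion(rgb, ir):
    """Hypothetical bidirectional interaction: each modality attends to the
    other, then the two enhanced streams are averaged (assumed fusion rule)."""
    rgb_from_ir, w_rgb = cross_attend(rgb, ir)  # visible queries infrared
    ir_from_rgb, w_ir = cross_attend(ir, rgb)   # infrared queries visible
    rgb_out = rgb + rgb_from_ir                 # residual connection
    ir_out = ir + ir_from_rgb
    common = 0.5 * (rgb_out + ir_out)           # stand-in for "common features"
    return common, w_rgb, w_ir


# Toy input: 16 spatial tokens with 8 channels per modality.
rng = np.random.default_rng(0)
rgb = rng.standard_normal((16, 8))
ir = rng.standard_normal((16, 8))
common, w_rgb, w_ir = bidirectional_fusion(rgb, ir)
```

Because both directions are computed and then averaged, the fused output in this sketch does not depend on which modality is passed first, which is one simple way to treat the two streams symmetrically.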

Cite this article

YANG Xi-Yang, LIN Jia-Quan. [J]. ACTA AERONAUTICA ET ASTRONAUTICA SINICA, 0: 1-0. DOI: 10.7527/S1000-6893.2026.33406
