Acta Aeronautica et Astronautica Sinica ›› 2026, Vol. 47 ›› Issue (10): 532864. doi: 10.7527/S1000-6893.2025.32864

• Special Issue: Intelligent Processing and Analysis of Aerospace Remote Sensing Images

A unified detection model for multimodal aerospace remote sensing images based on mixture of experts

Yuanjie ZHI¹, Xin GE¹, Fan ZHANG², Zhi YANG³, Mingyang MA¹, Shaohui MEI¹

  1. School of Electronic Information, Northwestern Polytechnical University, Xi’an 710129, China
  2. China Academy of Launch Vehicle Technology, Beijing 100076, China
  3. State Grid Electric Power Engineering Research Institute Co., Ltd., Beijing 102209, China
  • Received: 2025-10-09 Revised: 2025-11-06 Accepted: 2025-12-11 Online: 2025-12-25 Published: 2025-12-23
  • Contact: Shaohui MEI, E-mail: meish@nwpu.edu.cn
  • Supported by:
    National Natural Science Foundation of China (62571442)

Abstract:

As China deploys more remote sensing satellites in orbit, the volume of aerospace remote sensing imagery, chiefly Synthetic Aperture Radar (SAR) and optical (RGB) images, is growing rapidly, and so is the demand for tasks such as object detection on these massive datasets. However, because imaging mechanisms and resolutions differ, images from different satellites exhibit pronounced differences in modality-specific features. The gap is especially large between SAR and RGB images, which makes it difficult for a single model to learn features across image types. As a result, each satellite typically requires a dedicated detection model, which has become a major obstacle to collaborative recognition and relay detection in satellite remote sensing. To address this issue, this paper proposes a self-distillation multimodal detection model based on a Mixture of Experts (MoE). First, a modality-aware MoE structure is constructed in which a small number of high-quality experts act as teachers to guide the remaining experts, while modality-invariant constraints further reduce cross-modality feature shift. Second, a Fourier-enhanced diffusion detection head is developed that uses frequency-domain feature enhancement to better capture fine details of the targets. To evaluate the model, aerospace images were selected and cropped from the public datasets FAIR1M and SARDet_100K, yielding a dataset of 68,983 aerospace remote sensing images for object detection across different backgrounds and imaging mechanisms. Experimental results show that, compared with existing single-modality detection methods, the proposed model performs better on both modalities, with a significant improvement in mean Average Precision (mAP). These results indicate that the proposed model has practical value for object detection in multimodal aerospace remote sensing images and adapts well to images from different types of satellites.
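
The abstract does not give implementation details, so the following is only a minimal sketch of the general idea of a softmax-gated mixture of experts in which the highest-weighted experts serve as teachers for the others. Everything in it is an assumption made for illustration: the function `moe_forward`, the per-modality gate bias `modality_bias`, the use of the gate weight as the measure of expert "quality", and the mean-squared-error distillation term. It is not the authors' model, and it omits the modality-invariant constraint and the detection head.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, x):
    """Apply a weight matrix (list of rows) to a feature vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def moe_forward(x, modality, experts, gate_w, modality_bias, num_teachers=1):
    """Hypothetical modality-aware MoE layer with a self-distillation term."""
    # Gate logits: linear scores plus a per-modality bias (assumed routing scheme).
    logits = [g + b for g, b in zip(linear(gate_w, x), modality_bias[modality])]
    gates = softmax(logits)

    # Every expert processes the input; the layer output is the gate-weighted sum.
    outputs = [linear(w, x) for w in experts]
    dim = len(outputs[0])
    mixed = [
        sum(gates[e] * outputs[e][d] for e in range(len(experts)))
        for d in range(dim)
    ]

    # Assumption: the experts with the largest gate weights act as teachers.
    order = sorted(range(len(experts)), key=lambda e: gates[e], reverse=True)
    teachers, students = order[:num_teachers], order[num_teachers:]
    teacher_mean = [
        sum(outputs[t][d] for t in teachers) / len(teachers) for d in range(dim)
    ]

    # Distillation term: mean squared error between each student and the teachers.
    distill = sum(
        sum((outputs[s][d] - teacher_mean[d]) ** 2 for d in range(dim)) / dim
        for s in students
    ) / max(len(students), 1)
    return mixed, gates, distill


random.seed(0)
IN_DIM, OUT_DIM, NUM_EXPERTS = 4, 3, 4
experts = [
    [[random.uniform(-1, 1) for _ in range(IN_DIM)] for _ in range(OUT_DIM)]
    for _ in range(NUM_EXPERTS)
]
gate_w = [[random.uniform(-1, 1) for _ in range(IN_DIM)] for _ in range(NUM_EXPERTS)]
# Illustrative bias: SAR inputs favour experts 0-1, RGB inputs favour experts 2-3.
modality_bias = {"sar": [1.0, 1.0, 0.0, 0.0], "rgb": [0.0, 0.0, 1.0, 1.0]}

x = [0.5, -0.2, 0.1, 0.9]  # stand-in for a feature vector from a backbone
mixed, gates, distill = moe_forward(x, "sar", experts, gate_w, modality_bias)
print("gates:", gates)
print("distillation loss:", distill)
```

In a trainable version, the distillation term would be added to the detection loss so that lower-weighted experts are pulled toward the teachers; how the paper selects teachers and weights that term is not stated in the abstract.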

Key words: object detection, multimodal aerospace remote sensing images, Mixture of Experts (MoE), self-distillation, Fourier transform, diffusion model

CLC Number: