As China deploys more remote sensing satellites each year, the volume of spaceborne remote sensing imagery, chiefly synthetic aperture radar (SAR) and optical (RGB) images, is growing rapidly, and so is the demand for tasks such as object detection on this data. However, differences in imaging mechanism and resolution produce pronounced modality gaps between images from different satellites, and the gap is especially large between SAR and RGB imagery. A single model therefore struggles to learn features across image types, so each satellite needs its own dedicated detection model, which has become a major obstacle to collaborative recognition and relay detection across satellites. To address this problem, this paper proposes a self-distilled multimodal detection model based on a Mixture of Experts (MoE) network. First, a modality-aware MoE structure is built in which a small number of high-quality experts act as teachers that guide the remaining experts; a modality-invariance constraint is added to further reduce cross-modal feature shift. Second, a Fourier-enhanced diffusion detection head applies frequency-domain feature enhancement to better capture the fine details of detection targets. To evaluate the model, spaceborne images were selected from the public datasets FAIR1M and SARDet_100K and cropped, yielding an object detection dataset of 68,983 remote sensing images that covers different backgrounds and imaging mechanisms. Experiments show that, compared with existing single-modality detection methods, the proposed model performs better on detection in both modalities, with a significant gain in mean average precision (mAP), indicating that it is applicable to multimodal spaceborne remote sensing object detection and to imagery from multiple types of satellites.
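The idea of letting a few experts teach the others can be illustrated with a minimal sketch. It is not the model described in this paper: it assumes that "high-quality" experts are simply those with the largest gate weights, that the gate receives a per-modality bias, and that the distillation term is a mean squared error against the teachers' averaged output. The names `moe_self_distill`, `gate_w`, and `modality_bias` are hypothetical and chosen only for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, x):
    """Apply a weight matrix (list of rows) to a feature vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def moe_self_distill(x, modality, experts, gate_w, modality_bias, num_teachers=1):
    """Toy modality-aware MoE forward pass with a self-distillation loss.

    Assumptions (not taken from the paper): teachers are the experts with
    the highest gate weight, and students are pulled toward the mean
    teacher output with an MSE penalty.
    """
    # Modality-aware gating: feature-dependent logits plus a modality bias.
    logits = [g + b for g, b in zip(linear(gate_w, x), modality_bias[modality])]
    gate = softmax(logits)

    # Every expert is a plain linear map in this sketch.
    outputs = [linear(w, x) for w in experts]
    dim = len(outputs[0])

    # Gate-weighted fusion of the expert outputs.
    fused = [sum(gate[e] * outputs[e][d] for e in range(len(experts)))
             for d in range(dim)]

    # Rank experts by gate weight; the top ones act as teachers.
    ranked = sorted(range(len(experts)), key=lambda e: gate[e], reverse=True)
    teachers, students = ranked[:num_teachers], ranked[num_teachers:]
    teacher_mean = [sum(outputs[t][d] for t in teachers) / len(teachers)
                    for d in range(dim)]

    # Distillation loss: average MSE between each student and the teacher mean.
    loss = 0.0
    for s in students:
        loss += sum((outputs[s][d] - teacher_mean[d]) ** 2 for d in range(dim)) / dim
    loss /= max(len(students), 1)
    return fused, loss, teachers


if __name__ == "__main__":
    features = [1.0, -2.0, 0.5]
    expert_weights = [
        [[0.1, 0.2, 0.3], [0.0, -0.1, 0.4]],
        [[0.3, 0.0, 0.1], [0.2, 0.2, 0.2]],
        [[-0.1, 0.1, 0.0], [0.5, 0.0, -0.5]],
    ]
    gate_weights = [[0.0, 0.0, 0.0] for _ in range(3)]
    bias = {"sar": [2.0, 0.0, 0.0], "rgb": [0.0, 0.0, 2.0]}
    for modality in ("sar", "rgb"):
        fused, loss, teachers = moe_self_distill(
            features, modality, expert_weights, gate_weights, bias)
        print(modality, teachers, round(loss, 4))
```

In a trainable version the teacher output would normally be treated as a fixed target (no gradient), so that only the student experts move toward it; the sketch computes the loss value only and leaves optimization out.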