
A Unified Detection Model for Multimodal Aerospace Remote Sensing Images Based on Mixture of Experts - Intelligent Processing and Analysis of Aerospace Remote Sensing Images

Yuan-Jie ZHI1, Xin GE1, Fan ZHANG2, Zhi YANG3, Ming-Yang MA1, Shao-Hui MEI1

  1. Northwestern Polytechnical University
  2. China Academy of Launch Vehicle Technology
  3. State Grid Electric Power Engineering Research Institute Co., Ltd.
  • Received:2025-10-09 Revised:2025-12-22 Online:2025-12-23 Published:2025-12-23
  • Contact: Shao-Hui MEI
  • Supported by:
    National Natural Science Foundation of China

Abstract: As China places more remote sensing satellites in orbit each year, the volume of aerospace remote sensing imagery, typified by synthetic aperture radar (SAR) and optical (RGB) images, is growing rapidly, and so is the demand for tasks such as object detection on these data. However, differences in imaging mechanism and resolution give images from different satellites markedly different modality characteristics, and the gap is especially large between SAR and RGB images. A single model therefore struggles to learn features for every image type, so each satellite needs its own dedicated detector, which has become a major obstacle to collaborative recognition and relay detection across satellites. To address this problem, this paper proposes a self-distilled multimodal detection model based on a Mixture of Experts (MoE) network. First, a modality-aware MoE structure is built in which a small number of high-quality experts act as teachers that guide the remaining experts; a modality-invariance constraint is added to further reduce cross-modality feature shift. Second, a Fourier-enhanced diffusion detection head is constructed, which uses frequency-domain feature enhancement to better capture target details. To evaluate the model, aerospace images were selected from the public datasets FAIR1M and SARDet_100K and cropped, yielding a dataset of 68,983 aerospace remote sensing images for object detection that covers different backgrounds and imaging mechanisms. Experiments show that, compared with existing single-modality detection methods, the proposed model performs better on detection in both modalities, with a significant improvement in mean average precision (mAP). These results indicate that the model is of practical value for multimodal aerospace remote sensing object detection and applies well to imagery from multiple types of satellites.
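To make the modality-aware MoE idea in the abstract concrete, the following is a minimal NumPy sketch, not the authors' implementation. Everything in it is an assumption made for illustration: the function names `moe_forward` and `modality_gap`, the single-layer tanh experts, the per-modality gate bias, the choice of "high-quality" experts as those with the largest mean gate weight, and the use of mean squared error for both the distillation term and the modality-invariance term. The abstract does not specify any of these details.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def moe_forward(features, modality_ids, gate_w, modality_bias, expert_ws, n_teachers=2):
    """Hypothetical modality-aware MoE layer with a self-distillation term.

    features: (B, D), modality_ids: (B,) ints, gate_w: (D, E),
    modality_bias: (M, E), expert_ws: (E, D, D).
    """
    # Modality-aware gate: a per-modality bias is added to the gate logits.
    logits = features @ gate_w + modality_bias[modality_ids]       # (B, E)
    gates = softmax(logits, axis=1)
    # Assumption: each expert is a single tanh layer.
    expert_out = np.tanh(np.einsum("bd,edk->bek", features, expert_ws))  # (B, E, D)
    fused = np.einsum("be,bek->bk", gates, expert_out)             # (B, D)
    # Assumption: "high-quality" experts are those with the largest mean gate weight.
    order = np.argsort(-gates.mean(axis=0))
    teachers, students = order[:n_teachers], order[n_teachers:]
    # Teacher target; in real training this would be treated as a constant (no gradient).
    target = expert_out[:, teachers].mean(axis=1, keepdims=True)   # (B, 1, D)
    distill = float(np.mean((expert_out[:, students] - target) ** 2))
    return fused, gates, distill, teachers

def modality_gap(fused, modality_ids):
    """Assumed modality-invariance penalty: squared distance between the
    mean fused features of modality 0 (e.g. RGB) and modality 1 (e.g. SAR)."""
    mean_a = fused[modality_ids == 0].mean(axis=0)
    mean_b = fused[modality_ids == 1].mean(axis=0)
    return float(np.sum((mean_a - mean_b) ** 2))
```

In a training loop, `distill` and `modality_gap(...)` would typically be added to the detection loss with small weights; those weights are likewise not given in the source.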

Key words: Object detection, multimodal aerospace remote sensing images, Mixture of Experts, self-distillation, Fourier transform, diffusion model
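The frequency-domain feature enhancement mentioned in the abstract can be illustrated with a small NumPy sketch. This is only one plausible reading, not the paper's detection head: the function name `fourier_enhance`, the radial `cutoff`, and the fixed `gain` applied to high frequencies are all hypothetical choices, and the diffusion part of the head is not modelled at all.

```python
import numpy as np

def fourier_enhance(feat, cutoff=0.25, gain=1.5):
    """Amplify high-frequency content of a 2-D feature map (illustrative only).

    feat: (H, W) real array. Frequencies whose normalised radius exceeds
    `cutoff` (in cycles per sample) are scaled by `gain`; the rest are kept.
    """
    h, w = feat.shape
    spec = np.fft.fft2(feat)
    fy = np.fft.fftfreq(h)[:, None]          # vertical frequencies
    fx = np.fft.fftfreq(w)[None, :]          # horizontal frequencies
    radius = np.sqrt(fx ** 2 + fy ** 2)
    scale = np.where(radius > cutoff, gain, 1.0)
    # The mask is symmetric in frequency, so the inverse transform is real
    # up to floating-point error; the residual imaginary part is dropped.
    return np.fft.ifft2(spec * scale).real
```

Because the zero-frequency (DC) term is left untouched, the mean of the feature map is preserved while edges and fine textures, which live at high frequencies, are emphasised.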

CLC number: