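A Unified Detection Model for Multimodal Aerospace Remote Sensing Images Based on Mixture of Experts (special topic: Intelligent Processing and Analysis of Aerospace Remote Sensing Images)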

doi:10.7527/S1000-6893.2025.32864

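ZHI Yuanjie (支元杰)¹, GE Xin (葛欣)¹, ZHANG Fan (张帆)², YANG Zhi (杨知)³, MA Mingyang (马明阳)¹, MEI Shaohui (梅少辉)¹

1. Northwestern Polytechnical University
2. China Academy of Launch Vehicle Technology
3. State Grid Electric Power Engineering Research Institute Co., Ltd.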

Received: 2025-10-09  Revised: 2025-12-22  Online: 2025-12-23  Published: 2025-12-23
Corresponding author: Shao-Hui MEI
Supported by: National Natural Science Foundation of China

Abstract

As the number of Chinese remote sensing satellites in orbit grows year by year, the volume of aerospace remote sensing imagery, typified by synthetic aperture radar (SAR) and optical (RGB) images, is increasing rapidly, and so is the demand for tasks such as object detection on these data. However, because of differences in imaging mechanism and resolution, images from different satellites show clear modality-specific feature differences, and the gap is especially pronounced between SAR and RGB images. This makes it difficult for a single model to learn features from different types of remote sensing images, so each satellite needs its own dedicated detection model, which has become a major obstacle to collaborative recognition and relay detection with satellite remote sensing images. To address this problem, this paper proposes a self-distilled multimodal detection model based on a Mixture of Experts (MoE) network. First, a modality-aware MoE structure is built in which a small number of high-quality experts act as teachers that guide the remaining experts; a modality-invariance constraint is added to further reduce cross-modality feature shift. Second, a Fourier-enhanced diffusion detection head is constructed, which uses frequency-domain feature enhancement to improve the capture of target detail. To evaluate the model, aerospace images were selected from the public datasets FAIR1M and SARDet_100K and cropped, yielding an object detection dataset of 68,983 aerospace remote sensing images covering different backgrounds and imaging mechanisms. Experimental results show that, compared with existing single-modality detection methods, the proposed model performs better on detection tasks in both modalities, with a significant improvement in mean average precision (mAP). These results indicate that the proposed model has good application value for object detection in multimodal aerospace remote sensing images and is applicable to images from multiple types of satellites.

Keywords: object detection, multimodal aerospace remote sensing images, Mixture of Experts, self-distillation, Fourier transform, diffusion model
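
The abstract names two mechanisms without giving their formulas: teacher experts guiding the other experts inside a modality-aware MoE, and frequency-domain feature enhancement in the detection head. The toy sketch below is only one possible reading of those ideas, not the authors' implementation. Every name in it (`modality_aware_gate`, `self_distillation_loss`, `enhance_high_frequencies`), the choice of gate weight as a stand-in for expert quality, the mean-squared-error loss, and the 1-D DFT with a fixed gain are assumptions made for illustration.

```python
import cmath
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def modality_aware_gate(gate_logits, modality_bias):
    """Hypothetical gate: shift the logits by a per-modality bias, then normalize."""
    return softmax([g + b for g, b in zip(gate_logits, modality_bias)])


def self_distillation_loss(expert_outputs, gate_weights, num_teachers=1):
    """Mean squared error pulling student experts toward the teacher average.

    Assumption: the experts with the largest gate weights are the teachers.
    """
    order = sorted(range(len(gate_weights)), key=lambda i: gate_weights[i], reverse=True)
    teachers, students = order[:num_teachers], order[num_teachers:]
    if not teachers or not students:
        return 0.0
    dim = len(expert_outputs[0])
    target = [sum(expert_outputs[t][d] for t in teachers) / len(teachers) for d in range(dim)]
    per_student = [
        sum((expert_outputs[s][d] - target[d]) ** 2 for d in range(dim)) / dim
        for s in students
    ]
    return sum(per_student) / len(per_student)


def enhance_high_frequencies(signal, gain=1.5, cutoff=1):
    """Scale DFT bins above `cutoff` by `gain` and transform back (1-D toy case)."""
    n = len(signal)
    spectrum = [
        sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    # min(k, n - k) treats bin k and its mirror n - k alike, so the output stays real.
    boosted = [c * gain if min(k, n - k) > cutoff else c for k, c in enumerate(spectrum)]
    return [
        (sum(boosted[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


# Three experts, one SAR sample; all numbers are made up for illustration.
weights = modality_aware_gate([0.2, 1.5, -0.3], modality_bias=[0.0, 0.5, 0.0])
outputs = [[1.0, 2.0], [1.0, 1.0], [3.0, 1.0]]
loss = self_distillation_loss(outputs, weights, num_teachers=1)
sharpened = enhance_high_frequencies([1.0, -1.0, 1.0, -1.0])
```

In this example the second expert receives the largest gate weight and serves as the teacher, so the loss measures how far the other two experts' outputs lie from it. A real detector would apply the same ideas to feature maps with a 2-D FFT and learned gates rather than to short Python lists.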

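Cite this article: ZHI Yuanjie, GE Xin, ZHANG Fan, YANG Zhi, MA Mingyang, MEI Shaohui. A Unified Detection Model for Multimodal Aerospace Remote Sensing Images Based on Mixture of Experts [J]. Acta Aeronautica et Astronautica Sinica, doi: 10.7527/S1000-6893.2025.32864.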
