Acta Aeronautica et Astronautica Sinica, 2025, Vol. 46, Issue 23: 632763

DOI: https://doi.org/10.7527/S1000-6893.2025.32763

Special Column: Multi-Source Perception for UAVs in Interference Environments

RS-AdaDiff: One-step remote sensing image super-resolution diffusion model with degradation-aware adaptive estimation

Fei WANG, Yong LIU, Jiawei YAO, Xuanlei ZHU, Xiaoqiang LU, Wenxing GUO, Xuetao ZHANG, Yu GUO

1. National Key Laboratory of Human-Machine Hybrid Augmented Intelligence, Xi’an Jiaotong University, Xi’an 710049, China
2. National Engineering Research Center of Visual Information and Applications, Xi’an Jiaotong University, Xi’an 710049, China
3. Institute of Artificial Intelligence and Robotics, Xi’an Jiaotong University, Xi’an 710049, China
4. College of Physics and Information Engineering, Fuzhou University, Fuzhou 350108, China

E-mail: yu.guo@xjtu.edu.cn

Received date: 2025-09-06

Revised date: 2025-09-24

Accepted date: 2025-10-20

Online published: 2025-11-13

Supported by: National Major Science and Technology Projects of China (2009XJTU0016)

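Cite this article

WANG F, LIU Y, YAO J W, ZHU X L, LU X Q, GUO W X, ZHANG X T, GUO Y. RS-AdaDiff: One-step remote sensing image super-resolution diffusion model with degradation-aware adaptive estimation[J]. Acta Aeronautica et Astronautica Sinica, 2025, 46(23): 632763. DOI: 10.7527/S1000-6893.2025.32763.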

Abstract

Diffusion models have shown great potential for generating realistic image details. However, existing diffusion models are trained primarily on natural images, so applying them to remote sensing image super-resolution remains highly challenging. These models also typically require dozens or even hundreds of iterative sampling steps at inference time. This makes them computationally expensive and limits their practicality. To address these issues, we propose RS-AdaDiff, a one-step remote sensing image super-resolution diffusion model with degradation-aware adaptive estimation, which balances reconstruction performance and inference efficiency. Specifically, a degradation-aware timestep estimation module assesses the degradation level of the input image and adaptively estimates the corresponding diffusion timestep for the diffusion model. This reformulates the iterative denoising process as a single-step reconstruction from the low-resolution image to the high-resolution image, which substantially accelerates inference. We also integrate trainable lightweight LoRA layers into a pre-trained diffusion model and fine-tune them on a remote sensing image dataset. This mitigates the domain gap caused by differences in data distribution. To fully exploit the image priors of the pre-trained model, we further introduce distribution contrastive matching distillation. Through KL-divergence regularization, it pulls the reconstructed super-resolved images closer to the high-resolution images and pushes them away from the low-resolution images in feature space, improving generation quality. Finally, we propose a feature-edge joint perceptual similarity loss that strengthens the perception of structural information and alleviates edge blurring and texture distortion. Extensive experiments show that RS-AdaDiff outperforms existing state-of-the-art methods on multiple public remote sensing datasets. It achieves significant improvements in both quantitative metrics and visual quality and produces super-resolved remote sensing images with clearer structures and richer details.
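The abstract states that the estimated timestep lets the model replace iterative denoising with a single reconstruction step. The minimal Python sketch below illustrates the standard DDPM relation that makes one step sufficient once a timestep t is fixed. If x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, then a single noise prediction eps_hat gives x_0_hat = (x_t - sqrt(1 - alpha_bar_t) * eps_hat) / sqrt(alpha_bar_t). Everything else in the sketch is an assumption for illustration: the linear beta schedule, the hypothetical `estimate_timestep` function that maps a degradation score in [0, 1] to a timestep, and the toy vectors. It is not the paper's learned estimation module or its LoRA-fine-tuned denoiser.

```python
import math
from typing import List


def linear_alpha_bar(num_steps: int = 1000,
                     beta_start: float = 1e-4,
                     beta_end: float = 0.02) -> List[float]:
    """Cumulative products alpha_bar_t of a linear DDPM beta schedule.

    The schedule and its parameters are assumptions for illustration.
    """
    alpha_bar: List[float] = []
    prod = 1.0
    for i in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * i / (num_steps - 1)
        prod *= 1.0 - beta
        alpha_bar.append(prod)
    return alpha_bar


def estimate_timestep(degradation_score: float, num_steps: int) -> int:
    """Hypothetical stand-in for a degradation-aware timestep estimator.

    Clamps the score to [0, 1] (0 = clean, 1 = heavily degraded) and maps it
    linearly to a timestep in [0, num_steps - 1], so heavier degradation
    yields a larger timestep.
    """
    score = min(max(degradation_score, 0.0), 1.0)
    return round(score * (num_steps - 1))


def one_step_x0(x_t: List[float], eps_pred: List[float],
                alpha_bar_t: float) -> List[float]:
    """Single-step x0 estimate: invert x_t = sqrt(a) * x0 + sqrt(1 - a) * eps."""
    scale_signal = math.sqrt(alpha_bar_t)
    scale_noise = math.sqrt(1.0 - alpha_bar_t)
    return [(x - scale_noise * e) / scale_signal
            for x, e in zip(x_t, eps_pred)]


if __name__ == "__main__":
    alpha_bar = linear_alpha_bar()
    t = estimate_timestep(0.3, len(alpha_bar))  # hypothetical degradation score
    x0 = [0.2, -0.5, 0.9]                       # toy "clean" signal
    eps = [0.1, -1.3, 0.7]                      # fixed toy noise vector
    a = alpha_bar[t]
    # Forward-noise the toy signal to timestep t.
    x_t = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]
    # With an oracle noise prediction (eps_pred == eps), one step recovers x0.
    print(t, one_step_x0(x_t, eps, a))
```

In this sketch a heavier degradation score maps to a larger timestep. This assumes a more degraded input should be treated as a noisier point on the diffusion trajectory. The oracle noise in the demo only verifies the algebra of the one-step inversion. In practice eps_hat would come from a trained denoising network, and reconstruction quality would depend on how accurate that prediction is.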

Key words: remote sensing image super-resolution; diffusion model; adaptive estimation; computer vision; aerospace
