Wavelet time-frequency localization-based model compression for UAV object detection

doi:10.7527/S1000-6893.2025.31952

Special Column: Multi-Source Perception for UAVs in Interference Environments

Wei HUANG, Jiahao PAN, Chu HE

School of Electronic Information, Wuhan University, Wuhan 430072, China

Received: 2025-03-10  Revised: 2025-03-28  Accepted: 2025-04-25  Online: 2025-05-23  Published: 2025-05-08
Contact: Chu HE  E-mail: chuhe@whu.edu.cn
Supported by: National Key Research and Development Program of China (2016YFC0803000); National Natural Science Foundation of China (41371342)

Abstract

Deep learning-based remote sensing object detection models rely on the powerful computing and storage resources of ground servers and can now process massive volumes of remote sensing data efficiently. However, when deploying to mobile edge devices such as Unmanned Aerial Vehicles (UAVs), limited computational resources, storage capacity, and energy budgets make it difficult to deploy large-scale models effectively. Model compression techniques, such as lightweight design and model quantization, have therefore become critical for bringing remote sensing object detection algorithms into practical use. Taking a single-stage object detection network as the baseline, this paper proposes a model compression framework based on wavelet time-frequency localization. First, depthwise separable convolution is combined with the spatial-frequency localization property of the discrete wavelet transform to construct a Wavelet Depthwise Separable Convolution (W-DSConv) module with an expanded receptive field, which yields a lightweight reconstruction of the model. Second, building on wavelet frequency-domain decomposition, a Wavelet Frequency-Division Quantization (W-FDQ) method for quantization-aware training is proposed, which quantizes features in different frequency bands independently and further compresses the lightweight model. Experiments with YOLO-series models on the VisDrone2021 UAV remote sensing dataset show that the W-DSConv module reduces parameters by 45.5% and computation by 37.4% while keeping the fluctuation in detection accuracy within 2.2%, and that with 6-bit and 4-bit W-FDQ quantization the quantized models retain 95.8% and 92.9% of the floating-point model's detection performance, respectively. This work offers a new technical approach to the lightweight deployment of remote sensing object detection models on mobile platforms.

Key words: object detection, Unmanned Aerial Vehicles (UAVs), model compression, lightweight design, model quantization, wavelet transform
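
The idea of quantizing each frequency band independently can be illustrated with a minimal sketch. It assumes a one-level 2D Haar transform and a symmetric uniform quantizer whose step size is set from each sub-band's maximum absolute value. The function names, the Haar choice, and the max-abs rule are assumptions made for this example and are not taken from the paper's W-FDQ method; sub-band naming conventions (LH versus HL) also vary between libraries.

```python
# Illustrative sketch only: one-level 2D Haar DWT, then an independent
# symmetric uniform quantizer per sub-band. The names and the max-abs scale
# rule are assumptions made for this example, not the paper's W-FDQ method.


def haar_dwt2(x):
    """Split a 2D list (even height and width) into four Haar sub-bands."""
    ll, lh, hl, hh = [], [], [], []
    for i in range(0, len(x), 2):
        rows = ([], [], [], [])
        for j in range(0, len(x[0]), 2):
            a, b = x[i][j], x[i][j + 1]
            c, d = x[i + 1][j], x[i + 1][j + 1]
            rows[0].append((a + b + c + d) / 2.0)  # low-pass in both directions
            rows[1].append((a - b + c - d) / 2.0)  # difference across columns
            rows[2].append((a + b - c - d) / 2.0)  # difference across rows
            rows[3].append((a - b - c + d) / 2.0)  # diagonal difference
        for band, row in zip((ll, lh, hl, hh), rows):
            band.append(row)
    return {"LL": ll, "LH": lh, "HL": hl, "HH": hh}


def fake_quantize(band, bits):
    """Quantize-dequantize one sub-band with its own scale (step size)."""
    max_abs = max(abs(v) for row in band for v in row)
    if max_abs == 0.0:
        return [row[:] for row in band], 0.0
    qmax = 2 ** (bits - 1) - 1  # e.g. 7 for a signed 4-bit grid
    scale = max_abs / qmax
    quantized = [[round(v / scale) * scale for v in row] for row in band]
    return quantized, scale


def quantize_by_band(x, bits):
    """Apply fake_quantize independently to every Haar sub-band of x."""
    return {name: fake_quantize(band, bits) for name, band in haar_dwt2(x).items()}


# A smooth 4 x 4 ramp used as a stand-in for a feature map.
feature = [[float(i + j) for j in range(4)] for i in range(4)]
for name, (band_q, scale) in quantize_by_band(feature, bits=4).items():
    print(name, "scale =", round(scale, 4))
```

For this smooth input the low-frequency band receives a much larger step size than the detail bands, which is the kind of mismatch a single shared quantizer would have to compromise on.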

CLC number: V279

Wei HUANG, Jiahao PAN, Chu HE. Wavelet time-frequency localization-based model compression for UAV object detection[J]. Acta Aeronautica et Astronautica Sinica, 2025, 46(23): 631952.

Table 1  Comparison of convolution modules

Module	Parameters	Computation
Standard convolution	$2K^2C^2$	$2K^2C^2HW$
Depthwise separable convolution	$K^2C + 2C^2$	$(K^2C + 2C^2)HW$
MV2Block	$6K^2C + 18C^2$	$(6K^2C + 18C^2)HW$
SV2Block	$K^2C + 2C^2$	$(K^2C + 2C^2)HW$
GhostConv	$K^2C + C^2$	$(K^2C + C^2)HW$
WaveGhostConv	$K^2C + C^2$	$\left[K^2C + NC + C^2 + \frac{16}{3}\left(1 - \frac{1}{4^M}\right)C\right]HW$
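
The parameter counts in Table 1 can be checked numerically. The sketch below reads the entries as $K^2C$-style terms for a module that maps $C$ input channels to $2C$ output channels with a $K \times K$ kernel; that reading, the per-term comments, and the omission of biases and normalization layers are assumptions of this example. The WaveGhostConv row is left out because its computation term depends on $N$ and $M$, which this page does not define.

```python
# Parameter and computation counts for the C -> 2C modules of Table 1,
# ignoring biases and normalization layers (an assumption of this sketch).


def standard_conv(k, c):
    return 2 * k * k * c * c            # K x K conv, C in, 2C out


def depthwise_separable(k, c):
    return k * k * c + 2 * c * c        # K x K depthwise + 1 x 1 pointwise to 2C


def mv2_block(k, c):
    return 6 * k * k * c + 18 * c * c   # expand to 6C, depthwise, project to 2C


def ghost_conv(k, c):
    return k * k * c + c * c            # 1 x 1 conv to C + K x K depthwise on C


def computation(params, h, w):
    """Computation on an H x W feature map: parameter count times H * W."""
    return params * h * w


k, c = 3, 64  # example values chosen for illustration
base = standard_conv(k, c)
for name, fn in [("standard", standard_conv),
                 ("depthwise separable", depthwise_separable),
                 ("MV2Block", mv2_block),
                 ("GhostConv", ghost_conv)]:
    p = fn(k, c)
    print(f"{name:20s} params={p:6d} ratio_to_standard={p / base:.3f}")
```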

Model	Algorithm	Precision/%	Recall/%	mAP50/%	mAP50:95/%	Params/M	GFLOPs
A	YOLOv8s	52.4	41.4	40.8	24.3	11.14	28.70
B	YOLOv8s+MV2Block	47.2	37.5	35.8	20.4	9.51	20.93
C	YOLOv8s+SV2Block	46.9	37.1	35.5	20.1	7.92	18.65
D	YOLOv8s+GhostConv	46.2	36.7	35.1	19.4	5.93	16.27
E	D+W-DWConv	48.4	38.8	37.6	22.0	5.93	16.30
F	D+W-PWConv	48.0	38.4	36.7	21.3	6.04	16.47
G	D+W-DSConv	50.8	38.8	38.4	22.4	6.04	16.49

N	Input resolution: 1024×1024				Input resolution: 640×640				Params/M	GFLOPs
N	Precision/%	Recall/%	mAP50/%	mAP50:95/%	Precision/%	Recall/%	mAP50/%	mAP50:95/%	Params/M	GFLOPs
	46.2	36.7	35.1	19.4	40.6	29.9	28.5	15.7	5.93	16.27
1	48.3	37.9	37.1	21.7	41.9	32.1	29.8	17.0	5.93	16.28
2	48.0	38.7	37.4	21.9	42.3	31.9	30.1	17.2	5.93	16.29
3	48.4	38.8	37.6	22.0	42.9	32.5	30.4	17.4	5.93	16.30
4	48.2	38.4	37.3	21.8	42.4	32.2	30.2	17.2	5.93	16.31
5	47.9	37.7	36.9	21.6	42.1	32.0	29.9	17.1	5.93	16.32
6	47.6	37.4	36.6	21.3	41.8	31.8	29.7	16.9	5.93	16.34

M	Input resolution: 1024×1024				Input resolution: 640×640				Params/M	GFLOPs
M	Precision/%	Recall/%	mAP50/%	mAP50:95/%	Precision/%	Recall/%	mAP50/%	mAP50:95/%	Params/M	GFLOPs
	46.2	36.7	35.1	19.4	40.2	31.2	28.5	15.4	5.93	16.27
1	47.0	37.6	36.2	20.8	43.1	31.7	29.8	16.7	6.04	16.45
2	48.0	38.4	36.7	21.3	43.6	32.0	30.3	17.2	6.04	16.49
3	47.5	38.1	36.4	21.1	43.2	31.4	30.1	17.1	6.04	16.49
4	47.9	37.8	36.6	21.2	41.7	31.9	30.0	16.9	6.04	16.49
5	47.6	37.9	36.3	21.0	41.4	31.7	29.7	16.5	6.04	16.49
6	47.3	37.4	36.1	20.6	41.0	31.3	29.2	16.1	6.04	16.50

Original model	mAP50/%	mAP50:95/%	Params/M	GFLOPs	Lightweight model	mAP50/%	mAP50:95/%	Params/M	GFLOPs
YOLOv8n	36.1	21.1	3.01	8.20	WaveYOLOv8n	32.4	18.8	1.72	5.23
YOLOv8n-p2	38.1	22.4	2.93	12.38	WaveYOLOv8n-p2	35.6	20.7	1.61	8.86
YOLOv8n-p6	36.5	21.5	4.79	8.19	WaveYOLOv8n-p6	33.4	19.3	2.70	5.24
YOLOv8s	40.8	24.3	11.14	28.70	WaveYOLOv8s	38.4	22.4	6.04	16.49
YOLOv8s-p2	42.3	25.2	10.64	36.97	WaveYOLOv8s-p2	39.8	23.3	5.43	22.53
YOLOv8s-p6	41.0	24.4	17.88	28.58	WaveYOLOv8s-p6	39.5	23.0	9.58	16.48
YOLOv5s	39.7	23.5	9.13	24.06	WaveYOLOv5s	38.6	22.4	5.98	16.58
NanoDet	28.5	15.6	0.94	3.61	WaveNanoDet	29.7	16.3	0.95	3.97
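
The relative savings implied by this table can be computed directly. The sketch below copies the Params/M and GFLOPs columns for the six YOLOv8 variants and averages the per-model reductions. The averages come out at about 45.5% and 37.4%, which matches the figures quoted in the abstract; whether the authors derived those figures in exactly this way is an assumption.

```python
# (Params/M, GFLOPs) before and after lightweighting, copied from the table above.
YOLOV8_VARIANTS = {
    "YOLOv8n":    ((3.01, 8.20), (1.72, 5.23)),
    "YOLOv8n-p2": ((2.93, 12.38), (1.61, 8.86)),
    "YOLOv8n-p6": ((4.79, 8.19), (2.70, 5.24)),
    "YOLOv8s":    ((11.14, 28.70), (6.04, 16.49)),
    "YOLOv8s-p2": ((10.64, 36.97), (5.43, 22.53)),
    "YOLOv8s-p6": ((17.88, 28.58), (9.58, 16.48)),
}


def reduction_percent(before, after):
    """Relative reduction in percent."""
    return 100.0 * (before - after) / before


def mean_reductions(models):
    """Average parameter and GFLOPs reduction over all listed models."""
    params = [reduction_percent(b[0], a[0]) for b, a in models.values()]
    gflops = [reduction_percent(b[1], a[1]) for b, a in models.values()]
    return sum(params) / len(params), sum(gflops) / len(gflops)


mean_params, mean_gflops = mean_reductions(YOLOV8_VARIANTS)
print(f"mean parameter reduction: {mean_params:.1f}%")
print(f"mean GFLOPs reduction:    {mean_gflops:.1f}%")
```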

Model	Bit setting	mAP50/%	mAP50:95/%	Params/M	Size/MB	GFLOPs
YOLOv8s	FP32	40.8	24.3	11.14	44.56	28.70
WaveYOLOv8s	FP32	38.4	22.4	6.04	24.16	16.49
WaveYOLOv8s	W6A6	36.8	21.7	6.05	4.53	3.09
WaveYOLOv8s	W4A4	35.9	20.6	6.05	3.02	2.06
WaveYOLOv8s	W4A3	34.3	19.4	6.05	3.02	1.80
YOLOv8s-p2	FP32	42.3	25.2	10.64	42.56	36.97
WaveYOLOv8s-p2	FP32	39.8	23.3	5.43	21.72	22.53
WaveYOLOv8s-p2	W6A6	38.1	22.5	5.44	4.07	4.22
WaveYOLOv8s-p2	W4A4	37.7	21.8	5.44	2.72	2.82
WaveYOLOv8s-p2	W4A3	36.2	20.1	5.44	2.72	2.46
YOLOv8s-p6	FP32	41.0	24.4	17.88	71.52	28.58
WaveYOLOv8s-p6	FP32	39.5	23.0	9.58	38.32	16.48
WaveYOLOv8s-p6	W6A6	37.6	21.8	9.59	7.19	3.09
WaveYOLOv8s-p6	W4A4	36.4	21.0	9.59	4.79	2.06
WaveYOLOv8s-p6	W4A3	34.3	19.5	9.59	4.79	1.80
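
The size and GFLOPs columns of the quantized rows appear to follow two simple scaling rules: model size is the parameter count times the weight bit-width, and GFLOPs are the FP32 value scaled by the mean of the weight and activation bit-widths over 32. Both rules are inferred from the numbers in the table and are not stated on this page, so the sketch below should be read as a consistency check under that assumption. WxAy denotes x-bit weights and y-bit activations.

```python
# Scaling rules inferred from the table values (an assumption, not a
# statement of how the authors computed these columns).


def model_size_mb(params_m, weight_bits):
    """Storage in MB for params_m million parameters at weight_bits each."""
    return params_m * weight_bits / 8.0


def equivalent_gflops(fp32_gflops, weight_bits, act_bits):
    """Scale FP32 GFLOPs by the mean of weight and activation bits over 32."""
    return fp32_gflops * ((weight_bits + act_bits) / 2.0) / 32.0


# WaveYOLOv8s rows: FP32 has 6.04 M parameters and 16.49 GFLOPs.
print(round(model_size_mb(6.04, 32), 2))          # FP32 size
print(round(model_size_mb(6.05, 4), 2))           # W4A4 size
print(round(equivalent_gflops(16.49, 6, 6), 2))   # W6A6
print(round(equivalent_gflops(16.49, 4, 3), 2))   # W4A3
```

Small differences in the last digit against the table are expected, since the Params/M column is itself rounded.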

Algorithm	Bit setting	mAP50/%	mAP50:95/%
PACT	W4A4	33.3	19.1
DSQ	W4A4	29.9	16.6
LSQ+	W4A4	33.6	19.5
N2UQ	W4A4	32.7	18.4
HQOD	W4A4	34.2	20.1
W-FDQ (ours)	W4A4	35.9	20.6
PACT	W4A3	28.4	16.1
DSQ	W4A3	24.8	13.6
LSQ+	W4A3	26.2	14.7
N2UQ	W4A3	27.3	15.8
HQOD	W4A3	29.5	16.7
W-FDQ (ours)	W4A3	34.3	19.4
