Wavelet Time-Frequency Localization-Based Model Compression for UAV Object Detection

  • HUANG Wei ,
  • PAN Jia-Hao ,
  • HE Chu

  • 1. School of Electronic Information, Wuhan University, Wuhan, Hubei, China
    2. Wuhan University

Received date: 2025-03-10

  Revised date: 2025-05-06

  Online published: 2025-05-08

Funding

National Key R&D Program of China; National Natural Science Foundation of China


Cite this article

HUANG Wei, PAN Jia-Hao, HE Chu. Wavelet time-frequency localization-based model compression for UAV object detection[J]. Acta Aeronautica et Astronautica Sinica, 0: 1-0. DOI: 10.7527/S1000-6893.2025.31952

Abstract

Current deep learning-based remote sensing object detection models rely on the powerful computing and storage resources of ground servers, enabling efficient processing of massive remote sensing data. However, in deployment scenarios for mobile edge devices such as Unmanned Aerial Vehicles (UAVs), the limited computational resources, storage capacity, and energy constraints make it challenging to effectively deploy large-scale models. Model compression techniques, such as lightweight design and model quantization, have become critical for enabling the practical application of remote sensing object detection algorithms. This paper proposes a wavelet time-frequency localization-based model compression framework, using the single-stage object detection network as the baseline: (1) By integrating depthwise separable convolution with the spatial-frequency localization characteristics of discrete wavelet transform, a Wavelet Depthwise Separable Convolution (W-DSConv) module is constructed to achieve lightweight model reconstruction while expanding its receptive field; (2) Leveraging wavelet frequency domain decomposition, a Wavelet Frequency-Division Quantization (W-FDQ) method for quantization-aware training is proposed, enabling independent quantization of features across different frequency bands to further compress the lightweight model. Experiments are conducted using YOLO-series models on the VisDrone2021 UAV remote sensing dataset. Results demonstrate that: (1) The W-DSConv module reduces model parameters by 45.5% and computational load by 37.4%, while limiting detection accuracy fluctuations to within 2.2%; (2) When applying 6-bit and 4-bit quantization via W-FDQ, the quantized models retain 95.8% and 92.9% of the floating-point model's detection performance, respectively. This research provides novel technical insights for lightweight deployment of remote sensing object detection models on mobile platforms.
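The two ingredients named in the abstract can be illustrated with a minimal NumPy sketch: a one-level 2D Haar DWT that splits a feature map into four half-resolution subbands (so a small depthwise kernel applied on a subband covers roughly twice the area of the original map), and a per-band uniform fake-quantizer in the spirit of W-FDQ. The Haar filter bank, the "same"-padded depthwise loop, and the per-band max-abs scale are all illustrative assumptions; the paper's actual W-DSConv and W-FDQ designs are not specified here and may differ.

```python
import numpy as np

def haar_dwt2(x):
    """One-level 2D Haar DWT of a (C, H, W) feature map.
    Returns four half-resolution subbands: LL, LH, HL, HH."""
    a = x[:, 0::2, 0::2]  # top-left of each 2x2 block
    b = x[:, 0::2, 1::2]  # top-right
    c = x[:, 1::2, 0::2]  # bottom-left
    d = x[:, 1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2.0  # low-frequency approximation
    lh = (a + b - c - d) / 2.0  # horizontal detail
    hl = (a - b + c - d) / 2.0  # vertical detail
    hh = (a - b - c + d) / 2.0  # diagonal detail
    return ll, lh, hl, hh

def depthwise_conv(x, k):
    """Depthwise 2D convolution: one kxk kernel per channel, 'same' padding.
    x: (C, H, W), k: (C, kh, kw)."""
    C, H, W = x.shape
    kh, kw = k.shape[1:]
    pad = kh // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros_like(x)
    for ch in range(C):          # each channel uses its own kernel,
        for i in range(H):       # which is what makes the op "depthwise"
            for j in range(W):
                out[ch, i, j] = np.sum(xp[ch, i:i + kh, j:j + kw] * k[ch])
    return out

def quantize_band(x, bits):
    """Symmetric uniform fake-quantization with a scale computed per band,
    mimicking the W-FDQ idea of quantizing each frequency band independently."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(x).max() / qmax + 1e-12
    return np.round(x / scale).clip(-qmax, qmax) * scale
```

Applying `depthwise_conv` to the half-resolution subbands is where the receptive-field expansion comes from: a 3x3 kernel on LL spans about a 6x6 region of the original map. The motivation for a per-band scale is that the low-frequency LL band and the detail bands typically have very different dynamic ranges, so sharing one scale would waste quantization levels.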

References

[1] JIANG B, QU R K, LI Y D, et al. Object detection in UAV imagery based on deep learning: Review[J]. Acta Aeronautica et Astronautica Sinica, 2021, 42(4): 524519. (in Chinese)
[2] OUYANG Q, ZHANG Y, MA Y, et al. A review of UAV aerial photography target detection and tracking methods based on deep learning[J]. Electronics Optics & Control, 2024, 31(3): 1-7. (in Chinese)
[3] ZHAO L D, HU Y H, ZHAO N X, et al. Review of model compression and accelerated development for deep learning in LiDAR point cloud processing (invited)[J]. Laser & Optoelectronics Progress, 2024, 61(20): 2011005. (in Chinese)
[4] CHEN F H, LI S L, HAN J L, et al. Review of lightweight deep convolutional neural networks[J]. Archives of Computational Methods in Engineering, 2024, 31(4): 1915-1937.
[5] WANG J, FENG S C, CHENG Y. Survey of research on lightweight neural network structures for deep learning[J]. Computer Engineering, 2021, 47(8): 1-13. (in Chinese)
[6] SIFRE L, MALLAT S. Rigid-motion scattering for texture classification[J]. arXiv preprint arXiv:1403.1687, 2014.
[7] HOWARD A G, ZHU M L, CHEN B, et al. MobileNets: Efficient convolutional neural networks for mobile vision applications[J]. arXiv preprint arXiv:1704.04861, 2017.
[8] SANDLER M, HOWARD A, ZHU M L, et al. MobileNetV2: Inverted residuals and linear bottlenecks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018: 4510-4520.
[9] ZHANG X Y, ZHOU X Y, LIN M X, et al. ShuffleNet: An extremely efficient convolutional neural network for mobile devices[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018: 6848-6856.
[10] HAN K, WANG Y H, TIAN Q, et al. GhostNet: More features from cheap operations[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020: 1580-1589.
[11] VASU P K A, GABRIEL J, ZHU J, et al. MobileOne: An improved one millisecond mobile backbone[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023: 7907-7917.
[12] HAN S, MAO H Z, DALLY W J. Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding[J]. arXiv preprint arXiv:1510.00149, 2015.
[13] LIU X C, YE M, ZHOU D Y, et al. Post-training quantization with multiple points: Mixed precision without mixed precision[C]//Proceedings of the AAAI Conference on Artificial Intelligence. 2021, 35(10): 8697-8705.
[14] NAGEL M, AMJAD R A, VAN BAALEN M, et al. Up or down? Adaptive rounding for post-training quantization[C]//International Conference on Machine Learning. PMLR, 2020: 7197-7206.
[15] YUAN Z H, XUE C H, CHEN Y Q, et al. PTQ4ViT: Post-training quantization for vision transformers with twin uniform quantization[C]//European Conference on Computer Vision. Cham: Springer Nature Switzerland, 2022: 191-207.
[16] ESSER S K, MCKINSTRY J L, BABLANI D, et al. Learned step size quantization[J]. arXiv preprint arXiv:1902.08153, 2019.
[17] BHALGAT Y, LEE J, NAGEL M, et al. LSQ+: Improving low-bit quantization through learnable offsets and better initialization[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. 2020: 696-697.
[18] CHOI J, WANG Z, VENKATARAMANI S, et al. PACT: Parameterized clipping activation for quantized neural networks[J]. arXiv preprint arXiv:1805.06085, 2018.
[19] LIU Z C, CHENG K T, HUANG D, et al. Nonuniform-to-uniform quantization: Towards accurate quantization via generalized straight-through estimation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022: 4942-4952.
[20] ZHU K, HE Y Y, WU J X. Quantized feature distillation for network quantization[C]//Proceedings of the AAAI Conference on Artificial Intelligence. 2023, 37(9): 11452-11460.
[21] MUSA A, KAKUDI H A, HASSAN M, et al. Lightweight deep learning models for edge devices: A survey[J]. International Journal of Computer Information Systems and Industrial Management Applications, 2025, 17: 18.
[22] YANG C, ZHANG R Y, HUANG L, et al. A survey of quantization methods for deep neural networks[J]. Chinese Journal of Engineering, 2023, 45(10): 1613-1629. (in Chinese)
[23] NAGEL M, FOURNARAKIS M, AMJAD R A, et al. A white paper on neural network quantization[J]. arXiv preprint arXiv:2106.08295, 2021.
[24] REDMON J, DIVVALA S, GIRSHICK R, et al. You only look once: Unified, real-time object detection[C]//2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2016: 779-788.
[25] REDMON J, FARHADI A. YOLOv3: An incremental improvement[J]. arXiv preprint arXiv:1804.02767, 2018.
[26] WANG C Y, BOCHKOVSKIY A, LIAO H Y M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023: 7464-7475.
[27] FINDER S E, AMOYAL R, TREISTER E, et al. Wavelet convolutions for large receptive fields[C]//European Conference on Computer Vision. Cham: Springer Nature Switzerland, 2024: 363-380.
[28] PAN J H, HE C, HUANG W, et al. Wavelet tree transformer: Multi-head attention with frequency selective representation and interaction for remote sensing object detection[J]. IEEE Transactions on Geoscience and Remote Sensing, 2024, 62: 1-23.
[29] GONG R H, LIU X L, JIANG S H, et al. Differentiable soft quantization: Bridging full-precision and low-bit neural networks[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. 2019: 4852-4861.
[30] HUANG L, DONG Z W, CHEN S L, et al. HQOD: Harmonious quantization for object detection[C]//2024 IEEE International Conference on Multimedia and Expo (ICME). IEEE, 2024: 1-6.