To address the challenges of mutual occlusion, targets that occupy only a few pixels, and complex backgrounds in low-altitude UAV-based object detection, this paper proposes HPRS-YOLO, a small-target detection algorithm optimized for UAV platforms. The backbone network incorporates a novel Multi-Scale Spatial Pyramid (SPMCC), which replaces max-pooling-based downsampling with dilated convolution to dynamically adjust the receptive field, thereby enhancing contextual feature extraction. The improved C3K2 module integrates two Metaformer architectures to reinforce the structural and textural features of small targets while reducing the parameter count and keeping computational overhead low. A dynamic upsampling operator (Dysample) is introduced to suppress offset overlaps and boundary pixel value confusion, thereby improving target-background contrast. The neck network is redesigned with a Shallow Detail Focus Module (SDFM) to achieve cross-scale feature calibration between terminal layers, emphasizing low-level feature maps to compensate for missing small-target characteristics and preserve the spatial integrity of occluded objects. Ablation and comparison experiments are conducted on the VisDrone2019 dataset. The results show that mAP0.5 and mAP0.5:0.95 improve by 5 and 3 percentage points, respectively, over the baseline method. Generalization experiments on the public DOTA dataset show that mAP0.5 improves by 2.0%, demonstrating good robustness. Finally, the model is deployed on the NVIDIA Jetson AGX Orin embedded device for validation, where it reaches up to 60 FPS, demonstrating that HPRS-YOLO's algorithm design maintains real-time detection capability while keeping accuracy high.
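The receptive-field argument behind replacing max-pooling with dilated convolution can be illustrated with a short calculation. The sketch below is not the SPMCC implementation: the layer configurations are hypothetical, chosen only to show how a stack of small dilated kernels can cover the same span as a stack of larger pooling windows. The helper names `effective_kernel` and `receptive_field` are likewise assumptions introduced for illustration.

```python
def effective_kernel(kernel_size: int, dilation: int) -> int:
    """Span covered by one dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def receptive_field(layers) -> int:
    """Receptive field of a sequential stack of (kernel_size, stride, dilation) layers."""
    rf, jump = 1, 1  # jump = distance in input pixels between adjacent outputs
    for kernel_size, stride, dilation in layers:
        rf += (effective_kernel(kernel_size, dilation) - 1) * jump
        jump *= stride
    return rf


# Hypothetical configurations, NOT taken from the paper.
pooling_like = [(5, 1, 1)] * 3                # three stacked 5x5 stride-1 windows
dilated = [(3, 1, 1), (3, 1, 2), (3, 1, 3)]   # 3x3 kernels with dilation 1, 2, 3

print("pooling-like stack:", receptive_field(pooling_like))
print("dilated 3x3 stack: ", receptive_field(dilated))
```

Under these assumed settings both stacks reach the same receptive field, while the dilated branch uses learnable 3x3 kernels instead of fixed max-pooling windows. Changing the dilation rates changes the span without changing the kernel size, which is the sense in which the receptive field can be adjusted.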