ACTA AERONAUTICA ET ASTRONAUTICA SINICA
Real-time dense small object detection algorithm for UAV based on improved YOLOv5
Received date: 2022-03-04
Revised date: 2022-03-22
Accepted date: 2022-04-28
Online published: 2022-05-11
Supported by
National Natural Science Foundation of China(U20A20121);Zhejiang Natural Fund Project(LY21F020006);Ningbo Natural Science Foundation Project(2019A610088);Ningbo Key Science and Technology Plan (2025) Project(2019B10125)
UAV aerial images have more complex backgrounds and far more dense small targets than natural-scene images, which places higher demands on the detection network. To address the low accuracy of dense small object detection from the UAV viewpoint while preserving real-time performance, a YOLOv5-based real-time dense small object detection algorithm for UAVs is proposed. First, the Spatial Attention Module (SAM) is combined with the Channel Attention Module (CAM): the fully connected layer applied after feature compression in CAM is redesigned to reduce computation, and the connection structure between CAM and SAM is changed to strengthen the capture of spatial features. The resulting Spatial-Channel Attention Module (SCAM) increases the model's attention to regions of the feature map where small targets aggregate. Second, an SCAM-based Attentional Feature Fusion module (SC-AFF) is proposed, which adaptively assigns attention weights according to feature maps of different scales to improve the fusion efficiency of small-target features. Finally, a Transformer is introduced into the backbone network, with SC-AFF improving feature fusion at the original residual connections, to better capture global and contextual information and to strengthen feature extraction for dense small targets in complex backgrounds. Experiments are conducted on the VisDrone2021 dataset. The effects of different network scale parameters and input resolutions on the detection accuracy and speed of YOLOv5 are first investigated; the analysis shows that YOLOv5s is the more suitable baseline for real-time UAV object detection. On the YOLOv5s baseline, the improved model raises mAP50 by 6.4% and mAP75 by 5.8%, and reaches 46 FPS on high-resolution images.
The model trained at an input resolution of 1504×1504 reaches an mAP50 of 54.5%, 11.5% higher than that of YOLOv4, while detection speed remains at 46 FPS, making the method well suited to real-time UAV object detection in dense small-target scenarios.
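The SCAM design described above follows the CBAM pattern of a channel-attention stage followed by a spatial-attention stage, with the FC bottleneck after channel compression replaced by a cheaper operation. As a rough illustration only, the sketch below implements that pattern in NumPy; the 1-D averaging kernel standing in for the FC layer (inspired by ECA-Net, ref. 14) and the serial CAM-to-SAM ordering are assumptions, since the paper's exact modified connection is not given in the abstract.

```python
import numpy as np

def _sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(x, k=3):
    """Channel attention over a feature map x of shape (C, H, W).

    In place of the FC bottleneck after feature compression, a cheap
    1-D operation over the channel descriptor is used here (a fixed
    averaging kernel for illustration; the paper's and ECA-Net's
    versions use learned weights).
    """
    c = x.shape[0]
    z = x.mean(axis=(1, 2))                    # squeeze: (C,) descriptor
    pad = k // 2
    zp = np.pad(z, pad, mode="edge")           # pad so output stays (C,)
    w = np.array([zp[i:i + k].mean() for i in range(c)])
    g = _sigmoid(w)                            # per-channel gate in (0, 1)
    return x * g[:, None, None]

def spatial_attention(x):
    """Spatial attention: pool across channels, gate each location."""
    avg = x.mean(axis=0)                       # (H, W) average pooling
    mx = x.max(axis=0)                         # (H, W) max pooling
    s = _sigmoid((avg + mx) / 2.0)             # per-location gate
    return x * s[None, :, :]

def scam(x):
    """SCAM sketch: CAM followed by SAM (CBAM-style serial order).

    The paper changes this connection structure to improve spatial
    feature capture; the exact change is not specified here.
    """
    return spatial_attention(channel_attention(x))
```

Because both gates are sigmoid outputs in (0, 1), the module only re-weights (attenuates) activations; the output always has the same shape as the input feature map.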
Zhiqiang FENG, Zhijun XIE, Zhengwei BAO, Kewei CHEN. Real-time dense small object detection algorithm for UAV based on improved YOLOv5[J]. Acta Aeronautica et Astronautica Sinica, 2023, 44(7): 327106. DOI: 10.7527/S1000-6893.2022.27106
1 JIANG B, QU R K, LI Y D, et al. Object detection in UAV imagery based on deep learning: Review[J]. Acta Aeronautica et Astronautica Sinica, 2021, 42(4): 524519 (in Chinese).
2 REN S Q, HE K M, GIRSHICK R, et al. Faster R-CNN: Towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137-1149.
3 REDMON J, DIVVALA S, GIRSHICK R, et al. You only look once: Unified, real-time object detection[C]∥2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2016: 779-788.
4 LIU W, ANGUELOV D, ERHAN D, et al. SSD: Single shot MultiBox detector[C]∥European Conference on Computer Vision (ECCV). Amsterdam: Springer, 2016: 21-37.
5 REDMON J, FARHADI A. YOLO9000: Better, faster, stronger[C]∥2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2017: 6517-6525.
6 REDMON J, FARHADI A. YOLOv3: An incremental improvement[DB/OL]. arXiv preprint: 1804.02767, 2018.
7 BOCHKOVSKIY A, WANG C Y, LIAO H Y M. YOLOv4: Optimal speed and accuracy of object detection[DB/OL]. arXiv preprint: 2004.10934, 2020.
8 LI K C, WANG X Q, LIN H, et al. Survey of one-stage small object detection methods in deep learning[J]. Journal of Frontiers of Computer Science and Technology, 2022, 16(1): 41-58 (in Chinese).
9 WANG Q C, ZHANG H, HONG X G, et al. Small object detection based on modified FSSD and model compression[C]∥2021 IEEE 6th International Conference on Signal and Image Processing (ICSIP). Piscataway: IEEE Press, 2021: 88-92.
10 GONG Y Q, YU X H, DING Y, et al. Effective fusion factor in FPN for tiny object detection[C]∥2021 IEEE Winter Conference on Applications of Computer Vision. Piscataway: IEEE Press, 2021: 1159-1167.
11 LIN T Y, DOLLÁR P, GIRSHICK R, et al. Feature pyramid networks for object detection[C]∥2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2017: 936-944.
12 LIU F, HAN X. Adaptive aerial object detection based on multi-scale deep learning[J]. Acta Aeronautica et Astronautica Sinica, 2022, 43(5): 325270 (in Chinese).
13 WOO S, PARK J, LEE J Y, et al. CBAM: Convolutional block attention module[C]∥Computer Vision - ECCV 2018. Cham: Springer, 2018: 3-19.
14 WANG Q L, WU B G, ZHU P F, et al. ECA-Net: Efficient channel attention for deep convolutional neural networks[C]∥2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE Press, 2020: 11531-11539.
15 LIU S, QI L, QIN H F, et al. Path aggregation network for instance segmentation[C]∥2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 8759-8768.
16 DAI Y M, GIESEKE F, OEHMCKE S, et al. Attentional feature fusion[C]∥2021 IEEE Winter Conference on Applications of Computer Vision. Piscataway: IEEE Press, 2021: 3559-3568.
17 ZHU L L, GENG X, LI Z, et al. Improving YOLOv5 with attention mechanism for detecting boulders from planetary images[J]. Remote Sensing, 2021, 13(18): 3776.
18 ZHU X K, LYU S C, WANG X, et al. TPH-YOLOv5: Improved YOLOv5 based on transformer prediction head for object detection on drone-captured scenarios[C]∥2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW). Piscataway: IEEE Press, 2021: 2778-2788.
19 DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16×16 words: Transformers for image recognition at scale[C]∥International Conference on Learning Representations (ICLR), 2021.
20 PAN X R, GE C J, LU R, et al. On the integration of self-attention and convolution[C]∥2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE Press, 2022: 805-815.
21 VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[DB/OL]. arXiv preprint: 1706.03762, 2017.
22 LIN T Y, GOYAL P, GIRSHICK R, et al. Focal loss for dense object detection[C]∥2017 IEEE International Conference on Computer Vision. Piscataway: IEEE Press, 2017: 2999-3007.
23 ZHANG S F, WEN L Y, BIAN X, et al. Single-shot refinement neural network for object detection[C]∥2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 4203-4212.
24 CAI Z W, VASCONCELOS N. Cascade R-CNN: Delving into high quality object detection[C]∥2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 6154-6162.
25 LI Z M, PENG C, YU G, et al. Light-head R-CNN: In defense of two-stage object detector[DB/OL]. arXiv preprint: 1711.07264, 2017.
26 LAW H, DENG J. CornerNet: Detecting objects as paired keypoints[J]. International Journal of Computer Vision, 2020, 128(3): 642-656.
27 HE K M, ZHANG X Y, REN S Q, et al. Spatial pyramid pooling in deep convolutional networks for visual recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(9): 1904-1916.