Electronics and Electrical Engineering and Control

Real-time dense small object detection algorithm for UAV based on improved YOLOv5

  • Zhiqiang FENG,
  • Zhijun XIE,
  • Zhengwei BAO,
  • Kewei CHEN
  • 1.School of Information Science and Engineering, Ningbo University, Ningbo 315211, China
    2.Ningbo JIWANG Information Technology Ltd, Ningbo 315000, China
    3.School of Mechanical Engineering and Mechanics, Ningbo University, Ningbo 315211, China
E-mail: xiezhijun@nbu.edu.cn

Received date: 2022-03-04

  Revised date: 2022-03-22

  Accepted date: 2022-04-28

  Online published: 2022-05-11

Supported by

National Natural Science Foundation of China (U20A20121); Natural Science Foundation of Zhejiang Province (LY21F020006); Ningbo Natural Science Foundation (2019A610088); Ningbo "Science and Technology Innovation 2025" Major Project (2019B10125)

Cite this article

Zhiqiang FENG, Zhijun XIE, Zhengwei BAO, Kewei CHEN. Real-time dense small object detection algorithm for UAV based on improved YOLOv5[J]. Acta Aeronautica et Astronautica Sinica, 2023, 44(7): 327106. DOI: 10.7527/S1000-6893.2022.27106

Abstract

Compared with natural scene images, UAV aerial images have more complex backgrounds and contain a large number of dense small targets, which places higher demands on the detection network. To address the low accuracy of dense small object detection from the UAV viewpoint while preserving real-time performance, a real-time dense small object detection algorithm for UAVs based on YOLOv5 is proposed. First, the Spatial Attention Module (SAM) is combined with the Channel Attention Module (CAM): the fully connected layer that follows feature compression in CAM is improved to reduce computation, and the connection structure between CAM and SAM is changed to improve the capture of spatial features. The resulting Spatial-Channel Attention Module (SCAM) increases the model's attention to regions of the feature map where small targets cluster. Second, an SCAM-based Attentional Feature Fusion module (SC-AFF) is proposed, which adaptively assigns attention weights according to feature maps of different scales and so enhances the feature fusion efficiency for small targets. Finally, a Transformer module is introduced into the backbone network, and SC-AFF is used to improve the feature fusion at the original residual connections, so that global information and rich contextual information are captured better and the feature extraction capability for dense small targets in complex backgrounds is improved. Experiments are conducted on the VisDrone2021 dataset. The effects of different network scale parameters and input resolutions on the detection accuracy and speed of YOLOv5 are investigated first, and the analysis concludes that YOLOv5s is the more suitable baseline for real-time UAV object detection. With YOLOv5s as the baseline, the improved model raises mAP50 by 6.4% and mAP75 by 5.8%, and reaches 46 FPS on high-resolution images. The model trained at an input resolution of 1504×1504 reaches an mAP50 of 54.5%, which is 11.5% higher than that of YOLOv4, while the detection speed remains at 46 FPS, making it more suitable for real-time UAV object detection in dense small target scenarios.
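
The mAP50 and mAP75 figures reported in the abstract are mean average precision values computed at intersection-over-union (IoU) thresholds of 0.5 and 0.75, respectively. The sketch below is not taken from the paper; it uses hypothetical box coordinates and helper names (`iou`, `is_match`) to illustrate why small targets are sensitive to the stricter threshold: a 3-pixel offset on a 20×20-pixel box still counts as a match at IoU 0.5 but fails at IoU 0.75.

```python
from typing import Tuple

# (x_min, y_min, x_max, y_max) in pixels
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix_min, iy_min = max(a[0], b[0]), max(a[1], b[1])
    ix_max, iy_max = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_match(pred: Box, truth: Box, threshold: float) -> bool:
    """A prediction counts as a true positive when IoU reaches the threshold."""
    return iou(pred, truth) >= threshold


# Hypothetical small target: a 20x20-pixel ground-truth box and a
# prediction of the same size shifted by 3 pixels along both axes.
truth = (100.0, 100.0, 120.0, 120.0)
pred = (103.0, 103.0, 123.0, 123.0)

print(round(iou(pred, truth), 3))   # 0.566
print(is_match(pred, truth, 0.50))  # True  (threshold used by mAP50)
print(is_match(pred, truth, 0.75))  # False (threshold used by mAP75)
```

The same 3-pixel offset on a much larger box would change the IoU far less, which is one reason localization errors weigh more heavily on small-object benchmarks.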
