ACTA AERONAUTICA ET ASTRONAUTICA SINICA ›› 2023, Vol. 44 ›› Issue (7): 327106-327106. doi: 10.7527/S1000-6893.2022.27106

• Electronics and Electrical Engineering and Control •

Real-time dense small object detection algorithm for UAV based on improved YOLOv5

Zhiqiang FENG1, Zhijun XIE1, Zhengwei BAO2, Kewei CHEN3

  1. School of Information Science and Engineering, Ningbo University, Ningbo 315211, China
  2. Ningbo JIWANG Information Technology Ltd, Ningbo 315000, China
  3. School of Mechanical Engineering and Mechanics, Ningbo University, Ningbo 315211, China
  • Received: 2022-03-04 Revised: 2022-03-22 Accepted: 2022-04-28 Online: 2023-04-15 Published: 2022-05-11
  • Contact: Zhijun XIE E-mail: xiezhijun@nbu.edu.cn
  • Supported by:
    National Natural Science Foundation of China (U20A20121); Zhejiang Natural Fund Project (LY21F020006); Ningbo Natural Science Foundation Project (2019A610088); Ningbo Key Science and Technology Plan (2025) Project (2019B10125)

Abstract:

Compared with natural scene images, UAV aerial images have more complex backgrounds and contain many dense small targets, which places higher demands on the detection network. To address the low accuracy of dense small object detection from the UAV view while preserving real-time performance, a YOLOv5-based real-time dense small object detection algorithm for UAVs is proposed. First, the Spatial Attention Module (SAM) is combined with the Channel Attention Module (CAM): the fully connected layer that follows feature compression in CAM is improved to reduce computation, and the connection structure between CAM and SAM is changed to improve feature capture in the spatial dimension. The resulting Spatial-Channel Attention Module (SCAM) increases the model's attention to the regions of the feature map where small targets aggregate. Second, an SCAM-based Attentional Feature Fusion module (SC-AFF) is proposed, which adaptively assigns attention weights according to feature maps of different scales to enhance the efficiency of feature fusion for small targets. Finally, a Transformer is introduced into the backbone network, and SC-AFF is used to improve the feature fusion at the original residual connections, so as to better capture global information and rich contextual information and to improve feature extraction for dense small targets in complex backgrounds. Experiments are conducted on the VisDrone2021 dataset. The effects of network scale and input resolution on the detection accuracy and speed of YOLOv5 are investigated first, and the analysis concludes that YOLOv5s is the more suitable baseline model for real-time UAV object detection. With YOLOv5s as the baseline, the improved model raises mAP50 by 6.4% and mAP75 by 5.8%, and reaches 46 FPS on high-resolution images.
The mAP50 of the model trained at an input resolution of 1504×1504 reaches 54.5%, which is 11.5% higher than that of YOLOv4. Accuracy improves while the detection speed remains at 46 FPS, making the model better suited to real-time UAV object detection in dense small-target scenarios.
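
The mAP50 and mAP75 figures above are mean average precision values computed with intersection-over-union (IoU) matching thresholds of 0.50 and 0.75. The sketch below is not taken from the paper; it only illustrates, with hypothetical box coordinates, why small targets are sensitive to the stricter threshold: a 3-pixel localization error on a 20×20 pixel object already fails the 0.75 criterion while still passing 0.50.

```python
from typing import Tuple

# Box layout is an assumption for this sketch: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_match(pred: Box, truth: Box, threshold: float) -> bool:
    """A prediction matches a ground-truth box when IoU reaches the threshold."""
    return iou(pred, truth) >= threshold


# Hypothetical 20x20 px ground-truth box and a prediction shifted 3 px in x and y.
truth = (100.0, 100.0, 120.0, 120.0)
pred = (103.0, 103.0, 123.0, 123.0)

overlap = iou(pred, truth)                 # 289 / 511, about 0.566
match_at_50 = is_match(pred, truth, 0.50)  # criterion behind mAP50
match_at_75 = is_match(pred, truth, 0.75)  # criterion behind mAP75
```

Here the prediction counts as correct under the mAP50 criterion but as a miss under mAP75, which is one reason both metrics are reported for small-object detectors.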

Key words: UAV, small object detection, attention mechanism, self-attention mechanism, feature fusion

CLC Number: