ACTA AERONAUTICA ET ASTRONAUTICA SINICA
Small target detection algorithm for UAV based on patch-wise co-attention
Received date: 2023-06-12
Revised date: 2023-06-27
Accepted date: 2023-08-09
Online published: 2023-09-13
Supported by: China Aerospace Science and Technology Group Co., Ltd. Major Civil Independent R&D Project (YF-ZZYF-M-2022-019)
To address the insufficient feature extraction and low detection accuracy of conventional target detection algorithms in UAV small target detection, a small target detection algorithm for UAVs is proposed based on Patch-Wise Co-Attention (PWCA). First, a plug-and-play PWCA module is proposed. The input feature is divided into patches along the spatial dimensions, and channel attention weights are extracted from each patch, so that channel information is discriminated in terms of local spatial features and the network works at a finer granularity in small target detection scenarios. The input feature and the attention-weighted feature are then fused, and spatial attention is extracted from the result to focus on the effective feature information in the network. Second, the strided convolutional downsampling of the baseline network is abandoned, and Adaptive Interlace Downsampling (AID) is proposed in combination with PWCA; it weights the downsampled features according to their significance and reduces the information loss of small targets during downsampling. Finally, the backbone and the feature fusion network are given a lightweight design to reduce the computational cost. A large-scale feature map detection branch for small targets is added and the flow of feature maps is optimized, which enriches the semantic information of feature maps at different scales and enhances feature expression while preserving real-time performance. The Soft-NMS algorithm is used to reduce missed detections when targets are occluded or overlapping. The effectiveness of the proposed algorithm is evaluated on the VisDrone2019 dataset. The final mAP0.5 of the improved algorithm is 11.81% better than that of the YOLOv5s baseline algorithm, and the mAP0.5:0.95 is 10.91% better, while the model parameters are reduced by 59%.
The proposed network can effectively balance detection accuracy and inference speed for UAV small target detection tasks, demonstrating its practical significance.
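The idea behind interlaced downsampling can be illustrated with a minimal sketch. It is not the authors' AID module: the abstract does not give its structure, so the significance score used here (a softmax over each sub-map's mean absolute activation) and the function names `interlace_split`, `significance_weights`, and `adaptive_interlace_downsample` are assumptions made for illustration only. The sketch shows why splitting a map into its four stride-2 phases keeps every pixel, whereas a stride-2 convolution samples only one phase.

```python
import math


def interlace_split(fmap):
    """Split a 2-D map (even height and width) into its four stride-2 phases.

    Every input value lands in exactly one half-resolution sub-map,
    so no pixel is discarded.
    """
    return [[row[dx::2] for row in fmap[dy::2]]
            for dy in (0, 1) for dx in (0, 1)]


def significance_weights(subs):
    """Assumed significance score: softmax over mean absolute activation."""
    scores = [sum(abs(v) for row in s for v in row) / (len(s) * len(s[0]))
              for s in subs]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(sc - peak) for sc in scores]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_interlace_downsample(fmap):
    """Return the four weighted half-resolution sub-maps and their weights."""
    subs = interlace_split(fmap)
    weights = significance_weights(subs)
    weighted = [[[w * v for v in row] for row in s]
                for s, w in zip(subs, weights)]
    return weighted, weights


fmap = [[0, 1, 0, 1],
        [2, 3, 2, 3],
        [0, 1, 0, 1],
        [2, 3, 2, 3]]
subs = interlace_split(fmap)
weighted, weights = adaptive_interlace_downsample(fmap)
```

In a real network the four sub-maps would be stacked along the channel axis and the weights would come from a learned attention module rather than a fixed statistic.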
Key words: UAV; YOLOv5; small target detection; attention mechanism; downsampling; feature fusion
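The Soft-NMS step mentioned in the abstract can be sketched as follows. This is a plain-Python version of the Gaussian variant of Soft-NMS, in which the score of a box overlapping an already selected box is decayed by exp(-IoU^2 / sigma) instead of being discarded. The values `sigma=0.5` and `score_thresh=0.001` are assumed for illustration; the abstract does not state the settings used in the paper.

```python
import math


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian Soft-NMS.

    Returns (original_index, decayed_score) pairs in selection order.
    """
    remaining = [[i, s] for i, s in enumerate(scores)]
    kept = []
    while remaining:
        # Select the box with the highest current score.
        best = max(range(len(remaining)), key=lambda k: remaining[k][1])
        idx, score = remaining.pop(best)
        if score < score_thresh:
            break  # every box still remaining scores even lower
        kept.append((idx, score))
        # Decay the scores of the other boxes according to their overlap.
        for item in remaining:
            overlap = iou(boxes[idx], boxes[item[0]])
            item[1] *= math.exp(-(overlap ** 2) / sigma)
    return kept


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = soft_nms(boxes, scores)
```

In this example the second box overlaps the first with an IoU of about 0.68. Hard NMS with a 0.5 threshold would delete it, whereas Soft-NMS keeps it with a reduced score, which is the behaviour that helps with occluded and overlapping targets.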
Aoze YU, Weiwei WEI, Ping WANG, Jinqiang ZHANG, Wenxiong KE. Small target detection algorithm for UAV based on patch-wise co-attention[J]. ACTA AERONAUTICA ET ASTRONAUTICA SINICA, 2024, 45(14): 629148-629148. DOI: 10.7527/S1000-6893.2023.29148