Review

Object detection in UAV imagery based on deep learning: Review

  • JIANG Bo,
  • QU Ruokun,
  • LI Yandong,
  • LI Chenglong
  • Civil Aviation Flight University of China, Guanghan 618307, China

Received date: 2020-07-09

Revised date: 2020-07-20

Published online: 2020-08-17

Supported by

Sichuan Province Science and Technology Plan-Key Research and Development (2019YFG0308); Sichuan Province Talents Fostering Quality and Teaching Reform Program of Higher Education in 2018-2020 (JG2018-325); Sichuan Province College Students' Innovative Entrepreneurial Training Plan Program (S202010624029); Research Program of Civil Aviation Flight University of China (J2008-78); General Program of Civil Aviation Flight University of China (J2020-078)

Abstract

Object detection is a key technology for improving the autonomous sensing ability of Unmanned Aerial Vehicles (UAVs), and research on it is of critical significance for UAV applications. Compared with traditional methods based on handcrafted features, deep learning based on convolutional neural networks has a powerful capability for feature learning and representation, and has therefore become the mainstream approach to object detection. In recent years, object detection has achieved a series of breakthroughs on natural scene images, and its application to UAVs has increasingly become a research hotspot. This paper reviews the research progress of deep learning based object detection algorithms and summarizes their advantages and disadvantages. It then introduces several typical aerial image datasets and the method of transfer learning, and analyzes algorithms that address the complex backgrounds, small and rotated objects, and large fields of view found in UAV imagery. Finally, the existing problems and possible future development directions are discussed.

Cite this article

JIANG Bo, QU Ruokun, LI Yandong, LI Chenglong. Object detection in UAV imagery based on deep learning: Review[J]. Acta Aeronautica et Astronautica Sinica, 2021, 42(4): 524519-524519. DOI: 10.7527/S1000-6893.2020.24519
