Improved YOLOX object detection algorithm based on gradient difference adaptive learning rate optimization
Received date: 2022-08-29
Revised date: 2022-10-18
Accepted date: 2022-11-09
Online published: 2022-11-17
Supported by
National Natural Science Foundation of China (62033010); Aeronautical Science Foundation of China (2019460T5001)
SONG Yucun, GE Quanbo, ZHU Junlong, LU Zhenyu. Improved YOLOX object detection algorithm based on gradient difference adaptive learning rate optimization[J]. Acta Aeronautica et Astronautica Sinica, 2023, 44(14): 327951. DOI: 10.7527/S1000-6893.2022.27951
Object detection has long been one of the most challenging problems in computer vision and is widely used in tasks such as face recognition, autonomous driving, and traffic detection. To further improve the performance of current mainstream object detection algorithms, this paper proposes an improved object detection algorithm based on YOLOX and validates it on the standard PASCAL VOC 07+12 and RSOD datasets. YOLOX is improved in three respects: data augmentation, network structure, and loss function. In addition, an adaptive learning rate optimization algorithm based on gradient difference is proposed to train the improved YOLOX; this optimizer is also applicable to other neural networks. On the PASCAL VOC 07+12 dataset, the AP of the improved YOLOX-S rises from 61.63% for the original YOLOX-S to 66.35%, a gain of 4.72 percentage points. On the RSOD dataset, where the improved algorithm is also compared with other mainstream YOLO-series algorithms, its AP rises from 69.4% to 73.2%, a gain of 3.8 percentage points. These results indicate that the proposed improvements to YOLOX are effective.
Key words: object detection; YOLOX; neural network optimization; PASCAL VOC; RSOD
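The abstract does not give the update rule of the proposed optimizer, so it cannot be reproduced here. As a rough illustration of what "adapting the learning rate by the gradient difference" can look like, the sketch below implements a diffGrad-style step (the published method of Dubey et al., not the authors' algorithm): an Adam-like update whose step is scaled by a sigmoid of the absolute change between consecutive gradients. The function names, the hyperparameter defaults, and the toy quadratic objective are all assumptions made for illustration.

```python
import math


def init_state(n):
    """Optimizer state for n scalar parameters (hypothetical helper)."""
    return {"t": 0, "m": [0.0] * n, "v": [0.0] * n, "prev_grad": [0.0] * n}


def diffgrad_style_step(theta, grad, state, lr=0.01, beta1=0.9,
                        beta2=0.999, eps=1e-8):
    """Apply one diffGrad-style update and return the new parameter list.

    This is a swapped-in, published technique used only as an illustration;
    it is NOT the optimizer proposed in the paper.
    """
    state["t"] += 1
    t = state["t"]
    new_theta = []
    for i, (p, g) in enumerate(zip(theta, grad)):
        # Adam-style first and second moment estimates.
        m = beta1 * state["m"][i] + (1 - beta1) * g
        v = beta2 * state["v"][i] + (1 - beta2) * g * g
        # Gradient-difference coefficient in [0.5, 1): close to 0.5 when
        # consecutive gradients barely change, close to 1 when they change a lot.
        xi = 1.0 / (1.0 + math.exp(-abs(state["prev_grad"][i] - g)))
        # Bias correction, as in Adam.
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        new_theta.append(p - lr * xi * m_hat / (math.sqrt(v_hat) + eps))
        state["m"][i], state["v"][i], state["prev_grad"][i] = m, v, g
    return new_theta


def minimize_quadratic(steps=5000):
    """Minimize the toy objective f(x, y) = (x - 3)^2 + 10 * (y + 1)^2."""
    theta = [0.0, 0.0]
    state = init_state(2)
    for _ in range(steps):
        grad = [2 * (theta[0] - 3), 20 * (theta[1] + 1)]
        theta = diffgrad_style_step(theta, grad, state)
    return theta


if __name__ == "__main__":
    print(minimize_quadratic())
```

The coefficient `xi` damps the step where the loss surface is locally flat (small gradient change) and leaves it nearly unchanged where the gradient varies quickly; whether the paper's optimizer uses the same form is not stated in the abstract.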