Reviews

Survey on multi-task learning for object classification and recognition

  • LI Hongguang,
  • WANG Fei,
  • DING Wenrui
  • 1. Research Institute of Unmanned System, Beihang University, Beijing 100191, China;
    2. School of Electronics and Information Engineering, Beihang University, Beijing 100191, China

Received date: 2020-10-19

Revised date: 2021-04-28

Online published: 2021-04-27

Supported by

General Program of the National Natural Science Foundation of China (62076019)

Abstract

Multi-task learning (MTL) aims to improve model performance by jointly leveraging the supervisory signals of multiple related tasks and sharing useful information among them. This paper summarizes and analyzes the mechanisms and mainstream methods of MTL for object classification and recognition. First, we review the definitions, principles, and methods of MTL. Second, taking fine-grained classification and object re-identification as representative, widely used examples, we focus on two types of MTL for object classification and recognition, task-based and feature-based; we further subdivide each type and analyze the design ideas, advantages, and disadvantages of the different MTL algorithms. Third, we compare the performance of the reviewed MTL algorithms on common datasets. Finally, we discuss prospects for the development of MTL algorithms for object classification and recognition.

Cite this article

LI Hongguang, WANG Fei, DING Wenrui. Survey on multi-task learning for object classification and recognition[J]. ACTA AERONAUTICA ET ASTRONAUTICA SINICA, 2022, 43(1): 24889-024889. DOI: 10.7527/S1000-6893.2021.24889
