ACTA AERONAUTICA ET ASTRONAUTICA SINICA
Multi-scale modality fusion network based on adaptive memory length
Received date: 2023-05-08
Revised date: 2023-05-30
Accepted date: 2023-07-11
Online published: 2023-07-28
Supported by: National Natural Science Foundation of China (61501228)
To better exploit the complementary perception strengths of point clouds and optical images in autonomous driving, a dual-modal fusion network called MerNet is proposed. The network encodes point cloud features and optical image features in parallel branches. At each encoding stage, an optical image feature fusion module based on residual mapping and a dilated dot attention mechanism fuses the optical image features unidirectionally into the point cloud feature branch. A cascaded atrous (dilated) convolution module built from multi-scale dilated branches is proposed to strengthen the contextual connections of the point cloud, and a bottleneck structure is applied to the parallel branches to reduce the parameter count of the context module. To further optimize the parameter update process, an optimization algorithm with an adaptively varying historical memory length is proposed, which accounts for the contribution of historical gradients under different gradient trends. A collaborative loss function based on cross-entropy loss is also studied: by cross-comparing the labels predicted by the different modalities and applying thresholds to screen each modality's predicted features, the perception advantages of the different sensors are highlighted. MerNet is trained and validated on the public SemanticKITTI dataset. The experimental results show that the proposed dual-modal network effectively improves semantic segmentation performance and makes the algorithm pay more attention to highly dangerous dynamic objects in the driving environment. In addition, the proposed context module reduces the parameter count by 64.89%, further improving the efficiency of the algorithm.
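As a rough illustration of why a bottleneck structure shrinks a context module made of parallel dilated branches, the sketch below counts convolution weights for a plain design and a bottlenecked one. The channel width, dilation rates, reduction ratio, and function names are all hypothetical choices for this example, not values from the paper, so the resulting saving is not expected to reproduce the 64.89% figure reported above.

```python
def conv_params(c_in, c_out, k):
    """Weight count of a k x k convolution (bias ignored)."""
    return c_in * c_out * k * k


def plain_branches(channels, dilations):
    """One 3x3 dilated convolution per branch.

    The dilation rate changes the receptive field, not the weight count.
    """
    return sum(conv_params(channels, channels, 3) for _ in dilations)


def bottleneck_branches(channels, dilations, reduction):
    """Per branch: 1x1 reduce -> 3x3 dilated conv -> 1x1 expand."""
    mid = channels // reduction
    per_branch = (
        conv_params(channels, mid, 1)   # squeeze channels
        + conv_params(mid, mid, 3)      # dilated conv on fewer channels
        + conv_params(mid, channels, 1)  # restore channels
    )
    return per_branch * len(dilations)


# Hypothetical settings, chosen only for illustration.
channels = 256
dilations = (1, 2, 4)
reduction = 2

plain = plain_branches(channels, dilations)
slim = bottleneck_branches(channels, dilations, reduction)
saving = 1 - slim / plain

print(f"plain: {plain} weights, bottleneck: {slim} weights")
print(f"relative saving: {saving:.2%}")
```

The saving comes almost entirely from running the 3x3 convolution on the reduced channel width; the two 1x1 convolutions add comparatively few weights.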
Xiaohang LI, Jianjiang ZHOU. Multi-scale modality fusion network based on adaptive memory length[J]. ACTA AERONAUTICA ET ASTRONAUTICA SINICA, 2023, 44(22): 628977-628977. DOI: 10.7527/S1000-6893.2023.28977