
Multi-scale modality fusion network based on adaptive memory length

  • Xiaohang LI,
  • Jianjiang ZHOU
  • Key Laboratory of Radar Imaging and Microwave Photonic Technology of Ministry of Education, Nanjing University of Aeronautics and Astronautics, Nanjing 211100, China
E-mail: zjjee@nuaa.edu.cn

Received date: 2023-05-08

  Revised date: 2023-05-30

  Accepted date: 2023-07-11

  Online published: 2023-07-28

Supported by

National Natural Science Foundation of China (61501228)


Cite this article

LI X H, ZHOU J J. Multi-scale modality fusion network based on adaptive memory length[J]. Acta Aeronautica et Astronautica Sinica, 2023, 44(22): 628977. DOI: 10.7527/S1000-6893.2023.28977

Abstract

To better exploit the complementary perception strengths of point clouds and optical images in autonomous driving, a dual-modal fusion network called MerNet is proposed. The network encodes point cloud features and optical image features in parallel branches; at each encoding stage, a fusion module based on residual mapping and a dilated dot attention mechanism unidirectionally fuses the optical image features into the point cloud branch. A cascaded atrous convolution module with multi-scale dilated branches is designed to strengthen the contextual connections of the point cloud, and a bottleneck structure on the parallel branches reduces the parameter count of this context module. To further improve the parameter update process, an optimization algorithm with an adaptively varying historical memory length is proposed, which weighs the contribution of historical gradients under different gradient trends. A collaborative loss function based on cross-entropy is also studied: the predicted labels of the two modalities are cross-compared, and a threshold is used to screen the predicted features of the compared modality, highlighting the perception advantages of the different sensors. MerNet is trained and validated on the public SemanticKITTI dataset. Experimental results show that the proposed dual-modal network effectively improves semantic segmentation performance and makes the algorithm pay more attention to high-risk dynamic objects in the driving environment. In addition, the proposed context module reduces the parameter count by 64.89%, further improving the efficiency of the algorithm.
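The abstract describes the image-to-point-cloud fusion module only in prose. The PyTorch sketch below shows one minimal way to combine "residual mapping" with a dilated attention gate for one-way fusion at a single encoder stage; the class name UnidirectionalFusion, the layer choices, and the assumption that the image feature has already been projected onto the point cloud's range-image grid are all illustrative assumptions, not the module actually used in MerNet.

```python
# Rough sketch of one-way fusion of image features into the point cloud branch
# at a single encoder stage: the image feature is gated by a dilated attention
# map and added to the point cloud feature through a residual path.
# Assumes both feature maps share the same spatial resolution.
import torch
import torch.nn as nn

class UnidirectionalFusion(nn.Module):
    def __init__(self, point_channels: int, image_channels: int):
        super().__init__()
        self.align = nn.Conv2d(image_channels, point_channels, kernel_size=1)
        self.attention = nn.Sequential(
            nn.Conv2d(point_channels * 2, point_channels, kernel_size=3,
                      padding=2, dilation=2),   # dilated context for the gate
            nn.Sigmoid(),
        )

    def forward(self, point_feat: torch.Tensor, image_feat: torch.Tensor) -> torch.Tensor:
        image_feat = self.align(image_feat)            # match channel width
        gate = self.attention(torch.cat([point_feat, image_feat], dim=1))
        return point_feat + gate * image_feat          # residual, one-way fusion
```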
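The cascaded atrous convolution context module is likewise only described at a high level. The following sketch illustrates the general pattern of multi-scale dilated branches, each preceded by a 1x1 bottleneck to cut parameters, fused with a residual connection. Channel widths, dilation rates, and the reduction factor are assumptions chosen for illustration, not the configuration reported in the paper.

```python
# Illustrative multi-scale dilated-convolution context block with a 1x1
# bottleneck on each parallel branch (the bottleneck is what reduces the
# parameter count relative to full-width dilated convolutions).
import torch
import torch.nn as nn

class DilatedContextBlock(nn.Module):
    def __init__(self, channels: int, reduction: int = 4, dilations=(1, 2, 4, 8)):
        super().__init__()
        mid = channels // reduction  # bottleneck width
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(channels, mid, kernel_size=1, bias=False),   # bottleneck
                nn.BatchNorm2d(mid),
                nn.ReLU(inplace=True),
                nn.Conv2d(mid, mid, kernel_size=3, padding=d, dilation=d, bias=False),
                nn.BatchNorm2d(mid),
                nn.ReLU(inplace=True),
            )
            for d in dilations
        ])
        # Fuse the concatenated multi-scale features back to the input width.
        self.fuse = nn.Conv2d(mid * len(dilations), channels, kernel_size=1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        multi_scale = torch.cat([branch(x) for branch in self.branches], dim=1)
        return x + self.fuse(multi_scale)  # residual connection

if __name__ == "__main__":
    block = DilatedContextBlock(channels=64)
    out = block(torch.randn(1, 64, 64, 512))  # range-image-like feature map
    print(out.shape)  # torch.Size([1, 64, 64, 512])
```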
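The optimizer's update equations are not given on this page. As a rough illustration of the idea of varying the historical memory length with the gradient trend, the toy NumPy routine below shortens the moving-average decay factor when consecutive gradients disagree and lengthens it when the trend is stable. The adaptation rule, the bounds beta_min and beta_max, and the function name are guesses made only to convey the concept; this is not the algorithm proposed in the paper.

```python
# Toy Adam-style step whose "memory length" (moving-average decay factor)
# adapts to how well the current gradient agrees with the previous one.
import numpy as np

def adaptive_memory_step(param, grad, state, lr=1e-3, eps=1e-8,
                         beta_min=0.8, beta_max=0.999):
    prev_grad = state.get("prev_grad", np.zeros_like(grad))
    m = state.get("m", np.zeros_like(grad))
    v = state.get("v", np.zeros_like(grad))

    # Agreement between current and previous gradient directions, in [-1, 1].
    denom = np.linalg.norm(grad) * np.linalg.norm(prev_grad) + eps
    agreement = float(np.dot(grad.ravel(), prev_grad.ravel()) / denom)

    # Stable trend (agreement near 1) -> long memory; oscillation -> short memory.
    beta = beta_min + (beta_max - beta_min) * 0.5 * (1.0 + agreement)

    m = beta * m + (1.0 - beta) * grad
    v = beta * v + (1.0 - beta) * grad ** 2
    param = param - lr * m / (np.sqrt(v) + eps)

    state.update(prev_grad=grad, m=m, v=v)
    return param, state

# Example: a few steps on the quadratic f(w) = ||w||^2 / 2, whose gradient is w.
w, state = np.ones(3), {}
for _ in range(5):
    w, state = adaptive_memory_step(w, w.copy(), state, lr=0.1)
```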
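Finally, the collaborative loss is described only as cross-comparing the predicted labels of the two modalities and screening the compared modality's predictions with a threshold. One plausible reading, sketched below, up-weights the point-branch cross-entropy at pixels where a confident image prediction disagrees with the point branch; the weighting scheme, the threshold value, and the function name are assumptions for illustration, not the loss defined in the paper.

```python
# Hypothetical "collaborative" cross-entropy: pixels where the two modality
# branches disagree AND the image branch is confident (probability above a
# threshold) receive extra weight in the point-branch loss.
import torch
import torch.nn.functional as F

def collaborative_ce_loss(point_logits, image_logits, target,
                          threshold=0.7, extra_weight=2.0):
    # point_logits, image_logits: (N, C, H, W); target: (N, H, W) class indices
    per_pixel_ce = F.cross_entropy(point_logits, target, reduction="none")

    image_prob, image_label = F.softmax(image_logits, dim=1).max(dim=1)
    point_label = point_logits.argmax(dim=1)

    # Disagreement with a confident image prediction marks pixels where the
    # camera plausibly has a perception advantage over the LiDAR branch.
    disagree = (point_label != image_label) & (image_prob > threshold)
    weight = torch.where(disagree,
                         torch.full_like(per_pixel_ce, extra_weight),
                         torch.ones_like(per_pixel_ce))
    return (weight * per_pixel_ce).mean()

# Example with random logits for a 20-class segmentation problem.
loss = collaborative_ce_loss(torch.randn(2, 20, 64, 512),
                             torch.randn(2, 20, 64, 512),
                             torch.randint(0, 20, (2, 64, 512)))
```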
