Multi-scale modality fusion network based on adaptive memory length

doi:10.7527/S1000-6893.2023.28977

Abstract

To better exploit the complementary strengths of point clouds and optical images for perception in autonomous driving, a dual-modal fusion network called MerNet is proposed. The network encodes point-cloud features and optical-image features in two parallel branches. At each encoding stage, an optical-image feature fusion module based on residual mapping and a dilated dot attention mechanism fuses the optical-image features unidirectionally into the point-cloud feature branch. A cascaded dilated (atrous) convolution module with multi-scale dilated branches is proposed to strengthen the contextual connections of the point cloud, and a bottleneck structure is applied to its parallel branches to reduce the parameter count of the context module. To further optimize the parameter update process, an optimization algorithm with an adaptively variable historical memory length is proposed, which weighs the contribution of historical gradients under different gradient trends. A collaborative loss function based on the cross-entropy loss is also studied: by cross-comparing the labels predicted by the different modalities and screening each modality's predictions with thresholds, it highlights the perception advantages of each sensor. MerNet is trained and validated on the public SemanticKITTI dataset. The experimental results show that the proposed dual-modal network effectively improves semantic segmentation performance and makes the algorithm pay more attention to highly dangerous dynamic objects in the driving environment. In addition, the proposed context module reduces the parameter count by 64.89%, further improving the efficiency of the algorithm.
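The collaborative loss is described above only in outline. The sketch below is one possible reading, under assumptions made here rather than taken from the paper: both branches output per-sample class probabilities on the same grid, each branch gets an ordinary cross-entropy term, and where the two predicted labels disagree and one branch is correct with confidence above a threshold `tau`, the other branch's cross-entropy receives extra weight `alpha`. The names `collaborative_loss`, `tau` and `alpha` are hypothetical, and this is not the paper's formulation.

```python
import numpy as np

def cross_entropy(probs, labels, eps=1e-12):
    """Per-sample cross-entropy for probs of shape (N, K) and integer labels of shape (N,)."""
    picked = probs[np.arange(labels.shape[0]), labels]
    return -np.log(np.clip(picked, eps, 1.0))

def collaborative_loss(p_lidar, p_camera, labels, tau=0.7, alpha=0.5):
    """Hypothetical collaborative cross-entropy loss for two modality branches.

    p_lidar, p_camera: (N, K) softmax outputs of the LiDAR and camera branches.
    labels: (N,) ground-truth class indices.
    tau: confidence threshold used to screen which branch is trusted (assumed).
    alpha: weight of the extra cross-modal term (assumed).
    """
    idx = np.arange(labels.shape[0])
    ce_lidar = cross_entropy(p_lidar, labels)
    ce_camera = cross_entropy(p_camera, labels)

    # Cross-compare the labels predicted by the two modalities.
    pred_lidar = p_lidar.argmax(axis=1)
    pred_camera = p_camera.argmax(axis=1)
    disagree = pred_lidar != pred_camera

    # Threshold screening: a branch counts as "better" on a sample only if it is
    # correct where the other branch is wrong and confident about the true class.
    lidar_better = disagree & (pred_lidar == labels) & (p_lidar[idx, labels] > tau)
    camera_better = disagree & (pred_camera == labels) & (p_camera[idx, labels] > tau)

    # Penalize the weaker branch more heavily on those samples.
    extra = np.where(camera_better, ce_lidar, 0.0) + np.where(lidar_better, ce_camera, 0.0)
    return float(np.mean(ce_lidar + ce_camera) + alpha * np.mean(extra))
```

When the two branches agree everywhere, the extra term vanishes and the loss reduces to the sum of the two plain cross-entropies.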

Key words: deep learning, semantic segmentation, multimodality, feature fusion, attention mechanism

CLC Number: V219

Xiaohang LI, Jianjiang ZHOU. Multi-scale modality fusion network based on adaptive memory length[J]. Acta Aeronautica et Astronautica Sinica, 2023, 44(22): 628977-628977.

Figures/Tables 14

Fig.1

Fig.2

Fig.3

Fig.4

Fig.5

Fig.6

Table 1 Comparison of parameter counts: dilated convolution branch vs. bottleneck convolution

Network input (C × H × W)	Network structure (C_in, C_out, k)	Parameters
256 × 16 × 64	Conv(256, 256, 3)	590 080
256 × 16 × 64	Conv(256, 64, 1) → Conv(64, 64, 3) → Conv(64, 256, 1)	180 570
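The parameter counts in Table 1 depend only on the layer shapes, not on the 16 × 64 spatial size of the input. A minimal sketch of the usual count for a k × k convolution with bias, C_in · C_out · k² + C_out, reproduces the 590 080 entry exactly. Applied to the three bottleneck layers as listed, and assuming plain convolutions with bias and no normalization layers, the same count gives 70 016, so the tabulated 180 570 presumably also covers parts of the branch that the table does not spell out. The helper `conv2d_params` is a hypothetical name used only for this illustration.

```python
def conv2d_params(c_in, c_out, k, bias=True):
    """Learnable parameters of a standard k x k 2-D convolution (weights plus optional bias)."""
    return c_in * c_out * k * k + (c_out if bias else 0)

# Single 3 x 3 convolution from Table 1.
standard = conv2d_params(256, 256, 3)

# Bottleneck: 1 x 1 reduce -> 3 x 3 -> 1 x 1 expand, as listed in Table 1.
bottleneck_layers = [(256, 64, 1), (64, 64, 3), (64, 256, 1)]
bottleneck = sum(conv2d_params(ci, co, k) for ci, co, k in bottleneck_layers)

print(standard)    # 590080
print(bottleneck)  # 70016
```

The saving comes from running the expensive 3 × 3 kernel on 64 instead of 256 channels, which cuts that layer's weights by a factor of 16.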

Table 2

Table 3

Fig.7

Fig.8

Table 4

Fig.9

Table 5

References 30

1	PENG D L, WEN C L, XUE A K. Theory and application of multi-sensor and multi-source information fusion[M]. Beijing: Science Press, 2010 (in Chinese).
2	CHEN L C, PAPANDREOU G, KOKKINOS I, et al. Semantic image segmentation with deep convolutional nets and fully connected CRFs[DB/OL]. arXiv preprint: 1412.7062, 2014.
3	CHEN L C, PAPANDREOU G, KOKKINOS I, et al. DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 40(4): 834-848.
4	CHEN L C, PAPANDREOU G, SCHROFF F, et al. Rethinking atrous convolution for semantic image segmentation[DB/OL]. arXiv preprint: 1706.05587, 2017.
5	CORTINHAL T, TZELEPIS G, AKSOY E E. SalsaNext: Fast, uncertainty-aware semantic segmentation of LiDAR point clouds for autonomous driving[DB/OL]. arXiv preprint: 2003.03653, 2020.
6	AKSOY E E, BACI S, CAVDAR S. SalsaNet: Fast road and vehicle segmentation in LiDAR point clouds for autonomous driving[C]//2020 IEEE Intelligent Vehicles Symposium (IV). Piscataway: IEEE Press, 2021: 926-932.
7	VAN GANSBEKE W, NEVEN D, DE BRABANDERE B, et al. Sparse and noisy LiDAR completion with RGB guidance and uncertainty[C]//2019 16th International Conference on Machine Vision Applications (MVA). Piscataway: IEEE Press, 2019: 1-6.
8	MEYER G P, CHARLAND J, HEGDE D, et al. Sensor fusion for joint 3D object detection and semantic segmentation[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). Piscataway: IEEE Press, 2020: 1230-1237.
9	CORTINHAL T, KURNAZ F, AKSOY E E. Semantics-aware multi-modal domain translation: From LiDAR point clouds to panoramic color images[C]//2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW). Piscataway: IEEE Press, 2021: 3032-3041.
10	RUDER S. An overview of gradient descent optimization algorithms[DB/OL]. arXiv preprint: 1609.04747, 2016.
11	KINGMA D P, BA J. Adam: A method for stochastic optimization[DB/OL]. arXiv preprint: 1412.6980, 2014.
12	LUO L C, XIONG Y H, LIU Y, et al. Adaptive gradient methods with dynamic bound of learning rate[DB/OL]. arXiv preprint: 1902.09843, 2019.
13	DING J B, REN X C, LUO R X, et al. An adaptive and momental bound method for stochastic learning[DB/OL]. arXiv preprint: 1910.12249, 2019.
14	JADON S. A survey of loss functions for semantic segmentation[C]//2020 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB). Piscataway: IEEE Press, 2020: 1-7.
15	XIE S N, TU Z W. Holistically-nested edge detection[C]//2015 IEEE International Conference on Computer Vision (ICCV). Piscataway: IEEE Press, 2016: 1395-1403.
16	LIN T Y, GOYAL P, GIRSHICK R, et al. Focal loss for dense object detection[C]//2017 IEEE International Conference on Computer Vision (ICCV). Piscataway: IEEE Press, 2017: 2999-3007.
17	HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]//2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE Press, 2016: 770-778.
18	BERMAN M, TRIKI A R, BLASCHKO M B. The Lovász-softmax loss: A tractable surrogate for the optimization of the intersection-over-union measure in neural networks[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 4413-4421.
19	ISLAM M A, ROCHAN M, BRUCE N D B, et al. Gated feedback refinement network for dense image labeling[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE Press, 2017: 4877-4885.
20	BOCHKOVSKIY A, WANG C Y, LIAO H Y M. YOLOv4: Optimal speed and accuracy of object detection[DB/OL]. arXiv preprint: 2004.10934, 2020.
21	SANDLER M, HOWARD A, ZHU M L, et al. MobileNetV2: Inverted residuals and linear bottlenecks[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 4510-4520.
22	BEHLEY J, GARBADE M, MILIOTO A, et al. SemanticKITTI: A dataset for semantic scene understanding of LiDAR sequences[C]//2019 IEEE/CVF International Conference on Computer Vision (ICCV). Piscataway: IEEE Press, 2020: 9296-9306.
23	ALONSO I, RIAZUELO L, MONTESANO L, et al. 3D-MiniNet: Learning a 2D representation from point clouds for fast and efficient 3D LIDAR semantic segmentation[J]. IEEE Robotics and Automation Letters, 2020, 5(4): 5432-5439.
24	WANG S, ZHU J K, ZHANG R X. Meta-RangeSeg: LiDAR sequence semantic segmentation using multiple feature aggregation[J]. IEEE Robotics and Automation Letters, 2022, 7(4): 9739-9746.
25	MILIOTO A, VIZZO I, BEHLEY J, et al. RangeNet++: Fast and accurate LiDAR semantic segmentation[C]//2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). Piscataway: IEEE Press, 2020: 4213-4220.
26	ZHAO Y H, WANG J, LI X L, et al. Number-adaptive prototype learning for 3D point cloud semantic segmentation[C]//European Conference on Computer Vision. Cham: Springer, 2023: 695-703.
27	XU C F, WU B C, WANG Z N, et al. SqueezeSegV3: Spatially-adaptive convolution for efficient point-cloud segmentation[C]//European Conference on Computer Vision. Cham: Springer, 2020: 1-19.
28	WANG J L, SUN B, LU Y. MVPNet: Multi-view point regression networks for 3D object reconstruction from a single image[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2019, 33(1): 8949-8956.
29	KOCHANOV D, NEJADASL F K, BOOIJ O. KPRNet: Improving projection-based LiDAR semantic segmentation[DB/OL]. arXiv preprint: 2007.12668, 2020.
30	GENOVA K, YIN X Q, KUNDU A, et al. Learning 3D semantic segmentation with only 2D image supervision[C]//2021 International Conference on 3D Vision (3DV). Piscataway: IEEE Press, 2022: 361-372.

Model	Parameters	FLOPs
ASPP	4.13 M	16.38 M
MsASPP	1.45 M	5.39 M
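The 64.89% parameter reduction quoted in the abstract follows directly from the ASPP and MsASPP rows above. A short check of that arithmetic, which also gives the corresponding FLOPs reduction (the helper `reduction_percent` is a hypothetical name):

```python
def reduction_percent(before, after):
    """Relative reduction from `before` to `after`, in percent."""
    return 100.0 * (before - after) / before

# Values from the ASPP / MsASPP table (millions of parameters, millions of FLOPs).
params_aspp, params_msaspp = 4.13, 1.45
flops_aspp, flops_msaspp = 16.38, 5.39

print(f"parameters: -{reduction_percent(params_aspp, params_msaspp):.2f}%")  # parameters: -64.89%
print(f"FLOPs: -{reduction_percent(flops_aspp, flops_msaspp):.2f}%")         # FLOPs: -67.09%
```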

Model	car	bicycle	motorcycle	truck	other-vehicle	person	bicyclist	road	parking	sidewalk	other-ground	building	fence	vegetation	trunk	terrain	pole	traffic-sign	Modality	mIoU/%
3D-MiniNet[23]	90.5	42.3	42.1	28.5	29.4	47.8	44.1	91.6	64.2	74.5	25.4	89.4	60.8	82.8	60.8	66.7	48.0	56.6	L	55.8
Meta-RangeSeg[24]	93.9	50.1	43.8	43.9	43.2	63.7	53.1	90.6	64.3	74.6	29.2	91.1	64.7	82.6	65.5	65.5	56.3	64.2	L	61.0
RangeNet53++[25]	91.4	25.7	34.4	25.7	23.0	38.3	38.8	91.8	65.0	75.2	27.8	87.4	58.6	80.5	55.1	64.6	47.9	55.9	L	52.2
NAPL[26]	96.6	32.3	43.6	47.3	47.5	51.1	53.9	89.6	67.1	73.7	31.2	91.9	67.4	84.8	69.8	68.8	59.1	59.2	L	61.6
SqueezeSegV3[27]	92.5	38.7	36.5	29.6	33.0	45.6	46.2	91.7	63.4	74.8	26.4	89.0	59.4	82.0	58.7	65.4	49.6	58.9	L	55.9
SalsaNext[5]	91.9	48.3	38.6	38.9	31.9	60.2	59.0	91.7	63.7	75.8	29.1	90.2	64.2	81.8	63.6	66.5	54.3	62.1	L	59.5
MVP-Net[28]	92.7	37.2	17.7	20.2	13.8	50.0	55.8	91.4	61.4	75.9	25.6	85.8	55.2	83.2	64.5	69.3	51.8	59.2	L	59.2
KPRNet[29]	95.5	54.1	47.9	23.6	42.6	65.9	65.0	93.2	73.9	80.6	30.2	91.7	68.4	85.7	69.8	71.2	58.7	64.1	L+C	63.1
HiFANet[30]	93.3	16.9	54.7			24.7	57.7	91.0		79.0		90.3	34.9	75.5		91.2	54.0	37.4	L+C	62.0
MerNet	95.2	41.0	60.5	72.7	76.9	75.0	80.3	96.4	46.8	80.6	0.7	87.9	61.1	87.1	69.9	72.9	63.0	42.8	L+C	63.7
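Each class column above is a per-class intersection over union (IoU), and mIoU is their unweighted mean. The sketch below is the textbook computation from a confusion matrix, not code from the paper. The SemanticKITTI benchmark averages over 19 evaluated classes, while the header above lists 18 (there is no motorcyclist column), so averaging only the listed columns does not reproduce the mIoU values. The function names are hypothetical.

```python
import numpy as np

def per_class_iou(conf):
    """IoU_c = TP / (TP + FP + FN) from a K x K confusion matrix.

    Rows of `conf` are ground-truth classes and columns are predicted classes.
    Classes absent from both truth and prediction get IoU 0 in this sketch.
    """
    conf = np.asarray(conf, dtype=float)
    tp = np.diag(conf)
    fp = conf.sum(axis=0) - tp   # predicted as c but actually another class
    fn = conf.sum(axis=1) - tp   # actually c but predicted as another class
    denom = tp + fp + fn
    return np.divide(tp, denom, out=np.zeros_like(tp), where=denom > 0)

def mean_iou(conf):
    """Unweighted mean of the per-class IoUs."""
    return float(per_class_iou(conf).mean())

conf = np.array([[3, 1],
                 [0, 2]])
print(round(mean_iou(conf), 4))  # 0.7083
```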

Baseline	Adamwin	Fusion module	MsASPP	DifferLoss	mIoU/%
√					60.50
√	√				62.81
√		√			61.48
√			√		62.97
√				√	62.89
√	√	√	√	√	63.73
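The Adamwin row corresponds to the optimizer with an adaptive historical memory length described in the abstract. Its update rule is not given in this section, so the following is only an illustration under an assumption made here: an Adam-style optimizer whose first-moment decay factor switches, per parameter, between a long memory when the new gradient agrees in sign with the running momentum and a short memory when the trend reverses. The class `TrendAdaptiveAdam` and the parameters `beta_long` and `beta_short` are hypothetical and are not the paper's Adamwin.

```python
import numpy as np

class TrendAdaptiveAdam:
    """Hypothetical Adam-style optimizer with a trend-dependent first-moment memory.

    Illustrative assumption, not Adamwin: per parameter, the momentum decay is
    beta_long (long memory) when the new gradient has the same sign as the running
    momentum, and beta_short (short memory) when the gradient trend reverses.
    """

    def __init__(self, lr=1e-3, beta_long=0.95, beta_short=0.5, beta2=0.999, eps=1e-8):
        self.lr, self.beta_long, self.beta_short = lr, beta_long, beta_short
        self.beta2, self.eps = beta2, eps
        self.m = None

    def step(self, params, grad):
        """Return updated float parameters given the current gradient."""
        if self.m is None:
            self.m = np.zeros_like(params)
            self.v = np.zeros_like(params)
            self.decay_prod = np.ones_like(params)  # product of the beta1 values used so far
            self.t = 0
        self.t += 1
        # Long memory when the gradient trend is consistent, short memory when it flips.
        agree = np.sign(grad) == np.sign(self.m)
        beta1 = np.where(agree, self.beta_long, self.beta_short)
        self.m = beta1 * self.m + (1.0 - beta1) * grad
        self.v = self.beta2 * self.v + (1.0 - self.beta2) * grad ** 2
        # Bias correction for a time-varying decay: the EMA weights sum to 1 - prod(beta1).
        self.decay_prod = self.decay_prod * beta1
        m_hat = self.m / (1.0 - self.decay_prod)
        v_hat = self.v / (1.0 - self.beta2 ** self.t)
        return params - self.lr * m_hat / (np.sqrt(v_hat) + self.eps)

# Usage: minimize (x - 3)^2 starting from x = 0.
opt = TrendAdaptiveAdam(lr=0.1)
x = np.array([0.0])
for _ in range(300):
    x = opt.step(x, 2.0 * (x - 3.0))  # gradient of (x - 3)^2
```

Tracking the running product of the decay factors keeps the bias correction exact even though the decay changes from step to step, which a fixed `1 - beta1**t` term would not.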

Simple fusion	Attention	Residual mapping	Dilated attention	mIoU/%
√				61.49
√	√			62.89
√	√	√		63.29
√	√	√	√	63.73
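The fusion ablation above adds attention, residual mapping and dilated attention on top of simple fusion. The module itself is not specified in this section, so the following is only an illustrative sketch under stated assumptions: image features are already projected onto the same H × W grid as the point-cloud features, each point-cloud pixel acts as a query attending over a 3 × 3 window of image features sampled with a dilation, and the attended image feature is added back to the point-cloud feature as a residual, so information flows only from the image branch into the point-cloud branch. The function `dilated_dot_attention_fusion` and its `dilation` parameter are hypothetical.

```python
import numpy as np

def dilated_dot_attention_fusion(f_lidar, f_image, dilation=2):
    """Hypothetical one-way fusion of image features into the point-cloud branch.

    f_lidar, f_image: arrays of shape (C, H, W) on the same projected grid (assumed).
    """
    c, h, w = f_lidar.shape
    pad = dilation
    img = np.pad(f_image, ((0, 0), (pad, pad), (pad, pad)))  # zero padding at the borders
    offsets = [(dy, dx) for dy in (-dilation, 0, dilation) for dx in (-dilation, 0, dilation)]
    # neigh: (9, C, H, W) stack of image features shifted by each dilated offset.
    neigh = np.stack([img[:, pad + dy: pad + dy + h, pad + dx: pad + dx + w] for dy, dx in offsets])
    # Scaled dot product between the point-cloud query and each image neighbour.
    scores = np.einsum("chw,kchw->khw", f_lidar, neigh) / np.sqrt(c)
    scores -= scores.max(axis=0, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=0, keepdims=True)  # softmax over the 9 neighbours
    attended = np.einsum("khw,kchw->chw", weights, neigh)
    # Residual mapping: point-cloud features pass through unchanged, image context is added.
    return f_lidar + attended
```

Because of the residual path, an all-zero image feature map leaves the point-cloud features untouched, so the image branch can only add information to that branch, never remove it.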

