Multi-scale modality fusion network based on adaptive memory length

doi:10.7527/S1000-6893.2023.28977


Multi-scale modality fusion network based on adaptive memory length

Xiaohang LI, Jianjiang ZHOU

Key Laboratory of Radar Imaging and Microwave Photonic Technology of Ministry of Education, Nanjing University of Aeronautics and Astronautics, Nanjing 211100, China

Received: 2023-05-08  Revised: 2023-05-30  Accepted: 2023-07-11  Online: 2023-10-17  Published: 2023-07-28
Contact: Jianjiang ZHOU  E-mail: zjjee@nuaa.edu.cn
Supported by: National Natural Science Foundation of China (61501228)

Abstract

To better exploit the complementary perception strengths of point clouds and optical images in autonomous driving, a dual-modal fusion network, MerNet, is proposed. The network encodes point cloud features and optical image features in parallel. At every encoding stage, a fusion module based on residual mapping and a dilated point attention mechanism fuses the optical image features unidirectionally into the point cloud branch. A cascaded atrous convolution module with multi-scale dilated branches is designed to strengthen the contextual connections of the point cloud, and a bottleneck structure is used on the parallel branches to reduce the parameter count of the context module. To further improve the parameter update process, an optimization algorithm with an adaptively varying historical memory length is proposed, which accounts for the contribution of historical gradients under different gradient trends. A collaborative loss function based on cross-entropy loss is also studied: it cross-compares the labels predicted by the different modalities and applies a threshold to screen the predicted features of the compared modality, highlighting the perception strengths of each sensor. MerNet is trained and validated on the public SemanticKITTI dataset. The experimental results show that the proposed dual-modal network effectively improves semantic segmentation performance and makes the algorithm pay more attention to high-risk dynamic objects in the driving environment. In addition, the proposed context module reduces the parameter count by 64.89%, further improving the efficiency of the algorithm.

Key words: deep learning, semantic segmentation, multimodality, feature fusion, attention mechanism
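The abstract describes a cascaded atrous (dilated) convolution module that widens the context seen by the point cloud branch. As a rough illustration of why cascading dilated layers enlarges context, the sketch below computes the receptive field of a stack of stride-1 convolutions. The kernel size 3 and the dilation rates (1, 2, 4) are assumptions chosen for illustration; this page does not state the rates used in MerNet.

```python
def receptive_field(kernel_sizes, dilations):
    """Receptive field along one axis of stacked stride-1 convolutions.

    Each layer with kernel size k and dilation d widens the field by d * (k - 1).
    """
    rf = 1
    for k, d in zip(kernel_sizes, dilations):
        rf += d * (k - 1)
    return rf


# Three ordinary 3x3 layers versus three 3x3 layers with growing dilation
# (hypothetical rates, not taken from the paper).
plain = receptive_field([3, 3, 3], [1, 1, 1])
cascaded = receptive_field([3, 3, 3], [1, 2, 4])
print(plain, cascaded)
```

With the same number of 3x3 layers and the same weight count, the dilated stack covers roughly twice the span per axis, which is the usual motivation for atrous context modules.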

CLC number: V219

Xiaohang LI, Jianjiang ZHOU. Multi-scale modality fusion network based on adaptive memory length[J]. Acta Aeronautica et Astronautica Sinica, 2023, 44(22): 628977-628977.


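Table 1  Comparison of parameter counts when the dilated branch uses bottleneck convolution

Network input $(C \times H \times W)$	Network structure $(C_{in}, C_{out}, k)$	Parameters
$(256 \times 16 \times 64)$	$\mathrm{Conv}(256, 256, 3)$	590 080
$(256 \times 16 \times 64)$	$\mathrm{Conv}(256, 64, 1)$, $\mathrm{Conv}(64, 64, 3)$, $\mathrm{Conv}(64, 256, 1)$	180 570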
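To make the parameter comparison in Table 1 concrete, the sketch below counts the learnable parameters of a 2-D convolution under one common convention: weights plus one bias per output channel. The helper name `conv2d_params` is hypothetical, and the convention itself is an assumption, since the page does not say how the counts were obtained.

```python
def conv2d_params(c_in, c_out, k, bias=True):
    """Learnable parameters of a k x k 2-D convolution (weights plus optional bias)."""
    return c_in * c_out * k * k + (c_out if bias else 0)


# Single 3x3 convolution: Conv(256, 256, 3).
plain_3x3 = conv2d_params(256, 256, 3)

# Bottleneck: 1x1 reduce, 3x3 at reduced width, 1x1 expand.
bottleneck = (
    conv2d_params(256, 64, 1)
    + conv2d_params(64, 64, 3)
    + conv2d_params(64, 256, 1)
)

print(plain_3x3, bottleneck)
```

Under this convention the single 3x3 convolution comes to 590 080, matching the table. The three listed bottleneck layers come to 70 016, which differs from the 180 570 reported; the table's figure may include layers or settings that are not visible on this page, so the sketch should not be read as reproducing it.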


References (30)

1	PENG D L, WEN C L, XUE A K. Theory and application of multi-sensor and multi-source information fusion[M]. Beijing: Science Press, 2010 (in Chinese).
2	CHEN L C, PAPANDREOU G, KOKKINOS I, et al. Semantic image segmentation with deep convolutional nets and fully connected CRFs[DB/OL]. arXiv preprint: 1412.7062, 2014.
3	CHEN L C, PAPANDREOU G, KOKKINOS I, et al. DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018, 40(4): 834-848.
4	CHEN L C, PAPANDREOU G, SCHROFF F, et al. Rethinking atrous convolution for semantic image segmentation[DB/OL]. arXiv preprint: 1706.05587, 2017.
5	CORTINHAL T, TZELEPIS G, AKSOY E E. SalsaNext: Fast, uncertainty-aware semantic segmentation of LiDAR point clouds for autonomous driving[DB/OL]. arXiv preprint: 2003.03653, 2020.
6	AKSOY E E, BACI S, CAVDAR S. SalsaNet: Fast road and vehicle segmentation in LiDAR point clouds for autonomous driving[C]//2020 IEEE Intelligent Vehicles Symposium (IV). Piscataway: IEEE Press, 2021: 926-932.
7	VAN GANSBEKE W, NEVEN D, DE BRABANDERE B, et al. Sparse and noisy LiDAR completion with RGB guidance and uncertainty[C]//2019 16th International Conference on Machine Vision Applications (MVA). Piscataway: IEEE Press, 2019: 1-6.
8	MEYER G P, CHARLAND J, HEGDE D, et al. Sensor fusion for joint 3D object detection and semantic segmentation[C]//2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW). Piscataway: IEEE Press, 2020: 1230-1237.
9	CORTINHAL T, KURNAZ F, AKSOY E E. Semantics-aware multi-modal domain translation: From LiDAR point clouds to panoramic color images[C]//2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW). Piscataway: IEEE Press, 2021: 3032-3041.
10	RUDER S. An overview of gradient descent optimization algorithms[DB/OL]. arXiv preprint: 1609.04747, 2016.
11	KINGMA D P, BA J. Adam: A method for stochastic optimization[DB/OL]. arXiv preprint: 1412.6980, 2014.
12	LUO L C, XIONG Y H, LIU Y, et al. Adaptive gradient methods with dynamic bound of learning rate[DB/OL]. arXiv preprint: 1902.09843, 2019.
13	DING J B, REN X C, LUO R X, et al. An adaptive and momental bound method for stochastic learning[DB/OL]. arXiv preprint: 1910.12249, 2019.
14	JADON S. A survey of loss functions for semantic segmentation[C]//2020 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB). Piscataway: IEEE Press, 2020: 1-7.
15	XIE S N, TU Z W. Holistically-nested edge detection[C]//2015 IEEE International Conference on Computer Vision (ICCV). Piscataway: IEEE Press, 2016: 1395-1403.
16	LIN T Y, GOYAL P, GIRSHICK R, et al. Focal loss for dense object detection[C]//2017 IEEE International Conference on Computer Vision (ICCV). Piscataway: IEEE Press, 2017: 2999-3007.
17	HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]//2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE Press, 2016: 770-778.
18	BERMAN M, TRIKI A R, BLASCHKO M B. The Lovasz-Softmax loss: A tractable surrogate for the optimization of the intersection-over-union measure in neural networks[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 4413-4421.
19	ISLAM M A, ROCHAN M, BRUCE N D B, et al. Gated feedback refinement network for dense image labeling[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Piscataway: IEEE Press, 2017: 4877-4885.
20	BOCHKOVSKIY A, WANG C Y, LIAO H Y M. YOLOv4: Optimal speed and accuracy of object detection[DB/OL]. arXiv preprint: 2004.10934, 2020.
21	SANDLER M, HOWARD A, ZHU M L, et al. MobileNetV2: Inverted residuals and linear bottlenecks[C]//2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE Press, 2018: 4510-4520.
22	BEHLEY J, GARBADE M, MILIOTO A, et al. SemanticKITTI: A dataset for semantic scene understanding of LiDAR sequences[C]//2019 IEEE/CVF International Conference on Computer Vision (ICCV). Piscataway: IEEE Press, 2020: 9296-9306.
23	ALONSO I, RIAZUELO L, MONTESANO L, et al. 3D-MiniNet: Learning a 2D representation from point clouds for fast and efficient 3D LIDAR semantic segmentation[J]. IEEE Robotics and Automation Letters, 2020, 5(4): 5432-5439.
24	WANG S, ZHU J K, ZHANG R X. Meta-RangeSeg: LiDAR sequence semantic segmentation using multiple feature aggregation[J]. IEEE Robotics and Automation Letters, 2022, 7(4): 9739-9746.
25	MILIOTO A, VIZZO I, BEHLEY J, et al. RangeNet: Fast and accurate LiDAR semantic segmentation[C]//2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). Piscataway: IEEE Press, 2020: 4213-4220.
26	ZHAO Y H, WANG J, LI X L, et al. Number-adaptive prototype learning for 3D point cloud semantic segmentation[C]//European Conference on Computer Vision. Cham: Springer, 2023: 695-703.
27	XU C F, WU B C, WANG Z N, et al. SqueezeSegV3: Spatially-adaptive convolution for efficient point-cloud segmentation[C]//European Conference on Computer Vision. Cham: Springer, 2020: 1-19.
28	WANG J L, SUN B, LU Y. MVPNet: Multi-view point regression networks for 3D object reconstruction from a single image[J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2019, 33(1): 8949-8956.
29	KOCHANOV D, NEJADASL F K, BOOIJ O. KPRNet: Improving projection-based LiDAR semantic segmentation[DB/OL]. arXiv preprint: 2007.12668, 2020.
30	GENOVA K, YIN X Q, KUNDU A, et al. Learning 3D semantic segmentation with only 2D image supervision[C]//2021 International Conference on 3D Vision (3DV). Piscataway: IEEE Press, 2022: 361-372.


Network model	Parameters	Computation (FLOPs)
ASPP	4.13 M	16.38 M
MsASPP	1.45 M	5.39 M
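The 64.89% parameter reduction quoted in the abstract can be checked against the ASPP and MsASPP rows above. The short sketch below does the arithmetic for both columns; the helper name `reduction_percent` is hypothetical, and the inputs are the rounded values shown in the table, so the results are only as precise as those roundings.

```python
def reduction_percent(before, after):
    """Relative reduction from `before` to `after`, in percent."""
    return (before - after) / before * 100.0


# Values in millions, copied from the ASPP and MsASPP rows.
param_reduction = reduction_percent(4.13, 1.45)
flops_reduction = reduction_percent(16.38, 5.39)

print(f"parameters: {param_reduction:.2f}%")
print(f"computation: {flops_reduction:.2f}%")
```

The parameter figure rounds to 64.89%, in agreement with the abstract; the same calculation on the computation column gives about 67.09%.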

Network model	car	bicycle	motorcycle	truck	other-vehicle	person	bicyclist	road	parking	sidewalk	other-ground	building	fence	vegetation	trunk	terrain	pole	traffic-sign	Modality	mIoU/%
3D-MiniNet [23]	90.5	42.3	42.1	28.5	29.4	47.8	44.1	91.6	64.2	74.5	25.4	89.4	60.8	82.8	60.8	66.7	48.0	56.6	L	55.8
Meta-RangeSeg [24]	93.9	50.1	43.8	43.9	43.2	63.7	53.1	90.6	64.3	74.6	29.2	91.1	64.7	82.6	65.5	65.5	56.3	64.2	L	61.0
RangeNet53++ [25]	91.4	25.7	34.4	25.7	23.0	38.3	38.8	91.8	65.0	75.2	27.8	87.4	58.6	80.5	55.1	64.6	47.9	55.9	L	52.2
NAPL [26]	96.6	32.3	43.6	47.3	47.5	51.1	53.9	89.6	67.1	73.7	31.2	91.9	67.4	84.8	69.8	68.8	59.1	59.2	L	61.6
SqueezeSegV3 [27]	92.5	38.7	36.5	29.6	33.0	45.6	46.2	91.7	63.4	74.8	26.4	89.0	59.4	82.0	58.7	65.4	49.6	58.9	L	55.9
SalsaNext [5]	91.9	48.3	38.6	38.9	31.9	60.2	59.0	91.7	63.7	75.8	29.1	90.2	64.2	81.8	63.6	66.5	54.3	62.1	L	59.5
MVP-Net [28]	92.7	37.2	17.7	20.2	13.8	50.0	55.8	91.4	61.4	75.9	25.6	85.8	55.2	83.2	64.5	69.3	51.8	59.2	L	59.2
KPRNet [29]	95.5	54.1	47.9	23.6	42.6	65.9	65.0	93.2	73.9	80.6	30.2	91.7	68.4	85.7	69.8	71.2	58.7	64.1	L+C	63.1
HiFANet [30]	93.3	16.9	54.7	-	-	24.7	57.7	91.0	-	79.0	-	90.3	34.9	75.5	-	91.2	54.0	37.4	L+C	62.0
MerNet	95.2	41.0	60.5	72.7	76.9	75.0	80.3	96.4	46.8	80.6	0.7	87.9	61.1	87.1	69.9	72.9	63.0	42.8	L+C	63.7
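The table reports per-class intersection over union (IoU) and its mean (mIoU). As a reminder of how these metrics are defined, the sketch below derives them from a confusion matrix. The three-class matrix is made up for illustration and is unrelated to the SemanticKITTI results; skipping classes that never occur is an assumption, and the benchmark's own evaluation code may treat that case differently.

```python
def per_class_iou(confusion):
    """IoU per class, where confusion[i][j] counts points of true class i predicted as j."""
    n = len(confusion)
    ious = []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                       # true class c, predicted otherwise
        fp = sum(confusion[r][c] for r in range(n)) - tp  # predicted c, true class differs
        denom = tp + fp + fn
        ious.append(tp / denom if denom else float("nan"))
    return ious


def mean_iou(confusion):
    """Mean of the per-class IoUs, ignoring classes that never appear (NaN entries)."""
    valid = [v for v in per_class_iou(confusion) if v == v]  # NaN != NaN
    return sum(valid) / len(valid)


# Toy 3-class confusion matrix (hypothetical counts).
toy = [
    [8, 2, 0],
    [1, 6, 3],
    [0, 1, 9],
]
print(per_class_iou(toy), mean_iou(toy))
```

Because every class carries equal weight in the mean, a class with a very low IoU, such as other-ground in the MerNet row, pulls mIoU down regardless of how few points it covers.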

Baseline	Adamwin	Fusion module	MsASPP	DifferLoss	mIoU/%
√					60.50
√	√				62.81
√		√			61.48
√			√		62.97
√				√	62.89
√	√	√	√	√	63.73

Simple fusion	Attention	Residual mapping	Dilated attention	mIoU/%
√				61.49
√	√			62.89
√	√	√		63.29
√	√	√	√	63.73
