Acta Aeronautica et Astronautica Sinica, 2023, Vol. 44, Issue 22: 628977. doi: 10.7527/S1000-6893.2023.28977

Multi-scale modality fusion network based on adaptive memory length

Xiaohang LI, Jianjiang ZHOU

  1. Key Laboratory of Radar Imaging and Microwave Photonic Technology of Ministry of Education, Nanjing University of Aeronautics and Astronautics, Nanjing 211100, China
  • Received: 2023-05-08 Revised: 2023-05-30 Accepted: 2023-07-11 Published: 2023-11-25 Online: 2023-07-28
  • Corresponding author: Jianjiang ZHOU, E-mail: zjjee@nuaa.edu.cn
  • Supported by:
    National Natural Science Foundation of China (61501228)

Abstract:

To better exploit the complementary perception strengths of point clouds and optical images in autonomous driving, a dual-modal fusion network called MerNet is proposed. The network encodes point cloud features and optical image features in parallel. At each encoding stage, a fusion module based on residual mapping and a dilated dot attention mechanism fuses the optical image features unidirectionally into the point cloud branch. A cascaded atrous convolution module with multi-scale dilated branches is designed to strengthen the contextual connections of the point cloud, and a bottleneck structure is applied to the parallel branches to reduce the parameter count of the context module. To further improve the parameter update process, an optimization algorithm with an adaptively varying historical memory length is proposed, which accounts for the contribution of historical gradients under different gradient trends. A collaborative loss function based on cross-entropy loss is also investigated: it cross-compares the labels predicted by the different modalities and uses a threshold to filter the predicted features of the compared modality, which highlights the perception strengths of each sensor. MerNet is trained and validated on the public SemanticKITTI dataset. The experimental results show that the proposed dual-modal network effectively improves semantic segmentation performance and makes the algorithm pay more attention to high-risk dynamic objects in the driving environment. In addition, the proposed context module reduces the parameter count by 64.89%, further improving the efficiency of the algorithm.
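As a rough illustration of why a bottleneck structure on parallel dilated branches lowers the parameter count, the sketch below counts the weights of a hypothetical context module with and without a bottleneck. The channel width (128), the dilation rates (1, 2, 4), the reduction factor (2) and all function names are assumptions chosen for the example; they are not taken from MerNet and will not reproduce the 64.89% figure reported in the abstract.

```python
def conv_params(c_in, c_out, k, bias=True):
    """Number of weights (plus optional biases) in one k x k 2-D convolution."""
    return c_in * c_out * k * k + (c_out if bias else 0)


def plain_branch(channels, k=3):
    """One dilated k x k convolution that keeps the channel width."""
    return conv_params(channels, channels, k)


def bottleneck_branch(channels, reduction, k=3):
    """1x1 reduce -> dilated k x k -> 1x1 expand (hypothetical layout)."""
    mid = channels // reduction
    return (
        conv_params(channels, mid, 1)
        + conv_params(mid, mid, k)
        + conv_params(mid, channels, 1)
    )


def context_module_params(channels, dilations, reduction=None):
    """Total parameters over all parallel branches.

    The dilation rate widens the receptive field but does not change the
    number of weights, so every branch costs the same.
    """
    if reduction is None:
        per_branch = plain_branch(channels)
    else:
        per_branch = bottleneck_branch(channels, reduction)
    return per_branch * len(dilations)


# Assumed configuration, for illustration only.
channels = 128
dilations = (1, 2, 4)

plain = context_module_params(channels, dilations)
slim = context_module_params(channels, dilations, reduction=2)
saving = 1 - slim / plain

print(f"plain: {plain}, bottleneck: {slim}, saving: {saving:.2%}")
```

The actual saving depends on the channel widths, the number of branches and the reduction factor used in the real network, none of which are given in the abstract.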

Key words: deep learning, semantic segmentation, multimodality, feature fusion, attention mechanism

CLC number: