
Acta Aeronautica et Astronautica Sinica ›› 2023, Vol. 44 ›› Issue (22): 628977-628977. doi: 10.7527/S1000-6893.2023.28977

• Special Column •

Multi-scale modality fusion network based on adaptive memory length

Xiaohang LI, Jianjiang ZHOU

  1. Key Laboratory of Radar Imaging and Microwave Photonic Technology of Ministry of Education, Nanjing University of Aeronautics and Astronautics, Nanjing 211100, China
  • Received: 2023-05-08; Revised: 2023-05-30; Accepted: 2023-07-11; Online: 2023-10-17; Published: 2023-07-28
  • Contact: Jianjiang ZHOU, E-mail: zjjee@nuaa.edu.cn
  • Supported by: National Natural Science Foundation of China (61501228)

Abstract:

To better exploit the complementary perception capabilities of point clouds and optical images in autonomous driving, a dual-modal fusion network called MerNet is proposed. The network encodes point cloud features and optical image features in parallel branches. At each encoding stage, an optical image feature fusion module based on residual mapping and a dilated dot-product attention mechanism fuses the optical image features unidirectionally into the point cloud branch. A cascaded dilated convolution module with multi-scale dilated branches is proposed to strengthen the contextual connections of the point cloud features, and a bottleneck structure is applied to the parallel branches to reduce the number of parameters in the context module. To further improve the parameter update process, an optimization algorithm with an adaptively variable historical memory length is proposed, which weighs the contributions of historical gradients differently under different gradient trends. A collaborative loss function based on cross-entropy loss is also studied: by cross-comparing the predicted labels of the two modalities and applying thresholds to screen each modality's predicted features, the perception advantages of the different sensors are highlighted. MerNet is trained and validated on the public SemanticKITTI dataset. Experimental results show that the proposed dual-modal network effectively improves semantic segmentation performance and makes the algorithm pay more attention to highly dangerous dynamic objects in the driving environment. In addition, the proposed context module reduces the number of parameters by 64.89%, further improving the efficiency of the algorithm.
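For concreteness, the following is a minimal PyTorch-style sketch of a multi-scale dilated context module whose parallel branches each pass through a 1×1 bottleneck, in the spirit of the module described in the abstract. The branch count, dilation rates, reduction ratio, and all identifiers are illustrative assumptions, not the configuration used in MerNet.

import torch
import torch.nn as nn

class CascadedDilatedContext(nn.Module):
    """Illustrative sketch: parallel dilated-convolution branches, each
    preceded by a 1x1 bottleneck that shrinks the channel width before
    the expensive 3x3 dilated convolution."""
    def __init__(self, channels: int, dilations=(1, 2, 4, 8), reduction: int = 4):
        super().__init__()
        mid = channels // reduction  # bottleneck width (assumed ratio)
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(channels, mid, kernel_size=1, bias=False),  # bottleneck
                nn.Conv2d(mid, mid, kernel_size=3, padding=d, dilation=d, bias=False),
                nn.BatchNorm2d(mid),
                nn.ReLU(inplace=True),
            )
            for d in dilations
        ])
        # Project the concatenated multi-scale context back to the input width.
        self.project = nn.Conv2d(mid * len(dilations), channels, kernel_size=1)

    def forward(self, x):
        ctx = torch.cat([branch(x) for branch in self.branches], dim=1)
        return x + self.project(ctx)  # residual connection keeps original features

Because the 3×3 dilated convolutions operate on the reduced channel width rather than the full one, most of the module's parameter saving comes from the bottleneck, which matches the abstract's motivation for using it.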

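Similarly, a hypothetical sketch of an optimizer whose historical memory length adapts to the gradient trend is given below: the momentum decay factor is interpolated from the cosine similarity between the current gradient and the accumulated history, so a consistent trend keeps a long memory while a trend reversal shortens it. This similarity rule, the beta range, and all names are assumptions for illustration; the paper's actual update rule may differ.

import torch
import torch.nn.functional as F
from torch.optim import Optimizer

class AdaptiveMemorySGD(Optimizer):
    """Hypothetical sketch of a momentum optimizer with an adaptively
    variable historical memory length (not the paper's exact algorithm)."""
    def __init__(self, params, lr=1e-2, beta_min=0.5, beta_max=0.99):
        defaults = dict(lr=lr, beta_min=beta_min, beta_max=beta_max)
        super().__init__(params, defaults)

    @torch.no_grad()
    def step(self):
        for group in self.param_groups:
            for p in group["params"]:
                if p.grad is None:
                    continue
                state = self.state[p]
                buf = state.get("momentum", torch.zeros_like(p))
                g = p.grad
                # Agreement between current gradient and history, in [-1, 1]:
                # consistent trend -> beta near beta_max (long memory),
                # reversal -> beta near beta_min (short memory).
                cos = F.cosine_similarity(g.flatten(), buf.flatten(), dim=0, eps=1e-8)
                beta = group["beta_min"] + (group["beta_max"] - group["beta_min"]) * (cos.item() + 1) / 2
                buf.mul_(beta).add_(g, alpha=1 - beta)
                state["momentum"] = buf
                p.add_(buf, alpha=-group["lr"])

The intended effect is that historical gradients contribute strongly while the descent direction is stable, but are quickly forgotten when the trend changes, which is one plausible reading of weighing historical-gradient contributions under different gradient trends.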
Key words: deep learning, semantic segmentation, multimodality, feature fusion, attention mechanism
