Because visible-light and SAR images are produced by different imaging mechanisms, their contents differ greatly, their deep features are difficult to align, and cross-source matching is slow. To address these problems, a deep multi-source hashing network is proposed to match SAR images with visible-light (optical) images. First, to cope with the large difference in color information between SAR and optical remote sensing images, an image transformation mechanism is introduced: each optical image is converted into four different types of spectral images that are fed into the network with their color channels shuffled, so that the network concentrates on texture and contour information and becomes insensitive to color. Second, to handle the strong noise in SAR images and the heterogeneous content of the two modalities over the same scene, an image-pair training strategy is proposed to reduce the feature gap between multi-source images. Third, to address low matching efficiency and high storage cost, a triplet hashing loss function is proposed, which improves the model's matching accuracy and shortens matching time. Finally, a SAR and optical dual-modality remote sensing dataset, SODMRSID, is constructed to fill the gap in visible-SAR multi-source remote sensing matching data, and the experiments verify both the practicality of the dataset and the effectiveness of the proposed algorithm.
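A minimal sketch of two of the components summarized above, assuming a PyTorch setup with relaxed (tanh-activated) hash codes; the function names, the choice of four channel permutations, and the margin and quantization weights are illustrative assumptions rather than the paper's exact formulation.

```python
# Minimal sketch (assumptions): PyTorch tensors, CHW image layout,
# real-valued hash codes in [-1, 1] produced by a tanh output layer.
import itertools
import torch
import torch.nn.functional as F

def spectral_variants(optical):
    """Generate four channel-permuted copies of an optical image so the
    network learns texture/contour cues rather than color.
    (Illustrative: the paper's exact four spectral transforms may differ.)
    optical: (3, H, W) RGB tensor."""
    perms = list(itertools.permutations(range(3)))[:4]  # four channel orders
    return [optical[list(p), :, :] for p in perms]

def triplet_hash_loss(anchor, positive, negative, margin=0.5, quant_weight=0.1):
    """Margin-based triplet loss on relaxed hash codes.
    anchor: SAR codes; positive: matching optical codes;
    negative: non-matching optical codes; all shaped (B, K)."""
    d_pos = F.pairwise_distance(anchor, positive)   # pull matched pair together
    d_neg = F.pairwise_distance(anchor, negative)   # push mismatched pair apart
    # Quantization term nudges codes toward binary {-1, +1} values.
    quant = (anchor.abs() - 1).pow(2).mean() + (positive.abs() - 1).pow(2).mean()
    return F.relu(d_pos - d_neg + margin).mean() + quant_weight * quant
```

At inference time the real-valued codes would be binarized with a sign function, so matching reduces to Hamming-distance lookups, which is what yields the reduced matching time and storage cost claimed above.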