在空天一体化海洋监测与智能海事管理中,利用无人机与卫星等空天平台采集的多源传感信息实现船舶目标识别,对航道管控、海上执法及边防预警具有重要意义。然而,真实场景下的船舶识别任务通常面临两大挑战:一是多模态数据融合困难,可见光图像与辐射源信号等不同模态之间存在异质性和时空不对齐问题;二是船舶类别天然呈现严重的长尾分布,头部类别占据大量样本,而尾部类别数据稀缺,严重影响整体识别性能。针对上述问题,本文提出一种面向空天多模态感知的长尾分布船舶识别方法。该方法融合类别感知边界优化策略与基于类别的重加权策略,有效提升尾部类别的判别能力与多模态融合的鲁棒性。实验结果表明,所提方法在典型长尾分布船舶识别任务中取得了优于现有方法的性能,展现出良好的实用性与泛化能力。
In the context of integrated aerial–space ocean monitoring and intelligent maritime management, ship target recognition based on multisource sensing data collected from unmanned aerial vehicles (UAVs) and satellites plays a crucial role in navigation control, maritime law enforcement, and border surveillance. However, real-world ship recognition tasks face two major challenges. First, multimodal data fusion is difficult owing to the heterogeneity and spatiotemporal misalignment between modalities such as optical images and electromagnetic radiation signals. Second, ship categories naturally exhibit a severe long-tailed distribution, in which head classes dominate the sample population while tail classes remain scarce, significantly degrading overall recognition performance. To address these challenges, this paper proposes a long-tailed ship recognition method oriented toward aerial–space multimodal perception. The proposed method integrates a class-aware boundary optimization strategy and a category-based reweighting mechanism, effectively enhancing the discriminative capability on tail classes and improving the robustness of multimodal fusion. Experimental results demonstrate that the proposed method consistently outperforms existing approaches on representative long-tailed ship recognition tasks, showing strong practicality and generalization capability.
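The abstract names two long-tail countermeasures without giving their exact formulations: a class-aware boundary (margin) optimization and a category-based reweighting. A minimal sketch of how such components are commonly realized is shown below, assuming an effective-number class-balanced weighting (Cui et al., 2019) and an LDAM-style per-class margin (Cao et al., 2019) as stand-ins; the paper's actual strategies may differ. All function names and the constants `beta` and `C` here are illustrative choices, not the authors' method.

```python
import numpy as np

def class_balanced_weights(counts, beta=0.999):
    """Effective-number reweighting: weight_c ∝ (1 - beta) / (1 - beta^n_c),
    so rare (tail) classes receive larger loss weights."""
    counts = np.asarray(counts, dtype=float)
    w = (1.0 - beta) / (1.0 - np.power(beta, counts))
    return w / w.sum() * len(counts)  # normalize so weights average to 1

def margin_adjusted_loss(logits, labels, counts, weights, C=0.5):
    """LDAM-style class-aware margin: subtract a larger margin
    margin_c = C / n_c^(1/4) from the true-class logit of rarer classes,
    then apply weighted softmax cross-entropy."""
    counts = np.asarray(counts, dtype=float)
    margins = C / np.power(counts, 0.25)
    z = logits.astype(float).copy()
    rows = np.arange(len(labels))
    z[rows, labels] -= margins[labels]          # enlarge decision boundary for tail classes
    z -= z.max(axis=1, keepdims=True)           # numerical stability
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    nll = -log_probs[rows, labels]
    return float(np.mean(weights[labels] * nll))

# Usage on a toy long-tailed setup: head class has 100x the tail's samples.
counts = [5000, 500, 50]
w = class_balanced_weights(counts)              # w increases from head to tail
logits = np.array([[2.0, 0.5, 0.1], [0.2, 1.5, 0.3]])
labels = np.array([0, 1])
loss = margin_adjusted_loss(logits, labels, counts, w)
```

In this construction, the margin term enforces a larger decision boundary for data-scarce classes at training time, while the per-class weights rebalance the gradient contribution of each category; the two are complementary and are often combined, as the abstract suggests.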