Acta Aeronautica et Astronautica Sinica

Hyperspectral-LiDAR Joint Classification Method Based on Vision-Language Pre-trained Models

  • Received: 2025-07-28  Revised: 2025-10-14  Online: 2025-10-24  Published: 2025-10-24
  • Supported by:
    the Fundamental Research Funds for the Central Universities

Abstract: To address the challenge of inaccurate land-cover classification caused by differences in spatial resolution, data heterogeneity, and limited labeled samples in multimodal remote sensing data, this study investigates the joint classification of hyperspectral imagery (HSI) and LiDAR data. We propose a Semantic-Aware Cross-Modal Fusion Network (SCF-Net). First, a lightweight patch encoder transforms the input data into RGB-compatible feature maps, which are then fed into a CLIP-based visual encoder enhanced with learnable prompts. To efficiently integrate multimodal information, an adaptive cross-modal fusion architecture is employed, featuring grouped linear projection and a relation-aware interaction module that enables dynamic spatial feature exchange at low computational cost. For semantic discrimination, attribute-category textual prompts are generated, and classification is performed by computing the cosine similarity between visual and textual embeddings, followed by a Top-K attribute averaging strategy. Experiments on the Houston 2013, MUUFL, and Trento datasets demonstrate that SCF-Net outperforms eight state-of-the-art fusion methods, achieving improvements of over 2.88% in overall accuracy, 2.69% in average accuracy, and 3.02% in Kappa coefficient, while maintaining high parameter efficiency. Ablation studies further validate the effectiveness of each component. This network offers a novel paradigm for integrating multimodal remote sensing data with large-scale vision-language pre-trained models in complex classification tasks.
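The classification step described in the abstract, cosine similarity between visual and textual embeddings followed by Top-K attribute averaging, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the embedding shapes, and the attribute-to-class mapping are all hypothetical, assuming each class is described by several attribute prompts whose similarities are averaged over the top K matches.

```python
import numpy as np

def classify_topk(visual_emb, text_embs, attr_to_class, k=3):
    """Hypothetical sketch of cosine-similarity classification
    with Top-K attribute averaging.

    visual_emb:    (D,) embedding of one pixel/patch from the visual encoder
    text_embs:     (A, D) embeddings of A attribute-category textual prompts
    attr_to_class: (A,) class index that each attribute prompt belongs to
    """
    # L2-normalize so that dot products are cosine similarities
    v = visual_emb / np.linalg.norm(visual_emb)
    t = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    sims = t @ v  # (A,) cosine similarity to each attribute prompt

    n_classes = int(attr_to_class.max()) + 1
    scores = np.empty(n_classes)
    for c in range(n_classes):
        # average the K highest attribute similarities within the class
        class_sims = np.sort(sims[attr_to_class == c])[::-1]
        scores[c] = class_sims[:k].mean()
    return int(np.argmax(scores))
```

Averaging only the top K attribute similarities per class, rather than all of them, makes the class score robust to attribute prompts that happen not to match a given sample.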

Key words: Multimodal Remote Sensing Data, Land Cover Classification, Semantic-Aware Cross-Modal Fusion Network, Adaptive Cross-Modal Fusion
