In complex environments where satellite positioning signals are unavailable, UAV visual geo-localization is prone to matching failures caused by sparse scene texture. In addition, the geographic continuity and local self-similarity of satellite imagery make the Top-1 result of global retrieval ambiguous and prone to drift when it is used alone. To address these problems, this study develops a coarse-to-fine visual localization method that incorporates physical prior constraints. In the coarse retrieval stage, global features are used to recall the Top-K candidate regions from a satellite tile database so as to cover all possible locations. In the fine matching stage, an adaptive hypothesis verification procedure applies a Transformer-based model with a global receptive field to dense feature matching, establishing robust correspondences between the UAV and satellite views and alleviating the matching degradation caused by insufficient texture. Furthermore, a physical-prior-based outlier rejection algorithm is proposed: by jointly applying a convexity constraint on the homography transformation and a projection-boundary consistency check, it strictly excludes geometrically infeasible mismatches, and the candidate tiles are then re-ranked by inlier quality to determine the optimal solution. Experiments on the UAV-VisLoc dataset show that, compared with a baseline that directly outputs the Top-1 candidate, the proposed method improves localization accuracy at small and medium error thresholds: accuracy at the 10 m, 30 m, and 50 m thresholds increases by 11.44, 11.33, and 10.22 percentage points, respectively. With five candidates, the end-to-end average runtime is 0.706 s. Overall, the method mitigates the degradation caused by sparse features and achieves meter-level localization accuracy while maintaining computational efficiency.
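The geometric verification and re-ranking step can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that the convexity constraint means the UAV image corners, projected by the estimated homography, must form a convex quadrilateral; that boundary consistency means the projected image center must fall inside the tile (enlarged by a tolerance `margin`); and that inlier quality can be approximated by the inlier count. The names `is_feasible`, `rerank`, and `margin` are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Matrix3 = Sequence[Sequence[float]]
Candidate = Tuple[str, Matrix3, int]  # (tile id, homography UAV -> tile, inlier count)


def _denominator(H: Matrix3, pt: Point) -> float:
    """Homogeneous scale w of a projected point."""
    return H[2][0] * pt[0] + H[2][1] * pt[1] + H[2][2]


def project(H: Matrix3, pt: Point) -> Point:
    """Apply homography H to a 2D point (caller ensures w != 0)."""
    w = _denominator(H, pt)
    x = (H[0][0] * pt[0] + H[0][1] * pt[1] + H[0][2]) / w
    y = (H[1][0] * pt[0] + H[1][1] * pt[1] + H[1][2]) / w
    return (x, y)


def is_convex(quad: Sequence[Point]) -> bool:
    """True if the cross products at all vertices are nonzero and share one sign."""
    n = len(quad)
    signs = []
    for i in range(n):
        ax, ay = quad[i]
        bx, by = quad[(i + 1) % n]
        cx, cy = quad[(i + 2) % n]
        cross = (bx - ax) * (cy - by) - (by - ay) * (cx - bx)
        if abs(cross) < 1e-9:
            return False  # degenerate (collinear or collapsed) corner
        signs.append(cross > 0)
    return all(signs) or not any(signs)


def is_feasible(H: Matrix3, uav_size: Point, tile_size: Point,
                margin: float = 0.5) -> bool:
    """Assumed physical-prior test: convex projection plus center inside the tile."""
    w, h = uav_size
    corners = [(0.0, 0.0), (w, 0.0), (w, h), (0.0, h)]
    denoms = [_denominator(H, c) for c in corners]
    # Reject if the horizon line crosses the image (w changes sign or vanishes).
    if any(abs(d) < 1e-12 for d in denoms):
        return False
    if not (all(d > 0 for d in denoms) or all(d < 0 for d in denoms)):
        return False
    if not is_convex([project(H, c) for c in corners]):
        return False
    # Assumed boundary check: projected center within the margin-enlarged tile.
    cx, cy = project(H, (w / 2.0, h / 2.0))
    tw, th = tile_size
    return (-margin * tw <= cx <= (1 + margin) * tw
            and -margin * th <= cy <= (1 + margin) * th)


def rerank(candidates: List[Candidate], uav_size: Point,
           tile_size: Point) -> List[Candidate]:
    """Drop infeasible candidates, then sort the rest by inlier count (descending)."""
    feasible = [c for c in candidates if is_feasible(c[1], uav_size, tile_size)]
    return sorted(feasible, key=lambda c: c[2], reverse=True)
```

Under these assumptions, a candidate with many inliers but a self-intersecting or far out-of-tile projection is discarded before ranking, so it cannot displace a geometrically plausible tile with fewer inliers.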