ICT-Net: A Framework for Multi-Domain Cross-View Geo-Localization with Multi-Source Remote Sensing Fusion


Bibliographic Details
Main Authors: Min Wu, Sirui Xu, Ziwei Wang, Jin Dong, Gong Cheng, Xinlong Yu, Yang Liu
Format: Article
Language: English
Published: MDPI AG 2025-06-01
Series: Remote Sensing
Online Access: https://www.mdpi.com/2072-4292/17/12/1988
Description
Summary: Traditional single neural network-based geo-localization methods for cross-view imagery primarily rely on polar coordinate transformations while suffering from limited global correlation modeling capabilities. To address these fundamental challenges of weak feature correlation and poor scene adaptation, we present a novel framework termed ICT-Net (Integrated CNN-Transformer Network) that synergistically combines convolutional neural networks with Transformer architectures. Our approach harnesses the complementary strengths of CNNs in capturing local geometric details and Transformers in establishing long-range dependencies, enabling comprehensive joint perception of both local and global visual patterns. Furthermore, capitalizing on the Transformer’s flexible input processing mechanism, we develop an attention-guided non-uniform cropping strategy that dynamically eliminates redundant image patches with minimal impact on localization accuracy, thereby achieving enhanced computational efficiency. To facilitate practical deployment, we propose a deep embedding clustering algorithm optimized for rapid parsing of geo-localization information. Extensive experiments demonstrate that ICT-Net establishes new state-of-the-art localization accuracy on the CVUSA benchmark, achieving a top-1 recall rate improvement of 8.6% over previous methods. Additional validation on a challenging real-world dataset collected at Beihang University (BUAA) further confirms the framework’s effectiveness and practical applicability in complex urban environments, particularly showing 23% higher robustness to vegetation variations.
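The attention-guided non-uniform cropping described above can be illustrated with a minimal sketch: given per-patch attention scores (e.g. from a Transformer's class-token attention), only the most-attended patches are kept and the rest are discarded before further processing. This is not the paper's implementation — the function name, the NumPy formulation, and the `keep_ratio` parameter are illustrative assumptions.

```python
import numpy as np

def prune_patches(patch_feats: np.ndarray, attn: np.ndarray, keep_ratio: float = 0.7):
    """Keep the most-attended patches, dropping the rest.

    patch_feats: (N, D) array of patch embeddings
    attn:        (N,) attention scores, one per patch (hypothetical input)
    keep_ratio:  fraction of patches to retain
    """
    k = max(1, int(round(keep_ratio * len(attn))))
    keep_idx = np.argsort(attn)[::-1][:k]  # indices of the k highest scores
    keep_idx.sort()                        # restore original patch order
    return patch_feats[keep_idx], keep_idx

# Toy example: 8 patches with 4-dim features and made-up attention scores.
feats = np.arange(32, dtype=float).reshape(8, 4)
scores = np.array([0.05, 0.30, 0.10, 0.25, 0.02, 0.15, 0.08, 0.05])
kept, idx = prune_patches(feats, scores, keep_ratio=0.5)
print(list(idx))   # [1, 2, 3, 5] — the four most-attended patches
print(kept.shape)  # (4, 4)
```

Because a Transformer accepts a variable-length token sequence, the retained patches can be fed directly to subsequent layers, which is what makes this kind of dynamic pruning a source of computational savings.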
ISSN: 2072-4292