Global-Frequency-Domain Network: A Semantic Segmentation Method for High-Resolution Remote Sensing Images Based on Fine-Grained Feature Extraction and Global Context Integration

Bibliographic Details
Main Authors: Ye Zhou, Mingyue Zhang, Yechenzi Wang
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
Online Access: https://ieeexplore.ieee.org/document/10975104/
Description
Summary: The accurate semantic segmentation of high-resolution remote sensing images is essential for urban planning and management applications. The inherently complex spatial structure and abundant contextual information in these images pose segmentation challenges, such as feature recognition difficulties and segmentation discontinuities. Hence, we propose a novel global-frequency-domain network (GFDNet) designed for the semantic segmentation of high-resolution remote sensing images. The GFDNet framework incorporates a global-frequency-domain feature module that employs a learnable global filter to extract contextual information, leveraging the Fourier transform to capture both spatial- and frequency-domain features. In addition, a hybrid attention mechanism combining channel sparse and spatial attention is introduced to enhance fine-grained feature extraction when objects are continuous. A spatial feature-aware decoder module then enhances the network's ability to perceive surrounding structural targets, thereby improving the spatial feature representation. Experimental results demonstrate that GFDNet outperforms traditional convolutional neural network and transformer-based methods, achieving the highest mean intersection over union (mIoU) scores of 77.12% and 81.08% on the Vaihingen and Potsdam datasets, respectively. Comparative analyses further indicate GFDNet's superior accuracy and generalization capabilities in delivering precise semantic segmentation outcomes.
ISSN: 1939-1404, 2151-1535
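
The summary describes a learnable global filter that uses the Fourier transform to gather context from the whole image. The following is a minimal NumPy sketch of that general idea, not the authors' implementation: the record gives no architectural details, so the function name `global_filter`, the tensor layout (height, width, channels), and the use of a plain complex weight array in place of a trained parameter are all assumptions made for illustration.

```python
import numpy as np


def global_filter(x, weight):
    """Filter a feature map in the frequency domain (illustrative sketch).

    x:      real array of shape (H, W, C), a hypothetical feature map.
    weight: complex array of shape (H, W // 2 + 1, C). In a trained network
            this would be a learnable parameter; here the caller supplies it.
    """
    h, w, _ = x.shape
    # 2-D real FFT over the two spatial axes; channels are left untouched.
    spectrum = np.fft.rfft2(x, axes=(0, 1))
    # Element-wise multiplication in the frequency domain is equivalent to a
    # circular convolution spanning the full spatial extent, so every output
    # position can depend on every input position.
    filtered = spectrum * weight
    # Inverse transform back to a real-valued spatial feature map.
    return np.fft.irfft2(filtered, s=(h, w), axes=(0, 1))


# Example usage with random data (shapes chosen arbitrarily).
rng = np.random.default_rng(0)
features = rng.standard_normal((16, 16, 4))
filt = rng.standard_normal((16, 9, 4)) + 1j * rng.standard_normal((16, 9, 4))
out = global_filter(features, filt)
print(out.shape)
```

An all-ones weight leaves the input unchanged, while a weight that is nonzero only at the zero-frequency bin replaces each channel with its spatial mean, which shows how a single frequency-domain multiplication can mix information across the entire image.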