NGSTGAN: N-Gram Swin Transformer and Multi-Attention U-Net Discriminator for Efficient Multi-Spectral Remote Sensing Image Super-Resolution
The reconstruction of high-resolution (HR) remote sensing images (RSIs) from low-resolution (LR) counterparts is a critical task in remote sensing image super-resolution (RSISR). Recent advancements in convolutional neural networks (CNNs) and Transformers have significantly improved RSISR performanc...
Saved in:
Main Authors: | , , , , , |
---|---|
Format: | Article |
Language: | English |
Published: |
MDPI AG
2025-06-01
|
Series: | Remote Sensing |
Subjects: | |
Online Access: | https://www.mdpi.com/2072-4292/17/12/2079 |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Summary: | The reconstruction of high-resolution (HR) remote sensing images (RSIs) from low-resolution (LR) counterparts is a critical task in remote sensing image super-resolution (RSISR). Recent advancements in convolutional neural networks (CNNs) and Transformers have significantly improved RSISR performance due to their capabilities in local feature extraction and global modeling. However, several limitations remain, including the underutilization of multi-scale features in RSIs, the limited receptive field of Swin Transformer’s window self-attention (WSA), and the computational complexity of existing methods. To address these issues, this paper introduces the NGSTGAN model, which employs an N-Gram Swin Transformer as the generator and a multi-attention U-Net as the discriminator. The discriminator enhances attention to multi-scale key features through the addition of channel, spatial, and pixel attention (CSPA) modules, while the generator utilizes an improved shallow feature extraction (ISFE) module to extract multi-scale and multi-directional features, enhancing the capture of complex textures and details. The N-Gram concept is introduced to expand the receptive field of Swin Transformer, and sliding window self-attention (S-WSA) is employed to facilitate interaction between neighboring windows. Additionally, channel-reducing group convolution (CRGC) is used to reduce the number of parameters and computational complexity. A cross-sensor multispectral dataset combining Landsat-8 (L8) and Sentinel-2 (S2) is constructed for the resolution enhancement of L8’s blue (B), green (G), red (R), and near-infrared (NIR) bands from 30 m to 10 m. Experiments show that NGSTGAN outperforms the state-of-the-art (SOTA) method, achieving improvements of 0.5180 dB in the peak signal-to-noise ratio (PSNR) and 0.0153 in the structural similarity index measure (SSIM) over the second best method, offering a more effective solution to the task. |
---|---|
ISSN: | 2072-4292 |