NGSTGAN: N-Gram Swin Transformer and Multi-Attention U-Net Discriminator for Efficient Multi-Spectral Remote Sensing Image Super-Resolution

Bibliographic Details
Main Authors:	Chao Zhan, Chunyang Wang, Bibo Lu, Wei Yang, Xian Zhang, Gaige Wang
Format:	Article
Language:	English
Published:	MDPI AG 2025-06-01
Series:	Remote Sensing
Subjects:	super-resolution; deep learning; convolutional neural network; generative adversarial networks; transformer; N-Gram
Online Access:	https://www.mdpi.com/2072-4292/17/12/2079

Description
Summary:	The reconstruction of high-resolution (HR) remote sensing images (RSIs) from low-resolution (LR) counterparts is a critical task in remote sensing image super-resolution (RSISR). Recent advancements in convolutional neural networks (CNNs) and Transformers have significantly improved RSISR performance due to their capabilities in local feature extraction and global modeling. However, several limitations remain, including the underutilization of multi-scale features in RSIs, the limited receptive field of Swin Transformer’s window self-attention (WSA), and the computational complexity of existing methods. To address these issues, this paper introduces the NGSTGAN model, which employs an N-Gram Swin Transformer as the generator and a multi-attention U-Net as the discriminator. The discriminator enhances attention to multi-scale key features through the addition of channel, spatial, and pixel attention (CSPA) modules, while the generator utilizes an improved shallow feature extraction (ISFE) module to extract multi-scale and multi-directional features, enhancing the capture of complex textures and details. The N-Gram concept is introduced to expand the receptive field of Swin Transformer, and sliding window self-attention (S-WSA) is employed to facilitate interaction between neighboring windows. Additionally, channel-reducing group convolution (CRGC) is used to reduce the number of parameters and computational complexity. A cross-sensor multispectral dataset combining Landsat-8 (L8) and Sentinel-2 (S2) is constructed for the resolution enhancement of L8’s blue (B), green (G), red (R), and near-infrared (NIR) bands from 30 m to 10 m. Experiments show that NGSTGAN outperforms state-of-the-art (SOTA) methods, achieving improvements of 0.5180 dB in the peak signal-to-noise ratio (PSNR) and 0.0153 in the structural similarity index measure (SSIM) over the second-best method, offering a more effective solution to the task.
ISSN:	2072-4292

Similar Items