SIG-ShapeFormer: A Multi-Scale Spatiotemporal Feature Fusion Network for Satellite Cloud Image Classification

Satellite cloud images exhibit complex multidimensional characteristics, including spectral, textural, and spatiotemporal dynamics. The temporal evolution of cloud systems plays a crucial role in accurate classification, particularly under the coexistence of multiple weather systems. However, most e...

Full description

Saved in:

Bibliographic Details
Main Authors:	Xuan Liu, Zhenyu Lu, Bingjian Lu, Zhuang Li, Zhongfeng Chen, Yongjie Ma
Format:	Article
Language:	English
Published:	MDPI AG 2025-06-01
Series:	Remote Sensing
Subjects:	cloud image recognition shapeformer attention mechanism ensemble learning multimodal feature fusion
Online Access:	https://www.mdpi.com/2072-4292/17/12/2034
Tags:	Add Tag No Tags, Be the first to tag this record!

Description
Summary:	Satellite cloud images exhibit complex multidimensional characteristics, including spectral, textural, and spatiotemporal dynamics. The temporal evolution of cloud systems plays a crucial role in accurate classification, particularly under the coexistence of multiple weather systems. However, most existing models—such as those based on convolutional neural networks (CNNs), Transformer architectures, and their variants like Swin Transformer—primarily focus on spatial modeling of static images and do not explicitly incorporate temporal information, thereby limiting their ability to effectively integrate spatiotemporal features. To address this limitation, we propose SIG-ShapeFormer, a novel classification model specifically designed for satellite cloud images with temporal continuity. To the best of our knowledge, this work is the first to transform satellite cloud data into multivariate time series and introduce a unified framework for multi-scale and multimodal feature fusion. SIG-Shapeformer consists of three core components: (1) a Shapelet-based module that captures discriminative and interpretable local temporal patterns; (2) a multi-scale Inception module combining 1D convolutions and Transformer encoders to extract temporal features across different scales; and (3) a differentially enhanced Gramian Angular Summation Field (GASF) module that converts time series into 2D texture representations, significantly improving the recognition of cloud internal structures. Experimental results demonstrate that SIG-ShapeFormer achieves a classification accuracy of 99.36% on the LSCIDMR-S dataset, outperforming the original ShapeFormer by 2.2% and outperforming other CNN- or Transformer-based models. Moreover, the model exhibits strong generalization performance on the UCM remote sensing dataset and several benchmark tasks from the UEA time-series archive. SIG-Shapeformer is particularly suitable for remote sensing applications involving continuous temporal sequences, such as extreme weather warnings and dynamic cloud system monitoring. However, it relies on temporally coherent input data and may perform suboptimally when applied to datasets with limited or irregular temporal resolution.
ISSN:	2072-4292

SIG-ShapeFormer: A Multi-Scale Spatiotemporal Feature Fusion Network for Satellite Cloud Image Classification

Similar Items