Multi-granularity representation learning with vision Mamba for infrared small target detection
Heterogeneous environments and low Signal-to-Clutter Ratio (SCR) pose a challenge for Infrared Small Target Detection (IRSTD). Convolutional Neural Network (CNN) is constrained by the global view. Transformer with quadratic computational complexity struggles for local feature refinement. Inspired by...
Saved in:
Main Authors: | , , |
---|---|
Format: | Article |
Language: | English |
Published: |
Elsevier
2025-08-01
|
Series: | International Journal of Applied Earth Observations and Geoinformation |
Subjects: | |
Online Access: | http://www.sciencedirect.com/science/article/pii/S1569843225002924 |
Tags: |
Add Tag
No Tags, Be the first to tag this record!
|
Summary: | Heterogeneous environments and low Signal-to-Clutter Ratio (SCR) pose a challenge for Infrared Small Target Detection (IRSTD). Convolutional Neural Network (CNN) is constrained by the global view. Transformer with quadratic computational complexity struggles for local feature refinement. Inspired by the quad-directional scanning State Space Model (SSM) with linear complexity for long-range modeling, this research reconceptualizes the spatial and structural information of small targets in IR images. Multi-granularity features and long-range dependency of small targets are considered simultaneously. Specifically, we tailor a nested structure with cross-fertilization of global and local information. Each layer of the top-level pyramid network embeds a tiny well-configured contextual pyramid block to extract fine-grained features of small targets. The following Mamba module restructures the feature maps to derive coarse-grained features of “visual sentences”. The fusion of contextual information and local feature achieves precise localization of small targets. Furthermore, we propose the Asymmetric Convolution (AConv) for substituting the Depthwise Convolution (DWConv) in the Visual State Space (VSS) module and the regular convolution in each lateral connection of the nested pyramid network to alleviate the parameters and computation. Both qualitative and quantitative experiments demonstrate that our proposed model outperforms 12 recent baseline methods on two public datasets. |
---|---|
ISSN: | 1569-8432 |