Controlled-SAM and Context Promoting Network for Fine-Grained Semantic Segmentation

Bibliographic Details
Main Authors: Jinglin Zhang, Yuxia Li, Lei He, Bowei Zhang, Zhenye Niu, Yonghui Zhang, Shiyu Luo
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
Online Access: https://ieeexplore.ieee.org/document/11045311/
Description
Summary: Fine-grained semantic segmentation of remote sensing imagery is critical for applications such as land use analysis and agricultural monitoring. However, it remains challenging due to the subtle inter-class differences between visually similar objects, which often result in misclassifications. This challenge is particularly evident when distinguishing classes such as rivers, ponds, and fishponds, which share similar spectral and spatial characteristics. To address these challenges, we propose CSCPNet, a novel framework optimized for fine-grained feature extraction and segmentation accuracy. CSCPNet comprises a controlled segment anything model (SAM) encoder and a context promoting decoder. The controlled-SAM encoder uses shallow and deep feature-fusion modules to integrate multiscale features from both a pretrained SAM encoder and a lightweight encoder, excelling at capturing detailed fine-grained features. The context promoting decoder, equipped with context attention, iteratively refines feature maps through multistep decoding, effectively incorporating contextual information. Extensive experiments on the FBP and ShengTeng datasets with fine-grained classes demonstrate that CSCPNet achieves state-of-the-art performance in fine-grained semantic segmentation. On the FBP dataset with 24 fine-grained classes, CSCPNet improves overall accuracy (OA), mean intersection over union (mIoU), and mean F1 score (mF1) by 4.4%, 6.7%, and 9.3%, respectively. Similarly, on the ShengTeng dataset with 47 fine-grained classes, it achieves gains of 5.5% in OA, 7.3% in mIoU, and 7.9% in mF1. Meanwhile, CSCPNet maintains competitive accuracy on standard segmentation datasets such as the Potsdam and CZWZ datasets. These results demonstrate that CSCPNet excels at capturing fine-grained details and effectively distinguishing visually similar classes, making it a robust and efficient solution for fine-grained semantic segmentation of remote sensing images.
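The shallow/deep feature-fusion idea described in the abstract can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the function names, channel counts, and the use of nearest-neighbour upsampling with a per-pixel linear projection (a stand-in for a 1x1 convolution) are all illustrative assumptions about how a low-resolution, high-level feature map might be merged with a high-resolution, shallow one.

```python
import numpy as np

def upsample_nearest(x, factor):
    # Nearest-neighbour upsampling of a (C, H, W) feature map.
    return x.repeat(factor, axis=1).repeat(factor, axis=2)

def fuse_features(shallow, deep, proj):
    # Upsample the deep (low-resolution, high-level) features to the
    # shallow map's resolution, concatenate along the channel axis,
    # then apply a 1x1-convolution-style projection (a per-pixel
    # linear map given by the matrix `proj`).
    factor = shallow.shape[1] // deep.shape[1]
    deep_up = upsample_nearest(deep, factor)
    fused = np.concatenate([shallow, deep_up], axis=0)  # (Cs + Cd, H, W)
    c, h, w = fused.shape
    return (proj @ fused.reshape(c, h * w)).reshape(-1, h, w)

# Toy example: 8-channel shallow features at 16x16 (lightweight encoder)
# and 32-channel deep features at 4x4 (pretrained SAM encoder) -- shapes
# are hypothetical, chosen only to show the mechanics.
rng = np.random.default_rng(0)
shallow = rng.standard_normal((8, 16, 16))
deep = rng.standard_normal((32, 4, 4))
proj = rng.standard_normal((16, 8 + 32))  # project 40 fused channels -> 16
out = fuse_features(shallow, deep, proj)
print(out.shape)  # (16, 16, 16)
```

In a real network the projection would be a learned 1x1 convolution followed by a nonlinearity, and upsampling would typically be bilinear or learned, but the structural point is the same: spatial resolutions are matched before channel-wise fusion.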
ISSN: 1939-1404
2151-1535