Controlled-SAM and Context Promoting Network for Fine-Grained Semantic Segmentation
Main Authors:
Format: Article
Language: English
Published: IEEE, 2025-01-01
Series: IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing
Online Access: https://ieeexplore.ieee.org/document/11045311/
Summary: Fine-grained semantic segmentation of remote sensing imagery is critical for applications such as land use analysis and agricultural monitoring. However, it remains challenging due to the subtle inter-class differences between visually similar objects, which often result in misclassifications. This challenge is particularly evident when distinguishing classes such as rivers, ponds, and fishponds, which share similar spectral and spatial characteristics. To address these challenges, we propose CSCPNet, a novel framework optimized for fine-grained feature extraction and segmentation accuracy. CSCPNet comprises a controlled segment anything model (SAM) encoder and a context promoting decoder. The controlled SAM encoder uses shallow and deep feature fusion modules to integrate multiscale features from both a pretrained SAM encoder and a lightweight encoder, excelling at capturing detailed fine-grained features. The context promoting decoder with context attention iteratively refines feature maps through multistep decoding, effectively incorporating contextual information. Extensive experiments on the FBP and ShengTeng datasets with fine-grained classes demonstrate that CSCPNet achieves state-of-the-art performance in fine-grained semantic segmentation. On the FBP dataset with 24 fine-grained classes, CSCPNet improves overall accuracy (OA), mean intersection over union (mIoU), and mF1 by 4.4%, 6.7%, and 9.3%, respectively. Similarly, on the ShengTeng dataset with 47 fine-grained classes, it achieves gains of 5.5% in OA, 7.3% in mIoU, and 7.9% in mF1. Meanwhile, CSCPNet maintains competitive accuracy on standard segmentation datasets such as the Potsdam and CZWZ datasets. These results demonstrate that CSCPNet excels at capturing fine-grained details and effectively distinguishing visually similar classes, making it a robust and efficient solution for fine-grained semantic segmentation of remote sensing images.
ISSN: 1939-1404, 2151-1535
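The encoder-decoder idea described in the summary (fusing fine shallow features with coarse deep features, then iteratively refining the result with context attention) can be illustrated with a toy NumPy sketch. All function names, tensor shapes, the nearest-neighbour upsampling, and the squeeze-and-excitation-style gating below are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def fuse_features(shallow, deep):
    """Illustrative shallow/deep fusion: upsample the coarse (deep) map
    to the shallow resolution via nearest-neighbour indexing and
    concatenate along the channel axis. Hypothetical stand-in for the
    paper's fusion modules."""
    h, w, _ = shallow.shape
    dh, dw, _ = deep.shape
    rows = np.arange(h) * dh // h      # map fine rows to coarse rows
    cols = np.arange(w) * dw // w      # map fine cols to coarse cols
    deep_up = deep[rows][:, cols]      # nearest-neighbour upsampling
    return np.concatenate([shallow, deep_up], axis=-1)

def context_attention_step(feat):
    """One illustrative refinement step: re-weight channels by a global
    context vector through a sigmoid gate (squeeze-and-excitation style,
    assumed here for demonstration)."""
    ctx = feat.mean(axis=(0, 1))           # image-wide context per channel
    gate = 1.0 / (1.0 + np.exp(-ctx))      # sigmoid gating weights
    return feat * gate

def decode(feat, steps=3):
    """Multistep decoding: apply the context-attention refinement
    iteratively, mimicking the iterative refinement the summary describes."""
    for _ in range(steps):
        feat = context_attention_step(feat)
    return feat

rng = np.random.default_rng(0)
shallow = rng.standard_normal((8, 8, 4))    # fine, detail-rich features
deep = rng.standard_normal((2, 2, 16))      # coarse, semantic features
fused = fuse_features(shallow, deep)
out = decode(fused)
print(fused.shape, out.shape)               # (8, 8, 20) (8, 8, 20)
```

The sketch only conveys the data flow: detail comes from the high-resolution branch, semantics from the coarse branch, and the decoder repeatedly conditions the fused map on global context.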