SC-CoSF: Self-Correcting Collaborative and Co-Training for Image Fusion and Semantic Segmentation
| Main Authors: | , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-06-01 |
| Series: | Sensors |
| Subjects: | |
| Online Access: | https://www.mdpi.com/1424-8220/25/12/3575 |
| Summary: | Multimodal image fusion and semantic segmentation play pivotal roles in autonomous driving and robotic systems, yet their inherent interdependence remains underexplored. To address this gap and overcome performance bottlenecks, we propose SC-CoSF, a novel coupled framework that jointly optimizes both tasks through synergistic learning. Our approach replaces the traditional duplex (two-stream) encoders with a single weight-sharing CNN encoder, implicitly aligning multimodal features while reducing parameter overhead. The core innovation is the Self-correction and Collaboration Fusion Module (Sc-CFM), which integrates (1) a Self-correction Long-Range Relationship Branch (Sc-LRB) to strengthen global semantic modeling, (2) a Self-correction Fine-Grained Branch (Sc-FGB) to retain visual detail through local feature aggregation, and (3) a Dual-branch Collaborative Recalibration (DCR) mechanism for cross-task feature refinement. This design preserves the edge textures and color contrasts that segmentation depends on, while segmentation-derived spatial priors in turn guide the fusion. We further introduce the Interactive Context Recovery Mamba Decoder (ICRM) to restore long-range dependencies lost during upsampling, and the Region Adaptive Weighted Reconstruction Decoder (ReAW) to reduce feature redundancy in the fusion reconstruction. End-to-end joint training propagates gradients across all task branches via the shared parameters, exploiting inter-task consistency. Experiments show significant improvements over independently optimized baselines in both fusion quality and segmentation accuracy. (A minimal code sketch of this coupled design follows the record below.) |
| ISSN: | 1424-8220 |
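
The summary above describes the coupled architecture only at a high level. As a concrete reference point, here is a minimal PyTorch sketch of that design: one weight-sharing encoder applied to both modalities, a dual-branch fusion module with a gated cross-branch recalibration standing in for Sc-CFM, and fusion plus segmentation heads trained under a single joint loss. All names (`SharedEncoder`, `DualBranchFusion`, `SCCoSFSketch`), layer sizes, the elementwise-max fusion target, and the unweighted loss sum are assumptions for illustration; the actual Sc-LRB/Sc-FGB/DCR internals and the ICRM and ReAW decoders are not specified in this record.

```python
# A minimal sketch of the coupled design outlined in the summary.
# Class names, layer sizes, the elementwise-max fusion target, and the
# unweighted loss sum are illustrative assumptions, not the paper's method.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SharedEncoder(nn.Module):
    """Weight-sharing CNN encoder, applied to each modality in turn."""

    def __init__(self, in_ch: int = 1, dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, dim, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(dim, dim, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.net(x)


class DualBranchFusion(nn.Module):
    """Stand-in for Sc-CFM: a global long-range branch, a local fine-grained
    branch, and a learned gate recalibrating one branch against the other
    (loosely mirroring the Sc-LRB / Sc-FGB / DCR split)."""

    def __init__(self, dim: int = 64, heads: int = 4):
        super().__init__()
        self.global_branch = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.local_branch = nn.Conv2d(dim, dim, 3, padding=1)
        self.gate = nn.Conv2d(2 * dim, dim, 1)

    def forward(self, f):                         # f: (B, C, H, W)
        b, c, h, w = f.shape
        seq = f.flatten(2).transpose(1, 2)        # (B, HW, C) tokens
        g, _ = self.global_branch(seq, seq, seq)  # global semantic modeling
        g = g.transpose(1, 2).reshape(b, c, h, w)
        l = F.relu(self.local_branch(f))          # local detail aggregation
        a = torch.sigmoid(self.gate(torch.cat([g, l], dim=1)))
        return a * g + (1 - a) * l                # cross-branch recalibration


class SCCoSFSketch(nn.Module):
    """Shared encoder -> dual-branch fusion -> two task heads."""

    def __init__(self, dim: int = 64, num_classes: int = 9):
        super().__init__()
        self.encoder = SharedEncoder(dim=dim)     # one encoder, both modalities
        self.fusion = DualBranchFusion(dim)
        self.fusion_head = nn.Conv2d(dim, 1, 3, padding=1)         # fused image
        self.seg_head = nn.Conv2d(dim, num_classes, 3, padding=1)  # class logits

    def forward(self, ir, vis):
        f = self.fusion(self.encoder(ir) + self.encoder(vis))
        size = ir.shape[-2:]
        fused = torch.sigmoid(F.interpolate(
            self.fusion_head(f), size=size, mode="bilinear", align_corners=False))
        logits = F.interpolate(
            self.seg_head(f), size=size, mode="bilinear", align_corners=False)
        return fused, logits


# Joint end-to-end training step: gradients from both task losses flow
# through the shared encoder and fusion module.
model = SCCoSFSketch()
ir, vis = torch.rand(2, 1, 128, 128), torch.rand(2, 1, 128, 128)
labels = torch.randint(0, 9, (2, 128, 128))
fused, logits = model(ir, vis)
loss = F.l1_loss(fused, torch.max(ir, vis)) + F.cross_entropy(logits, labels)
loss.backward()
```

The point of the joint step is visible in the last two lines: both task losses backpropagate through the same encoder and fusion parameters, which is the mechanism the summary credits for exploiting inter-task consistency.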