Enhancing semantic segmentation for autonomous vehicle scene understanding in the Indian context using a modified CANet model
Main Authors:
Format: Article
Language: English
Published: Elsevier, 2025-06-01
Series: MethodsX
Online Access: http://www.sciencedirect.com/science/article/pii/S221501612400582X
Summary: Recent advancements in artificial intelligence (AI) have increased interest in intelligent transportation systems, particularly autonomous vehicles. Safe navigation in traffic-heavy environments requires accurate road scene segmentation, yet traditional computer vision methods struggle with complex scenarios. This study emphasizes the role of deep learning in improving semantic segmentation using datasets such as the Indian Driving Dataset (IDD), which presents unique challenges due to chaotic road conditions. We propose a modified CANet that incorporates U-Net and LinkNet elements, focusing on accuracy, efficiency, and resilience. The CANet features an encoder-decoder architecture and a Multiscale Context Module (MCM) with three parallel branches to capture contextual information at multiple scales. Our experiments show that the proposed model achieves a mean Intersection over Union (mIoU) of 0.7053, surpassing state-of-the-art models in efficiency and performance. Here we demonstrate:
• Traditional computer vision methods struggle with complex driving scenarios, whereas deep learning-based semantic segmentation methods show promising results.
• A modified CANet, incorporating U-Net and LinkNet elements, is proposed for semantic segmentation of unstructured driving scenarios.
• The CANet structure consists of an encoder-decoder architecture and a Multiscale Context Module (MCM) with three parallel branches to capture contextual information at multiple scales.
ISSN: 2215-0161
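The core idea behind the Multiscale Context Module described in the summary — three parallel branches that look at the same feature map at different receptive-field sizes, with their outputs fused along the channel axis — can be sketched as below. This is a minimal toy illustration only: the window sizes, the use of average pooling, and the tensor dimensions are assumptions for demonstration, not the paper's actual MCM design (the abstract does not specify the branch internals).

```python
import numpy as np

def avg_pool_same(feat, k):
    """Average-pool an (H, W, C) feature map with a k x k window,
    stride 1 and zero padding, so the spatial size is preserved."""
    h, w, _ = feat.shape
    pad = k // 2
    padded = np.pad(feat, ((pad, pad), (pad, pad), (0, 0)))
    out = np.empty_like(feat, dtype=float)
    for i in range(h):
        for j in range(w):
            # mean over the k x k spatial window, per channel
            out[i, j] = padded[i:i + k, j:j + k].mean(axis=(0, 1))
    return out

def multiscale_context(feat, scales=(1, 3, 5)):
    """Toy multiscale context module: three parallel branches pool the
    same feature map at different window sizes (illustrative choices),
    then the branch outputs are concatenated along the channel axis."""
    return np.concatenate([avg_pool_same(feat, k) for k in scales], axis=-1)

feat = np.random.rand(8, 8, 4)   # toy encoder feature map (H, W, C)
ctx = multiscale_context(feat)
print(ctx.shape)                 # (8, 8, 12): 3 branches x 4 channels each
```

In a real network the branches would typically be learned convolutions (e.g. with different dilation rates) rather than fixed pooling, but the fusion pattern — parallel scale-specific branches concatenated channel-wise — is the same.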