Traffic environment perception algorithm based on multi-task feature fusion and orthogonal attention

Bibliographic Details
Main Authors: Zhengfeng LI, Mingen ZHONG, Yihong ZHANG, Kang FAN, Zhiying DENG, Jiawei TAN
Format: Article
Language: Chinese
Published: Science Press, 2025-06-01
Series: 工程科学学报 (Chinese Journal of Engineering), Vol. 47, No. 6, pp. 1303-1313
ISSN: 2095-9389
DOI: 10.13374/j.issn2095-9389.2024.10.09.001
Subjects: perception of traffic environment; multi-task learning; object detection; image segmentation; orthogonal attention
Online Access:http://cje.ustb.edu.cn/article/doi/10.13374/j.issn2095-9389.2024.10.09.001
Full description

In the realm of autonomous driving, the design and implementation of collaborative multi-task perception algorithms pose significant challenges, rooted primarily in the need for real-time processing, effective feature sharing among diverse tasks, and seamless information fusion. Addressing these concerns is critical to the safety and efficiency of autonomous systems navigating complex traffic environments. We therefore propose MTEPN, a deep convolutional neural network designed to perform multiple visual tasks concurrently: vehicle detection, drivable-area extraction, and lane-line segmentation. By integrating these tasks into a unified model, MTEPN enhances the perceptual capabilities of autonomous driving systems and improves their ability to operate effectively in real-world settings.

MTEPN is built on the CSPDarkNet backbone, which extracts fundamental features from traffic-scene images; lateral (horizontal) connections strengthen its feature extraction and establish a robust basis for the subsequent multi-task processing. This first stage is crucial, since the quality of the extracted features bounds the performance of the entire system. On top of the backbone, a multi-channel deformable feature aggregation module, termed C2f-K, captures fine-grained global image features through cross-layer information fusion; by integrating features across scales, C2f-K suppresses background noise and interference, improving the model's understanding of complex scenes.

To further improve efficiency and accuracy, an orthogonal attention mechanism called HWAttention amplifies salient spatial features in the input images while keeping the computational load low. By selectively focusing on critical regions of interest, HWAttention boosts the model's performance across varied environments and keeps it efficient under real-time constraints.

A notable advance in MTEPN is its cross-task feature aggregation structure, which promotes information complementarity by implicitly modeling the global context relationships among the visual tasks; integrating this complementary pattern information deepens feature sharing and raises the recognition accuracy of each task. The result is a synergistic relationship among the tasks, in contrast to traditional methods that treat them in isolation. Finally, a decoupled task-head module processes the three perceptual objectives independently, which increases the model's flexibility, sharpens its focus on each task, and allows tailored optimization strategies for each head.
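The record gives no implementation details for HWAttention, but an orthogonal attention of the kind the description names is commonly realized by factorizing 2-D spatial attention into two 1-D attentions along the height and width axes, which is precisely what keeps the computational overhead small. The PyTorch sketch below illustrates that generic pattern only; the class name, pooling choices, and reduction ratio are illustrative assumptions, not the authors' code.

```python
# A minimal sketch of an orthogonal height/width attention block.
# This illustrates the general idea (1-D attention along the two
# orthogonal spatial axes), NOT the paper's HWAttention implementation;
# all names and layer choices here are assumptions.
import torch
import torch.nn as nn


class OrthogonalHWAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        hidden = max(channels // reduction, 8)
        # A shared 1x1 bottleneck keeps the extra compute small: attention
        # is computed over C*H + C*W elements, not the full C*H*W volume.
        self.squeeze = nn.Sequential(
            nn.Conv2d(channels, hidden, kernel_size=1),
            nn.BatchNorm2d(hidden),
            nn.ReLU(inplace=True),
        )
        self.excite_h = nn.Conv2d(hidden, channels, kernel_size=1)
        self.excite_w = nn.Conv2d(hidden, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Pool along each axis: (B,C,H,1) and (B,C,1,W) descriptors.
        desc_h = x.mean(dim=3, keepdim=True)   # average over width
        desc_w = x.mean(dim=2, keepdim=True)   # average over height
        attn_h = torch.sigmoid(self.excite_h(self.squeeze(desc_h)))
        attn_w = torch.sigmoid(self.excite_w(self.squeeze(desc_w)))
        # Broadcasting the two 1-D maps recovers a 2-D spatial weighting.
        return x * attn_h * attn_w
```

Because the two attention maps have shapes (B, C, H, 1) and (B, C, 1, W), broadcasting their product recovers a full 2-D spatial weighting while the attention weights themselves are computed over only C·(H + W) positions per image rather than C·H·W.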
In experimental evaluations on the BDD100k public dataset, MTEPN achieved a mean average precision (mAP) of 79.4% for vehicle detection and a mean intersection-over-union (mIoU) of 92.4% for drivable-area extraction, both surpassing existing mainstream multi-task perception algorithms of comparable parameter scale. Lane-line segmentation accuracy, measured by IoU, reached 27.2%, the second-best result among the compared algorithms. Importantly, MTEPN keeps a modest parameter count of only 7.9 million and processes a single frame in 24.3 ms, demonstrating its suitability for real-time autonomous-driving applications, where both speed and accuracy are paramount. The code will be made publicly available at https://github.com/XMUT-Vsion-Lab.
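As a structural picture of the decoupled task heads mentioned in the description, the sketch below shows one shared feature map fanning out into three independent branches, one per perceptual objective, so each branch can carry its own loss and tailored optimization. All channel counts, upsampling factors, and output conventions are placeholder assumptions rather than the published MTEPN configuration.

```python
# A minimal sketch of a decoupled multi-task head arrangement: one shared
# feature map feeding three independent branches. Sizes and strides are
# illustrative assumptions, not the MTEPN configuration.
import torch
import torch.nn as nn


def conv_bn_act(c_in: int, c_out: int) -> nn.Sequential:
    """3x3 conv + BN + SiLU, a common building block in YOLO-style heads."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1, bias=False),
        nn.BatchNorm2d(c_out),
        nn.SiLU(inplace=True),
    )


class DecoupledHeads(nn.Module):
    def __init__(self, c_feat: int = 256, num_classes: int = 1, num_anchors: int = 3):
        super().__init__()
        # Detection branch: per-cell box regression + objectness + classes.
        self.det = nn.Sequential(
            conv_bn_act(c_feat, c_feat),
            nn.Conv2d(c_feat, num_anchors * (5 + num_classes), 1),
        )
        # Drivable-area branch: upsample toward input resolution, then a
        # 2-class (drivable / not drivable) per-pixel logit map.
        self.area = nn.Sequential(
            conv_bn_act(c_feat, c_feat // 2),
            nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False),
            nn.Conv2d(c_feat // 2, 2, 1),
        )
        # Lane-line branch: same decoder shape, separate weights,
        # producing a lane / background logit map.
        self.lane = nn.Sequential(
            conv_bn_act(c_feat, c_feat // 2),
            nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False),
            nn.Conv2d(c_feat // 2, 2, 1),
        )

    def forward(self, shared: torch.Tensor):
        return self.det(shared), self.area(shared), self.lane(shared)
```

For example, `DecoupledHeads(256)(features)` on a (B, 256, H/8, W/8) tensor returns detection logits at stride 8 and two segmentation logit maps at stride 2, each of which can be trained with its own task-specific loss.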
Author affiliations:
Zhengfeng LI, Mingen ZHONG, Yihong ZHANG, Zhiying DENG: School of Mechanical and Automotive Engineering, Xiamen University of Technology, Xiamen 361024, China
Kang FAN, Jiawei TAN: School of Aerospace Engineering, Xiamen University, Xiamen 361005, China