Traffic environment perception algorithm based on multi-task feature fusion and orthogonal attention

Bibliographic Details
Main Authors: Zhengfeng LI, Mingen ZHONG, Yihong ZHANG, Kang FAN, Zhiying DENG, Jiawei TAN
Format: Article
Language: Chinese
Published: Science Press, 2025-06-01
Series: 工程科学学报 (Chinese Journal of Engineering), Vol. 47, No. 6, pp. 1303-1313
ISSN: 2095-9389
DOI: 10.13374/j.issn2095-9389.2024.10.09.001
Subjects: perception of traffic environment; multi-task learning; object detection; image segmentation; orthogonal attention
Online Access:http://cje.ustb.edu.cn/article/doi/10.13374/j.issn2095-9389.2024.10.09.001
Full description

In the realm of autonomous driving, the design and implementation of collaborative multi-task perception algorithms pose significant challenges, rooted primarily in the need for real-time processing, effective feature sharing among diverse tasks, and seamless information fusion. Addressing these concerns is critical to the safety and efficiency of autonomous systems navigating complex traffic environments. We therefore propose MTEPN, a deep convolutional neural network designed to perform multiple visual tasks concurrently: vehicle detection, drivable-area extraction, and lane-line segmentation. By integrating these tasks into a unified model, MTEPN enhances the perceptual capabilities of autonomous driving systems and improves their ability to operate effectively in real-world settings.

MTEPN is built on the CSPDarkNet backbone, which extracts fundamental features from traffic-scene images; lateral (horizontal) connections strengthen its feature extraction and establish a robust basis for the subsequent multi-task processing. This first stage is crucial, since the quality of the extracted features bounds the performance of the entire system. On top of the backbone, a multi-channel deformable feature aggregation module, termed C2f-K, captures fine-grained global image features through cross-layer information fusion; by integrating features across scales, C2f-K suppresses background noise and interference, improving the model's understanding of complex scenes.

To further improve efficiency and accuracy, an orthogonal attention mechanism called HWAttention amplifies salient spatial features in the input images while keeping the computational load low. By selectively focusing on critical regions of interest, HWAttention boosts the model's performance across varied environments and keeps it efficient under real-time constraints.

A notable advance in MTEPN is its cross-task feature aggregation structure, which promotes information complementarity by implicitly modeling the global context relationships among the visual tasks; integrating this complementary pattern information deepens feature sharing and raises the recognition accuracy of each task. The result is a synergistic relationship among the tasks, in contrast to traditional methods that treat them in isolation. Finally, a decoupled task-head module processes the three perceptual objectives independently, which increases the model's flexibility, sharpens its focus on each task, and allows tailored optimization strategies for each head.
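The record gives no implementation details for HWAttention, but an orthogonal attention of the kind the description names is commonly realized by factorizing 2-D spatial attention into two 1-D attentions along the height and width axes, which is precisely what keeps the computational overhead small. The PyTorch sketch below illustrates that generic pattern only; the class name, pooling choices, and reduction ratio are illustrative assumptions, not the authors' code.

```python
# A minimal sketch of an orthogonal height/width attention block.
# This illustrates the general idea (1-D attention along the two
# orthogonal spatial axes), NOT the paper's HWAttention implementation;
# all names and layer choices here are assumptions.
import torch
import torch.nn as nn


class OrthogonalHWAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        hidden = max(channels // reduction, 8)
        # A shared 1x1 bottleneck keeps the extra compute small: attention
        # is computed over C*H + C*W elements, not the full C*H*W volume.
        self.squeeze = nn.Sequential(
            nn.Conv2d(channels, hidden, kernel_size=1),
            nn.BatchNorm2d(hidden),
            nn.ReLU(inplace=True),
        )
        self.excite_h = nn.Conv2d(hidden, channels, kernel_size=1)
        self.excite_w = nn.Conv2d(hidden, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Pool along each axis: (B,C,H,1) and (B,C,1,W) descriptors.
        desc_h = x.mean(dim=3, keepdim=True)   # average over width
        desc_w = x.mean(dim=2, keepdim=True)   # average over height
        attn_h = torch.sigmoid(self.excite_h(self.squeeze(desc_h)))
        attn_w = torch.sigmoid(self.excite_w(self.squeeze(desc_w)))
        # Broadcasting the two 1-D maps recovers a 2-D spatial weighting.
        return x * attn_h * attn_w
```

Because the two attention maps have shapes (B, C, H, 1) and (B, C, 1, W), broadcasting their product recovers a full 2-D spatial weighting while the attention weights themselves are computed over only C·(H + W) positions per image rather than C·H·W.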
In experimental evaluations on the BDD100k public dataset, MTEPN achieved a mean average precision (mAP) of 79.4% for vehicle detection and a mean intersection-over-union (mIoU) of 92.4% for drivable-area extraction, both surpassing existing mainstream multi-task perception algorithms of comparable parameter scale. Lane-line segmentation accuracy, measured by IoU, reached 27.2%, the second-best result among the compared algorithms. Importantly, MTEPN keeps a modest parameter count of only 7.9 million and processes a single frame in 24.3 ms, demonstrating its suitability for real-time autonomous-driving applications, where both speed and accuracy are paramount. The code will be made publicly available at https://github.com/XMUT-Vsion-Lab.
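As a structural picture of the decoupled task heads mentioned in the description, the sketch below shows one shared feature map fanning out into three independent branches, one per perceptual objective, so each branch can carry its own loss and tailored optimization. All channel counts, upsampling factors, and output conventions are placeholder assumptions rather than the published MTEPN configuration.

```python
# A minimal sketch of a decoupled multi-task head arrangement: one shared
# feature map feeding three independent branches. Sizes and strides are
# illustrative assumptions, not the MTEPN configuration.
import torch
import torch.nn as nn


def conv_bn_act(c_in: int, c_out: int) -> nn.Sequential:
    """3x3 conv + BN + SiLU, a common building block in YOLO-style heads."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1, bias=False),
        nn.BatchNorm2d(c_out),
        nn.SiLU(inplace=True),
    )


class DecoupledHeads(nn.Module):
    def __init__(self, c_feat: int = 256, num_classes: int = 1, num_anchors: int = 3):
        super().__init__()
        # Detection branch: per-cell box regression + objectness + classes.
        self.det = nn.Sequential(
            conv_bn_act(c_feat, c_feat),
            nn.Conv2d(c_feat, num_anchors * (5 + num_classes), 1),
        )
        # Drivable-area branch: upsample toward input resolution, then a
        # 2-class (drivable / not drivable) per-pixel logit map.
        self.area = nn.Sequential(
            conv_bn_act(c_feat, c_feat // 2),
            nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False),
            nn.Conv2d(c_feat // 2, 2, 1),
        )
        # Lane-line branch: same decoder shape, separate weights,
        # producing a lane / background logit map.
        self.lane = nn.Sequential(
            conv_bn_act(c_feat, c_feat // 2),
            nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False),
            nn.Conv2d(c_feat // 2, 2, 1),
        )

    def forward(self, shared: torch.Tensor):
        return self.det(shared), self.area(shared), self.lane(shared)
```

For example, `DecoupledHeads(256)(features)` on a (B, 256, H/8, W/8) tensor returns detection logits at stride 8 and two segmentation logit maps at stride 2, each of which can be trained with its own task-specific loss.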
Author affiliations:
Zhengfeng LI, Mingen ZHONG, Yihong ZHANG, Zhiying DENG: School of Mechanical and Automotive Engineering, Xiamen University of Technology, Xiamen 361024, China
Kang FAN, Jiawei TAN: School of Aerospace Engineering, Xiamen University, Xiamen 361005, China