LST-BEV: Generating a Long-Term Spatial–Temporal Bird’s-Eye-View Feature for Multi-View 3D Object Detection
Main Authors:
Format: Article
Language: English
Published: MDPI AG, 2025-06-01
Series: Sensors
Online Access: https://www.mdpi.com/1424-8220/25/13/4040
Summary: This paper presents a novel multi-view 3D object detection framework, Long-Term Spatial–Temporal Bird’s-Eye View (LST-BEV), designed to improve performance in autonomous driving. Traditional 3D detection relies on sensors like LiDAR, but visual perception using multi-camera systems is emerging as a more cost-effective solution. Existing methods struggle to capture long-range dependencies and cross-task information due to limitations in their attention mechanisms. To address this, we propose a Long-Range Cross-Task Detection Head (LRCH) to capture these dependencies and integrate cross-task information for accurate predictions. Additionally, we introduce the Long-Term Temporal Perception Module (LTPM), which efficiently extracts temporal features by combining Mamba and linear attention, overcoming challenges in temporal frame extraction. Experimental results on the nuScenes dataset demonstrate that our proposed LST-BEV outperforms its baseline (SA-BEVPool) by 2.1% mAP and 2.7% NDS, a significant performance improvement.
ISSN: 1424-8220
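The summary describes the LTPM as combining Mamba with linear attention for efficient temporal feature extraction. As a rough, illustrative sketch of the linear-attention half only — not the authors' implementation; the feature map, shapes, and epsilon are assumptions — a minimal NumPy version looks like this:

```python
import numpy as np

def linear_attention(q, k, v, eps=1e-6):
    """Kernelized (linear) attention, O(n) in sequence length.

    Illustrative sketch, not the paper's LTPM. Uses the common
    elu(x) + 1 feature map so all attention weights stay positive.
    Shapes: q (n, d), k (m, d), v (m, d_v) -> output (n, d_v).
    """
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))  # elu(x) + 1
    qf, kf = phi(q), phi(k)
    kv = kf.T @ v                # (d, d_v): aggregate keys/values once
    z = qf @ kf.sum(axis=0)      # (n,): per-query normalizer
    return (qf @ kv) / (z[:, None] + eps)
```

The design point is that the key–value aggregation `kf.T @ v` is computed once and reused for every query, so cost grows linearly with the number of temporal frames, unlike softmax attention's quadratic query–key product — which is presumably why the module pairs it with Mamba for long-term temporal modeling.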