Depth-Guided Monocular Object Pose Estimation for Warehouse Automation
Format: Article
Language: English
Published: IEEE, 2025-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/11050422/
Summary: Accurate object pose estimation is essential for warehouse automation, enabling tasks such as robotic picking, sorting, and inventory management. Current state-of-the-art approaches rely on both color (RGB) and depth (D) images, as depth information provides critical geometric cues that enhance object localization and improve robustness against occlusions. However, RGBD-based methods require specialized depth sensors, which can be costly and may not function reliably in warehouse environments with reflective surfaces, varying lighting conditions, or sensor occlusions. To address these limitations, researchers have explored RGB-only approaches, but the absence of depth cues makes it challenging to handle occlusions, estimate object geometry, and distinguish textureless or highly similar objects, which are common in warehouses. In this paper, we propose a novel end-to-end depth-guided object pose estimation method tailored for warehouse automation. Our approach leverages both depth and color images during training but relies solely on RGB images during inference. Depth images are used to supervise the training of a depth estimation network, which generates initial depth-aware features. These features are then refined using our proposed depth-guided feature enhancement module to improve spatial understanding and robustness. The enhanced features are subsequently used for keypoint-based 6D object pose estimation. By integrating depth-guided feature learning, our method significantly enhances pose estimation accuracy, especially in cluttered warehouse environments with severe occlusions and textureless objects. Extensive experiments on warehouse-specific datasets, as well as standard benchmark datasets, demonstrate that our approach outperforms existing RGB-based methods while maintaining real-time inference speeds, making it a highly practical solution for real-world warehouse automation applications.
ISSN: 2169-3536
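The abstract only outlines the method at a high level, so the following PyTorch sketch is a rough illustration of the training scheme it describes, not the authors' implementation. The network name `DepthGuidedPoseNet`, the gating-style "enhancement" module, the layer sizes, and the loss weighting are all assumptions introduced for illustration: an auxiliary head is supervised with sensor depth during training, its prediction is used to refine the RGB features, and a keypoint head produces heatmaps from which the 6D pose can be recovered downstream.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DepthGuidedPoseNet(nn.Module):
    """Minimal sketch of an RGB-only pose network with an auxiliary,
    depth-supervised branch. All module names and sizes are illustrative."""

    def __init__(self, num_keypoints=8, feat_ch=64):
        super().__init__()
        # Shared RGB backbone (stand-in for a real encoder such as ResNet).
        self.backbone = nn.Sequential(
            nn.Conv2d(3, feat_ch, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(feat_ch, feat_ch, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        # Auxiliary depth head: trained against sensor depth; no depth input
        # is needed at inference time.
        self.depth_head = nn.Conv2d(feat_ch, 1, 1)
        # "Depth-guided feature enhancement" stand-in: gate the RGB features
        # with features derived from the predicted depth map.
        self.depth_embed = nn.Conv2d(1, feat_ch, 3, padding=1)
        self.gate = nn.Sequential(nn.Conv2d(feat_ch * 2, feat_ch, 1), nn.Sigmoid())
        # Keypoint head: per-keypoint heatmaps for 2D keypoints.
        self.kpt_head = nn.Conv2d(feat_ch, num_keypoints, 1)

    def forward(self, rgb):
        feats = self.backbone(rgb)                   # (B, C, H/4, W/4)
        pred_depth = self.depth_head(feats)          # (B, 1, H/4, W/4)
        depth_feats = self.depth_embed(pred_depth)
        gate = self.gate(torch.cat([feats, depth_feats], dim=1))
        enhanced = feats * gate + feats              # depth-guided refinement
        heatmaps = self.kpt_head(enhanced)           # (B, K, H/4, W/4)
        return heatmaps, pred_depth


def training_step(model, rgb, gt_depth, gt_heatmaps, depth_weight=0.5):
    """Depth images supervise the auxiliary branch during training only.
    gt_heatmaps is assumed to hold Gaussian targets at feature resolution."""
    heatmaps, pred_depth = model(rgb)
    gt_depth_small = F.interpolate(gt_depth, size=pred_depth.shape[-2:], mode="nearest")
    loss_depth = F.l1_loss(pred_depth, gt_depth_small)
    loss_kpt = F.mse_loss(heatmaps, gt_heatmaps)
    return loss_kpt + depth_weight * loss_depth


# At inference, only the RGB image is required:
# heatmaps, _ = model(rgb)
```

Recovering the 6D pose from the predicted keypoints is typically done with a PnP solver over 2D-3D correspondences; the abstract does not state which solver the authors use, so the helpers below, with hypothetical names, show one common route via OpenCV, assuming the keypoints correspond to known 3D points on the object model (e.g., bounding-box corners).

```python
import numpy as np
import cv2


def heatmaps_to_keypoints(heatmaps, stride=4):
    """heatmaps: numpy array (K, H/4, W/4), e.g. one image's model output after
    .detach().cpu().numpy(). Argmax per channel, scaled back to input pixels."""
    K, H, W = heatmaps.shape
    flat = heatmaps.reshape(K, -1).argmax(axis=1)
    ys, xs = np.divmod(flat, W)
    return np.stack([xs, ys], axis=1).astype(np.float32) * stride


def recover_pose(keypoints_2d, keypoints_3d, camera_matrix):
    """Classic PnP: 2D-3D correspondences -> rotation matrix and translation."""
    ok, rvec, tvec = cv2.solvePnP(
        keypoints_3d.astype(np.float32),
        keypoints_2d.astype(np.float32),
        camera_matrix,
        distCoeffs=None,
        flags=cv2.SOLVEPNP_EPNP,
    )
    R, _ = cv2.Rodrigues(rvec)  # 3x3 rotation matrix
    return R, tvec
```

Because the depth branch in this sketch is only a training-time auxiliary, inference needs nothing beyond an RGB image, which is consistent with the real-time, depth-sensor-free deployment the abstract emphasizes.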