Depth-Guided Monocular Object Pose Estimation for Warehouse Automation


Bibliographic Details
Main Authors: Phan Xuan Tan, Dinh-Cuong Hoang, Anh-Nhat Nguyen, Eiji Kamioka, Ta Huu Anh Duong, Tuan-Minh Huynh, Duc-Manh Nguyen, Duc-Huy Ngo, Minh-Duc Cao, Thu-Uyen Nguyen, Van-Thiep Nguyen, Duc-Thanh Tran, Van-Hiep Duong, Anh-Truong Mai, Duc-Long Pham, Khanh-Toan Phan, Minh-Quang Do
Format: Article
Language: English
Published: IEEE 2025-01-01
Series: IEEE Access
Online Access: https://ieeexplore.ieee.org/document/11050422/
Description
Summary: Accurate object pose estimation is essential for warehouse automation, enabling tasks such as robotic picking, sorting, and inventory management. Current state-of-the-art approaches rely on both color (RGB) and depth (D) images, as depth information provides critical geometric cues that enhance object localization and improve robustness against occlusions. However, RGBD-based methods require specialized depth sensors, which can be costly and may not function reliably in warehouse environments with reflective surfaces, varying lighting conditions, or sensor occlusions. To address these limitations, researchers have explored RGB-only approaches, but the absence of depth cues makes it challenging to handle occlusions, estimate object geometry, and differentiate between textureless or highly similar objects, which are common in warehouses. In this paper, we propose a novel end-to-end depth-guided object pose estimation method tailored for warehouse automation. Our approach leverages both depth and color images during training but relies solely on RGB images during inference. Depth images are used to supervise the training of a depth estimation network, which generates initial depth-aware features. These features are then refined using our proposed depth-guided feature enhancement module to improve spatial understanding and robustness. The enhanced features are subsequently utilized for keypoint-based 6D object pose estimation. By integrating depth-guided feature learning, our method significantly enhances pose estimation accuracy, especially in cluttered warehouse environments with severe occlusions and textureless objects. Extensive experiments on warehouse-specific datasets, as well as standard benchmark datasets, demonstrate that our approach outperforms existing RGB-based methods while maintaining real-time inference speeds, making it a highly practical solution for real-world warehouse automation applications.
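The abstract describes a train-time combination of depth supervision and keypoint-based pose objectives, with the depth branch discarded at inference. The record does not give the paper's actual loss functions or weighting, so the following is only a minimal sketch of how such a combined objective could look, assuming a smooth-L1 penalty for both terms and a hypothetical weight `lam`; the function names `smooth_l1` and `training_loss` are illustrative, not from the paper.

```python
import numpy as np

def smooth_l1(pred, target, beta=1.0):
    """Smooth-L1 (Huber-style) penalty, averaged over all elements."""
    diff = np.abs(pred - target)
    per_elem = np.where(diff < beta,
                        0.5 * diff ** 2 / beta,   # quadratic near zero
                        diff - 0.5 * beta)        # linear for large errors
    return per_elem.mean()

def training_loss(pred_keypoints, gt_keypoints, pred_depth, gt_depth, lam=0.5):
    """Combined train-time objective (sketch).

    The keypoint term drives the pose branch; the depth term supervises the
    depth estimation network. At inference only the RGB/keypoint path runs,
    so the depth term simply disappears from the computation.
    """
    l_kp = smooth_l1(pred_keypoints, gt_keypoints)
    l_depth = smooth_l1(pred_depth, gt_depth)
    return l_kp + lam * l_depth
```

In a real pipeline the predicted 2D keypoints would then be lifted to a 6D pose (e.g. via a PnP solver against the object's 3D model keypoints), but that step is outside what this sketch covers.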
ISSN:2169-3536