An Approach to Enable Human–3D Object Interaction Through Voice Commands in an Immersive Virtual Environment

Bibliographic Details
Main Authors:	Alessio Catalfamo, Antonio Celesti, Maria Fazio, A. F. M. Saifuddin Saif, Yu-Sheng Lin, Edelberto Franco Silva, Massimo Villari
Format:	Article
Language:	English
Published:	MDPI AG 2025-07-01
Series:	Big Data and Cognitive Computing
Subjects:	virtual reality; automated speech recognition; convolutional neural networks; ONNX
Online Access:	https://www.mdpi.com/2504-2289/9/7/188

Description
Summary:	The Metaverse still faces many open challenges. Among them, Virtual Reality (VR) applications that support voice-based human–3D object interaction remain rare, because the hardware and software limitations of headset devices make it difficult to adopt Automated Speech Recognition (ASR) systems driven by users’ voice commands. This paper proposes a methodology to bridge this gap. A Mel-Frequency Cepstral Coefficient (MFCC) extraction algorithm first captures the distinctive characteristics of the user’s voice, and the resulting features are passed as input to a Convolutional Neural Network (CNN) model. To integrate the CNN model with a VR application running on a standalone headset, such as Oculus Quest, the model is then converted into the Open Neural Network Exchange (ONNX) format, an open standard for Machine Learning (ML) interoperability. Experiments show that the native CNN model developed with TensorFlow and the corresponding model converted into the ONNX format achieve comparable performance. The proposed system performs well and provides a foundation for user-centric, effective computing systems that make VR environments more accessible through voice-based commands, paving the way towards VR applications that run on headsets and are controlled by the user’s voice.
ISSN:	2504-2289
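
The MFCC step named in the summary can be illustrated with a minimal, dependency-free sketch. This is a generic textbook MFCC computation, not the authors' implementation: the frame length, hop size, filter count, coefficient count, and the synthetic test tone are all assumptions chosen for illustration, and a real pipeline would normally rely on an audio library and an FFT rather than the naive DFT used here. The resulting frames-by-coefficients matrix is the kind of input a CNN classifier could consume; the CNN and the ONNX conversion are not shown.

```python
import math

def hz_to_mel(f):
    """Convert a frequency in Hz to the mel scale (common 2595*log10 formula)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def power_spectrum(frame):
    """Naive DFT power spectrum of one windowed frame (bins 0..N/2)."""
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(frame))
        im = -sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(frame))
        spec.append((re * re + im * im) / n)
    return spec

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose edges are evenly spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2)
    mels = [top * i / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int((n_fft + 1) * mel_to_hz(m) / sample_rate) for m in mels]
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(lo, mid):      # rising edge of the triangle
            row[k] = (k - lo) / (mid - lo)
        for k in range(mid, hi):      # falling edge of the triangle
            row[k] = (hi - k) / (hi - mid)
        bank.append(row)
    return bank

def mfcc(signal, sample_rate, frame_len=256, hop=128, n_filters=20, n_coeffs=13):
    """Return one list of n_coeffs cepstral coefficients per frame.

    All default parameter values are illustrative assumptions.
    """
    window = [0.54 - 0.46 * math.cos(2 * math.pi * i / (frame_len - 1))
              for i in range(frame_len)]  # Hamming window
    bank = mel_filterbank(n_filters, frame_len, sample_rate)
    features = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + i] * window[i] for i in range(frame_len)]
        spec = power_spectrum(frame)
        # Log filterbank energies, floored to avoid log(0) on empty filters
        log_e = [math.log(max(sum(w * p for w, p in zip(row, spec)), 1e-10))
                 for row in bank]
        # DCT-II decorrelates the log energies; keep the first n_coeffs terms
        coeffs = [sum(e * math.cos(math.pi * c * (m + 0.5) / n_filters)
                      for m, e in enumerate(log_e))
                  for c in range(n_coeffs)]
        features.append(coeffs)
    return features

# Hypothetical input: 0.1 s of a 440 Hz tone sampled at 8 kHz
SAMPLE_RATE = 8000
tone = [math.sin(2 * math.pi * 440 * t / SAMPLE_RATE) for t in range(800)]
features = mfcc(tone, SAMPLE_RATE)
print(len(features), "frames x", len(features[0]), "coefficients")
```

With these assumed settings, each utterance becomes a small two-dimensional array (time frames by cepstral coefficients), which is why a convolutional model is a natural fit for the classification stage described in the summary.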

Similar Items