Neural network models for whisper to normal speech conversion

Bibliographic Details
Main Authors: Cézar Yamamura, Paulo Scalassara, Marco Oliveira, Aníbal Ferreira
Format: Article
Language: English
Published: Universidade do Porto 2025-03-01
Series: U.Porto Journal of Engineering
Online Access: https://journalengineering.fe.up.pt/index.php/upjeng/article/view/2739
Description
Summary: Whispering is a common and essential secondary mode of communication. Individuals with aphonia, including laryngectomees, however, rely on it as their primary means of communication. Because whispered and normal speech have distinct features, converting effectively between them remains a debated challenge in the field of speech recognition. This study investigates the characteristics of whispered speech and proposes a system for converting whispered vowels into normal ones. The system is built with multilayer perceptron networks and two types of generative adversarial networks. Three metrics are used to evaluate its performance: mel-cepstral distortion, root mean square error of the fundamental frequency, and the accuracy and F1-score of a vowel classifier. Overall, the perceptron networks produced better results. No significant differences were observed between male and female voices or between the presence and absence of speech silence, except for improved accuracy in estimating the fundamental frequency during the conversion process.
ISSN: 2183-6493
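
The summary names mel-cepstral distortion and the root mean square error of the fundamental frequency as evaluation metrics but does not give their exact definitions. The following is a minimal sketch of how these two metrics are commonly computed, not the authors' implementation. It assumes the reference and converted feature sequences are already time-aligned, that column 0 of the mel-cepstral matrix is an energy term to be excluded, and that unvoiced frames are marked with an F0 of 0. The function names are hypothetical.

```python
import numpy as np


def mel_cepstral_distortion(ref, conv):
    """Mean mel-cepstral distortion in dB between two time-aligned
    arrays of shape (frames, coefficients).

    Assumption: column 0 holds the energy term and is excluded,
    which is a common convention, not one stated in the summary.
    """
    ref = np.asarray(ref, dtype=float)
    conv = np.asarray(conv, dtype=float)
    diff = ref[:, 1:] - conv[:, 1:]
    # Per-frame distortion: (10 / ln 10) * sqrt(2 * sum of squared differences)
    per_frame = (10.0 / np.log(10.0)) * np.sqrt(2.0 * np.sum(diff ** 2, axis=1))
    return float(np.mean(per_frame))


def f0_rmse(ref_f0, conv_f0):
    """Root mean square error in Hz over frames voiced in both contours.

    Assumption: a frame is voiced when its F0 value is greater than 0.
    """
    ref_f0 = np.asarray(ref_f0, dtype=float)
    conv_f0 = np.asarray(conv_f0, dtype=float)
    voiced = (ref_f0 > 0) & (conv_f0 > 0)
    if not np.any(voiced):
        raise ValueError("no frames are voiced in both contours")
    err = ref_f0[voiced] - conv_f0[voiced]
    return float(np.sqrt(np.mean(err ** 2)))


# Toy values for illustration only; they are not data from the study.
reference_f0 = [100.0, 0.0, 200.0]
converted_f0 = [110.0, 120.0, 190.0]
print(f0_rmse(reference_f0, converted_f0))
```

In this toy example only the first and third frames are voiced in both contours, so the error is computed from those two frames alone. Other conventions exist, such as measuring F0 error in cents or on a log scale, and the study may use a different one.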