An improved deep learning approach for speech enhancement

Bibliographic Details
Main Authors: Malek Miled, Mohamed Anouar Ben Messaoud
Format: Article
Language: English
Published: Universidade do Porto, 2023-11-01
Series: U.Porto Journal of Engineering
Online Access: https://journalengineering.fe.up.pt/index.php/upjeng/article/view/1531
Description
Summary: Single-channel speech enhancement refers to the task of improving the quality and intelligibility of a speech signal in a noisy environment. Time-domain and time-frequency-domain methods are the two main categories of speech enhancement approaches. In this paper, we propose an approach based on a cross-domain framework. This framework utilizes knowledge of the spectrogram and overcomes some of the limitations faced by time-frequency-domain methods. First, we apply the intrinsic mode functions of the empirical mode decomposition together with an improved version of principal component analysis. Then, we design a cross-domain learning framework to learn the correlations along the frequency and time axes. At a low SNR of -5 dB, objective and subjective measures demonstrate the effectiveness of the proposed approach, with average scores of -0.49, 2.47, 2.44, and 0.68 for SegSNR, PESQ, Cov, and STOI, respectively. These results highlight the success of our approach in low-SNR conditions.
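The summary reports SegSNR alongside PESQ, Cov, and STOI. As a rough aid for reading the SegSNR figure, here is a minimal sketch of segmental SNR under common conventions; it is not the authors' evaluation code. The function name `segmental_snr`, the 256-sample non-overlapping frames, and the [-10, 35] dB per-frame clamp are assumptions (the clamp range is a widely used convention), and many published implementations use overlapping, windowed frames instead.

```python
import math
from typing import Sequence


def segmental_snr(
    clean: Sequence[float],
    enhanced: Sequence[float],
    frame_len: int = 256,
    min_db: float = -10.0,
    max_db: float = 35.0,
    eps: float = 1e-10,
) -> float:
    """Mean frame-wise SNR (dB) of an enhanced signal against a clean reference.

    Hypothetical defaults (not taken from the paper): non-overlapping frames of
    `frame_len` samples, each frame's SNR clamped to [min_db, max_db].
    """
    if len(clean) != len(enhanced):
        raise ValueError("signals must have the same length")
    n_frames = len(clean) // frame_len  # a trailing partial frame is dropped
    if n_frames == 0:
        raise ValueError("signal is shorter than one frame")

    frame_snrs = []
    for k in range(n_frames):
        start, stop = k * frame_len, (k + 1) * frame_len
        ref = clean[start:stop]
        est = enhanced[start:stop]
        # Energy of the reference speech in this frame.
        signal_energy = sum(x * x for x in ref)
        # Energy of the residual error (reference minus enhanced) in this frame.
        noise_energy = sum((x - y) ** 2 for x, y in zip(ref, est))
        snr = 10.0 * math.log10((signal_energy + eps) / (noise_energy + eps))
        # Clamp so silent or perfectly reconstructed frames cannot dominate the mean.
        frame_snrs.append(min(max(snr, min_db), max_db))

    return sum(frame_snrs) / n_frames
```

For example, scaling a clean signal by 0.5 leaves a residual equal to half the signal in every frame, so each frame's SNR is 10 log10(4), about 6.02 dB, and so is the average. Averaging clamped per-frame values, rather than computing one global SNR, weights quiet and loud passages more evenly.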
ISSN: 2183-6493