An improved deep learning approach for speech enhancement
Main Authors: | , |
---|---|
Format: | Article |
Language: | English |
Published: | Universidade do Porto, 2023-11-01 |
Series: | U.Porto Journal of Engineering |
Subjects: | |
Online Access: | https://journalengineering.fe.up.pt/index.php/upjeng/article/view/1531 |
Summary: | Single-channel speech enhancement is the task of improving the quality and intelligibility of a speech signal in a noisy environment. Approaches fall into two main categories: time-domain methods and time-frequency-domain methods. In this paper, we propose an approach based on a cross-domain framework. This framework exploits knowledge of the spectrogram and overcomes some of the limitations of time-frequency-domain methods. First, we apply the intrinsic mode functions of the empirical mode decomposition and an improved version of principal component analysis. Then, we design a cross-domain learning framework to determine the correlations along the frequency and time axes. At a low SNR of -5 dB, the effectiveness of the proposed approach is demonstrated by objective and subjective measures, with average scores of -0.49, 2.47, 2.44, and 0.68 for SegSNR, PESQ, Cov, and STOI, respectively. These results highlight the success of our approach under low-SNR conditions.
|
---|---|
ISSN: | 2183-6493 |
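
The summary reports results at an SNR of -5 dB, a condition in which the noise carries roughly three times the power of the speech. As a minimal sketch of what that test condition means, the snippet below scales a noise signal so that the mixture reaches a requested SNR. The function names `mix_at_snr` and `measure_snr_db`, the 16 kHz sampling rate, and the synthetic sine-wave stand-in for speech are all assumptions made for illustration; nothing here is taken from the article's own experimental pipeline.

```python
import numpy as np


def mix_at_snr(speech, noise, target_snr_db):
    """Return speech + scaled noise so the mixture has the requested SNR (dB).

    Hypothetical helper for illustration. Assumes `noise` is at least as
    long as `speech` and that neither signal is all zeros.
    """
    speech = np.asarray(speech, dtype=float)
    noise = np.asarray(noise, dtype=float)[: len(speech)]
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    # SNR_dB = 10 * log10(speech_power / (gain**2 * noise_power)), solved for gain.
    gain = np.sqrt(speech_power / (noise_power * 10 ** (target_snr_db / 10)))
    return speech + gain * noise


def measure_snr_db(clean, noisy):
    """Global SNR in dB of `noisy` relative to the known `clean` signal."""
    residual = noisy - clean
    return 10 * np.log10(np.mean(clean ** 2) / np.mean(residual ** 2))


# Synthetic stand-ins: a 220 Hz tone instead of real speech, white noise.
rng = np.random.default_rng(0)
sample_rate = 16000  # assumed sampling rate
t = np.arange(sample_rate) / sample_rate
speech = np.sin(2 * np.pi * 220 * t)
noise = rng.standard_normal(sample_rate)

noisy = mix_at_snr(speech, noise, target_snr_db=-5.0)
print(f"measured SNR: {measure_snr_db(speech, noisy):.2f} dB")
```

This computes a single global SNR over the whole signal. The SegSNR score quoted in the summary is a frame-wise average, so it generally differs from the global value.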