Signal Processing

Phase-Sensitive Decision-Directed SNR Estimator for Single-Channel Speech Enhancement

Shifeng Ou

School of Opto-Electronic Information, Yantai University, Yantai 264005, P. R. China

Peng Song

School of Computer and Control Engineering, Yantai University, Yantai 264005, P. R. China

Ying Gao

School of Opto-Electronic Information, Yantai University, Yantai 264005, P. R. China

E-mail Address: claragaoying@126.com

Corresponding author.

https://doi.org/10.1142/S0218001417580034

Abstract

The a priori signal-to-noise ratio (SNR) plays an essential role in many speech enhancement systems. Most existing approaches to estimating the a priori SNR exploit only the amplitude spectra and neglect the phase. Because incorporating phase information into a speech processing system can significantly improve speech quality, this paper proposes a phase-sensitive decision-directed (DD) approach to a priori SNR estimation. By representing the short-time discrete Fourier transform (STFT) signal spectra geometrically in the complex plane, the proposed approach estimates the a priori SNR using both magnitude and phase information while making no assumptions about the phase difference between the clean speech and noise spectra. Objective evaluations in terms of spectrograms, segmental SNR, log-spectral distance (LSD) and short-time objective intelligibility (STOI) measures demonstrate the superiority of the proposed approach over several competitive methods under different noise conditions and input SNR levels.
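For orientation, the amplitude-only baseline that the abstract contrasts against can be sketched as follows. This is a minimal sketch of the classic decision-directed recursion with a Wiener gain, not the phase-sensitive estimator proposed in the paper, whose equations are not given in this abstract. The function name, the parameter values (alpha, xi_min), the first-frame initialization and the assumption of a known, stationary noise power are all illustrative choices, not taken from the source.

```python
def dd_a_priori_snr(noisy_power, noise_power, alpha=0.98, xi_min=1e-3):
    """Classic (amplitude-only) decision-directed a priori SNR estimate
    for a single frequency bin, tracked over successive frames.

    noisy_power : sequence of |Y(l)|^2 values, one per frame (hypothetical input)
    noise_power : noise power estimate, assumed known and constant here
    alpha       : smoothing factor (0.98 is a commonly used value, assumed)
    xi_min      : lower floor on the estimate (assumed)
    """
    xi_track = []
    prev_clean_power = None  # |X_hat(l-1)|^2 from the previous frame
    for y_pow in noisy_power:
        gamma = y_pow / noise_power      # a posteriori SNR
        ml_term = max(gamma - 1.0, 0.0)  # half-wave rectified ML estimate
        if prev_clean_power is None:
            # First frame: no previous clean estimate exists (assumed init).
            xi = ml_term
        else:
            xi = alpha * prev_clean_power / noise_power + (1.0 - alpha) * ml_term
        xi = max(xi, xi_min)
        gain = xi / (1.0 + xi)           # Wiener gain, one possible suppression rule
        prev_clean_power = (gain ** 2) * y_pow
        xi_track.append(xi)
    return xi_track


if __name__ == "__main__":
    # Five frames of a bin with constant noisy power and unit noise power.
    print(dd_a_priori_snr([11.0] * 5, 1.0))
```

The recursion uses only the magnitudes |Y| and the noise power, which is the limitation the abstract points to: the phase difference between speech and noise never enters the estimate.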

Keywords:
