Information Theory

Deep neural network and noise classification-based speech enhancement

Wenhua Shi

Lab of Intelligent Information Processing, PLA University of Science and Technology, Nanjing 210017, China

Flight Instructor Training Base, Air Force Aviation University, Bengbu 233000, China

E-mail Address: whshi0919@163.com

Corresponding author.

Xiongwei Zhang

Lab of Intelligent Information Processing, PLA University of Science and Technology, Nanjing 210017, China

Xia Zou

Lab of Intelligent Information Processing, PLA University of Science and Technology, Nanjing 210017, China

and

Wei Han

Lab of Intelligent Information Processing, PLA University of Science and Technology, Nanjing 210017, China

https://doi.org/10.1142/S0217984917400966

Abstract

This paper proposes a speech enhancement method that combines noise classification with a deep neural network (DNN). A Gaussian mixture model (GMM) determines the noise type in speech-absent frames, and a DNN models the relationship between the noisy observation and the clean speech. Once the noise type is determined, the corresponding DNN model is applied to enhance the noisy speech. The GMM is trained on mel-frequency cepstral coefficients (MFCCs), and its parameters are estimated with the iterative expectation-maximization (EM) algorithm. The noise type is updated using spectral-entropy-based voice activity detection (VAD). Experimental results demonstrate that the proposed method achieves better objective speech quality and lower distortion under both stationary and non-stationary conditions.
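The noise-type update described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: frames arrive as precomputed (power spectrum, MFCC) pairs, each noise type has an already trained diagonal-covariance GMM (EM training is not shown), a frame counts as speech-absent when its spectral entropy reaches a hypothetical fixed threshold, and the noise labels are placeholders. In a full system, each returned label would select the DNN trained for that noise type.

```python
import math
from typing import Dict, Iterable, List, Sequence, Tuple

# One diagonal-covariance Gaussian component: (weight, means, variances).
# Parameters are assumed to come from EM training, which is not shown here.
Component = Tuple[float, Sequence[float], Sequence[float]]


def spectral_entropy(power_spectrum: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a power spectrum normalised to sum to 1."""
    total = sum(power_spectrum)
    if total <= 0.0:
        return 0.0
    entropy = 0.0
    for p in power_spectrum:
        q = p / total
        if q > 0.0:
            entropy -= q * math.log(q)
    return entropy


def is_speech_absent(power_spectrum: Sequence[float], threshold: float) -> bool:
    """Assumption: flat, noise-like spectra have higher entropy than speech."""
    return spectral_entropy(power_spectrum) >= threshold


def gmm_log_likelihood(features: Sequence[float],
                       components: List[Component]) -> float:
    """log p(features) under a diagonal-covariance GMM, using log-sum-exp."""
    log_terms = []
    for weight, means, variances in components:
        log_p = math.log(weight)
        for x, mu, var in zip(features, means, variances):
            log_p -= 0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)
        log_terms.append(log_p)
    peak = max(log_terms)
    return peak + math.log(sum(math.exp(t - peak) for t in log_terms))


def classify_noise(features: Sequence[float],
                   noise_gmms: Dict[str, List[Component]]) -> str:
    """Return the noise type whose GMM gives the highest log-likelihood."""
    return max(noise_gmms,
               key=lambda name: gmm_log_likelihood(features, noise_gmms[name]))


def track_noise_type(frames: Iterable[Tuple[Sequence[float], Sequence[float]]],
                     noise_gmms: Dict[str, List[Component]],
                     threshold: float,
                     initial_type: str) -> List[str]:
    """Update the noise type only in speech-absent frames; hold it otherwise."""
    current = initial_type
    labels = []
    for power_spectrum, mfcc in frames:
        if is_speech_absent(power_spectrum, threshold):
            current = classify_noise(mfcc, noise_gmms)
        labels.append(current)
    return labels
```

Holding the last label during speech-present frames reflects the abstract's statement that the noise type is determined in speech-absent frames; the threshold value and the hold strategy are illustrative choices, not details taken from the paper.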
