Deep Convolutional Neural Network-based Speech Signal Enhancement Using Extensive Speech Features

Anil Garg

ECE Department, MMEC, MMDU, Mullana (Ambala), Haryana, India

E-mail Address: anilgarg0778@gmail.com

Corresponding author.

and

O. P. Sahu

ECE Department, NIT Kurukshetra, Haryana, India

https://doi.org/10.1142/S0219876221420056

Cited by: 4 (Source: Crossref)

This article is part of the issue:

Special Issue: Scientific Machine Learning in Engineering
Guest Editors: Pattabhi Ramaiah Budarapu, Cosmin Anitescu and Timon Rabczuk

Abstract

Speech signals are often corrupted by noise from sources such as airports, stations, and streets. This noise degrades the quality of the speech signal, particularly in voice communication, automatic speech recognition, and speaker identification, so automatic speech enhancement is necessary. This research work introduces a novel speech signal enhancement model based on deep learning. The proposed model includes three major phases: (a) pre-processing, (b) feature extraction, and (c) speech enhancement. In the pre-processing phase, framing is carried out using the Hanning window, which decomposes the input speech signal into a series of overlapping frames. From these individual frames, multiple features are then extracted: the improved Mel-frequency cepstral coefficients (IMFCCs), fractional delta AMS, and modified STFT (M-STFT). In the speech enhancement phase, the noise present in the signal is first estimated and then removed. The noise-removed signals from the frames are used to determine the optimal mask for all frames of the noisy speech signal, and this mask is employed to train the Deep Convolutional Neural Network (DCNN). The output reconstructed by the DCNN is the enhanced speech signal. Finally, the proposed work (multi-features + DCNN-based speech enhancement) is compared with existing models on several performance measures, and the results indicate that it outperforms them.
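The pre-processing phase described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `frame_signal`, the frame length of 256 samples, the hop of 128 samples (50% overlap), and the 8 kHz test signal are all assumptions chosen for illustration, since the abstract does not state these values.

```python
import numpy as np


def frame_signal(signal, frame_len=256, hop=128):
    """Split a 1-D signal into overlapping frames and apply a Hanning window.

    frame_len and hop are illustrative assumptions, not values from the paper.
    Trailing samples that do not fill a complete frame are discarded.
    """
    signal = np.asarray(signal, dtype=float)
    if len(signal) < frame_len:
        # Zero-pad short signals so that at least one full frame exists.
        signal = np.pad(signal, (0, frame_len - len(signal)))
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Row i of idx holds the sample indices that belong to frame i.
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    window = np.hanning(frame_len)
    # Broadcasting multiplies every frame (row) by the same window.
    return signal[idx] * window


if __name__ == "__main__":
    fs = 8000  # assumed sampling rate in Hz
    t = np.arange(fs) / fs
    rng = np.random.default_rng(0)
    # Synthetic stand-in for noisy speech: a tone plus white noise.
    noisy = np.sin(2 * np.pi * 440 * t) + 0.1 * rng.standard_normal(fs)
    frames = frame_signal(noisy)
    print(frames.shape)  # (61, 256)
```

Each row of the returned array is one windowed frame, which is the form in which per-frame features such as those named in the abstract would typically be computed.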

Keywords:
