Research Paper

Enhancing Speech Quality Using Artificial Bandwidth Expansion with Deep Shallow Convolution Neural Network Framework

N. Radha

Sri Sivasubramaniya Nadar College of Engineering, Chennai, India

E-mail Address: radhanitssnce@gmail.com

Corresponding author.

R. B. Jananie

Sri Sivasubramaniya Nadar College of Engineering, Chennai, India

A. Anto Silviya

Sri Sivasubramaniya Nadar College of Engineering, Chennai, India

https://doi.org/10.1142/S0219477522500080

Abstract

Speech processing is an important application area of digital signal processing concerned with examining and analyzing speech signals. Within it, speech enhancement is essential because it improves signal quality and thereby helps resolve communication challenges. Many speech enhancement algorithms have been proposed, but limited processing capabilities, maximum microphone distance, and voice-first I/O interfaces increase the computational complexity. In this paper, speech enhancement is performed in two stages. In the first stage, the spectral subtraction method is applied to the LJ Speech dataset: the noise spectrum is estimated during pauses and subtracted from the noisy speech signal to obtain the clean speech signal. However, spectral subtraction still introduces artificial noise and narrow-band noise in the spectrum. Hence, artificial bandwidth expansion with a deep shallow convolution neural network (ABE-DSCNN) is implemented as the second stage. The developed system is then compared with conventional enhancement approaches such as the deep neural network (DNN), neural beamforming (NB), and the generative adversarial network (GAN). The experimental results show that ABE-DSCNN provides a 4% increase in PESQ and improves the error rate by 40% to 56% relative to the other existing algorithms for 1000 speech samples. Hence, the paper concludes that the ABE-DSCNN approach effectively improves speech quality.
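
The first stage described in the abstract (estimate the noise spectrum during pauses, then subtract it from the noisy spectrum) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that the first few frames of the recording are a speech pause, and the names and values `frame_len`, `hop`, `noise_frames`, and `floor` are hypothetical choices made for illustration only.

```python
import numpy as np

def spectral_subtraction(noisy, frame_len=256, hop=128, noise_frames=5, floor=0.01):
    """Magnitude spectral subtraction (illustrative sketch).

    Assumption: the first `noise_frames` frames contain no speech (a pause),
    so their average magnitude spectrum serves as the noise estimate.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(noisy) - frame_len) // hop

    # Slice the signal into overlapping, windowed frames.
    frames = np.stack(
        [noisy[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    spectra = np.fft.rfft(frames, axis=1)
    magnitude, phase = np.abs(spectra), np.angle(spectra)

    # Noise magnitude estimated from the assumed pause at the start.
    noise_mag = magnitude[:noise_frames].mean(axis=0)

    # Subtract the noise estimate; keep a small spectral floor so that
    # no bin becomes negative.
    clean_mag = np.maximum(magnitude - noise_mag, floor * magnitude)

    # Reuse the noisy phase and return to the time domain.
    clean_frames = np.fft.irfft(clean_mag * np.exp(1j * phase), n=frame_len, axis=1)

    # Weighted overlap-add, normalized by the accumulated squared window.
    out = np.zeros(len(noisy))
    norm = np.zeros(len(noisy))
    for i in range(n_frames):
        start = i * hop
        out[start : start + frame_len] += clean_frames[i] * window
        norm[start : start + frame_len] += window ** 2
    return out / np.maximum(norm, 1e-8)
```

Calling `spectral_subtraction(noisy)` on a one-dimensional NumPy array returns an array of the same length. Bins that fall below the noise estimate are clamped to the floor rather than removed, and the isolated peaks that survive are one way the residual artificial noise mentioned in the abstract can arise.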

Communicated by Hongjing Liang

Keywords:
