TOWARD ROBUST SPEECH EMOTION RECOGNITION AND CLASSIFICATION USING NATURAL LANGUAGE PROCESSING WITH DEEP LEARNING MODEL

https://doi.org/10.1142/S0218348X25400225
Cited by: 0 (Source: Crossref)

Speech Emotion Recognition (SER) plays a significant role in human–machine interaction applications. Over the last decade, many SER systems have been proposed, yet their performance remains a challenge owing to noise, high system complexity and ineffective feature discrimination. Feature extraction, in particular, is critical to SER performance. Deep Learning (DL)-based techniques have emerged as proficient solutions for SER due to their competence in learning from unlabeled data, their superior capability for feature representation, and their ability to handle larger datasets and complex features. Different DL techniques, such as Long Short-Term Memory (LSTM), Convolutional Neural Network (CNN) and Deep Neural Network (DNN) models, have been successfully applied to automated SER. This study proposes a Robust SER and Classification using Natural Language Processing with DL (RSERC-NLPDL) model, which aims to identify the emotions in speech signals. In the RSERC-NLPDL technique, pre-processing is first performed to transform the input speech signal into a valid format. The technique then extracts a set of features comprising Mel-Frequency Cepstral Coefficients (MFCCs), Zero-Crossing Rate (ZCR), Harmonic-to-Noise Ratio (HNR) and the Teager Energy Operator (TEO). Next, feature selection is carried out using the Fractal Seagull Optimization Algorithm (FSOA). The Temporal Convolutional Autoencoder (TCAE) model is applied to identify speech emotions, and its hyperparameters are selected using the fractal Sand Cat Swarm Optimization (SCSO) algorithm. The RSERC-NLPDL method is evaluated on speech databases: it achieved superior accuracies of 94.32% and 95.25% on the EMODB and RAVDESS datasets, respectively, outperforming other models across distinct measures.
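To illustrate two of the simpler acoustic features named in the abstract, the sketch below computes the Zero-Crossing Rate and the discrete Teager Energy Operator with NumPy, using their standard textbook definitions (the MFCC and HNR pipelines are more involved and are omitted). The function names and the synthetic test tone are illustrative, not taken from the paper:

```python
import numpy as np

def zero_crossing_rate(x):
    """Fraction of adjacent sample pairs whose signs differ (standard ZCR)."""
    signs = np.sign(x)
    signs[signs == 0] = 1  # treat exact zeros as positive to avoid spurious crossings
    return np.mean(signs[:-1] != signs[1:])

def teager_energy(x):
    """Discrete Teager Energy Operator: psi[n] = x[n]^2 - x[n-1] * x[n+1]."""
    return x[1:-1] ** 2 - x[:-2] * x[2:]

# Demo on a synthetic 100 Hz tone sampled at 8 kHz (one second of audio).
sr, f = 8000, 100
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * f * t)

zcr = zero_crossing_rate(x)   # a 100 Hz tone crosses zero ~200 times per second
teo = teager_energy(x)        # for a pure tone, TEO is the constant A^2 * sin^2(omega)
```

For a pure sinusoid the TEO output is constant, which is why it is useful for detecting amplitude and frequency modulation (such as the stress-related modulation exploited in SER): any deviation from a flat TEO track signals a change in the signal's instantaneous energy.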