World Scientific

Self-Labeling Learning Ensemble via Deep Recurrent Neural Network and Self-Representation for Speech Emotion Recognition

https://doi.org/10.1142/S0218001424520177

Speech emotion recognition (SER) methods rely on frames to analyze speech data. However, existing methods typically divide a speech sample into smaller frames and label all of them with a single emotion tag, which ignores the possibility that multiple emotion tags coexist within one speech sample. To address this limitation, we present a novel approach, self-labeling learning ensemble via DRNN and self-representation (En-DRNN-SR), for SER. The method automatically segments a speech sample into speech frames; a deep recurrent neural network (DRNN) is then applied to learn deep features from them; next, a self-representation model is built to obtain a relational degree matrix; finally, this matrix is used to divide the frames into three parts: key emotional frames, compatible emotional frames, and noise frames. The emotion tags of the compatible emotional frames are learned adaptively and cyclically from the key emotional frames via the relational degree matrix, while the emotion tags associated with the key frames are also checked. In addition, we introduce a new self-labeling criterion based on fuzzy membership degree for SER. To evaluate the feasibility and effectiveness of the proposed En-DRNN-SR, we conducted extensive experiments on the IEMOCAP, EMODB, and SAVEE databases, on which it achieves 69.13%, 82.83%, and 52.31%, respectively, outperforming all competing algorithms. The experimental results demonstrate that the proposed approach surpasses state-of-the-art SER methods in both feature learning and classification.
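The self-representation and frame-partition steps described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a ridge-regularized self-representation objective min_C ||X − CX||² + λ||C||² over per-frame feature vectors X (stand-ins for DRNN outputs), and it assumes that key, compatible, and noise frames are separated by quantile thresholds on the relational-degree scores; the function names and thresholds are hypothetical.

```python
import numpy as np

def self_representation(X, lam=0.1):
    """Relational degree matrix C via ridge-regularized self-representation.

    X: (n_frames, d) array of per-frame deep features (stand-ins for DRNN
    outputs). Closed-form solution of min_C ||X - C X||^2 + lam ||C||^2:
    C = G (G + lam I)^{-1}, where G = X X^T (G and G + lam I commute).
    """
    n = X.shape[0]
    G = X @ X.T  # (n, n) Gram matrix of frame features
    return np.linalg.solve(G + lam * np.eye(n), G)

def partition_frames(C, key_q=0.75, noise_q=0.25):
    """Split frames into key / compatible / noise by relational degree.

    Each frame is scored by how strongly the other frames rely on it for
    reconstruction (column sums of |C|); the quantile thresholds here are
    illustrative assumptions, not values from the paper.
    """
    score = np.abs(C).sum(axis=0)
    hi, lo = np.quantile(score, key_q), np.quantile(score, noise_q)
    key = np.where(score >= hi)[0]    # key emotional frames
    noise = np.where(score <= lo)[0]  # noise frames
    compat = np.setdiff1d(np.arange(len(score)), np.union1d(key, noise))
    return key, compat, noise
```

In the full method, the emotion tags of the compatible frames would then be propagated cyclically from the key frames using the entries of C; the sketch stops at the partition step.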