Article (Open Access)

MULTI-CLASS SPOKEN LANGUAGE DETECTION USING ARTIFICIAL INTELLIGENCE WITH FRACTAL AL-BIRUNI EARTH RADIUS OPTIMIZATION ALGORITHM

Department of Language Preparation, Arabic Language Teaching Institute, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia

MAJDY M. ELTAHIR

https://orcid.org/0000-0002-1810-4372

Department of Information Systems, Applied College at Mahayil, King Khalid University, Asir, Abha, Saudi Arabia

SOMIA A. ASKLANY

https://orcid.org/0000-0002-1590-9845

Department of Computer Science and Information Technology, Faculty of Sciences and Arts in Turaif, Northern Border University, Arar 91431, Saudi Arabia

SAMI A. AL GHAMDI

https://orcid.org/0000-0002-7416-5153

Department of Computer Science, Faculty of Computing and Information, Al-Baha University, Alaqiq, Saudi Arabia

ABDULLAH ALMUHAIMEED

https://orcid.org/0000-0002-1155-9382

Digital Health Institute, King Abdulaziz City for Science and Technology, Riyadh 11442, Saudi Arabia

FUHID ALANAZI

https://orcid.org/0000-0001-5728-3855

Department of Information Systems, Faculty of Computer and Information Systems, Islamic University of Madinah, Medina 42351, Saudi Arabia

ABDELMONEIM ALI MOHAMED

https://orcid.org/0000-0001-6271-2669

Department of Information Systems, College of Computer and Information Sciences, Majmaah University, Al-Majmaah 11952, Saudi Arabia

and

MOHAMMED RIZWANULLAH

https://orcid.org/0000-0002-6951-8823

Department of Computer and Self Development, Preparatory Year Deanship, Prince Sattam bin Abdulaziz University, Al-Kharj, Saudi Arabia

E-mail Address: r.mohammed@psau.edu.sa

Corresponding author.

https://doi.org/10.1142/S0218348X25400547

This article is part of the issue:

Special Issue on Application of Brain-Like Computing to the Modeling and Simulation of Complex Systems — Part I
Leading Guest Editor: Shadi Mahmoud Faleh AlZu’bi, Guest Editors: Maysam Abbod and Ashraf Darwish

Abstract

Spoken Language Identification (SLID) is the task of identifying the language spoken in an audio clip. SLID is valuable in multilingual speech recognition systems, personalized voice assistants, and automated speech translation, and in call centers, where it can route calls automatically to an operator who speaks the caller's language. A primary challenge is detecting the language accurately and with low latency from audio recorded at varying noise levels and sampling rates. A further challenge is distinguishing between languages from short-duration utterances. Previous work has applied lexical, phonetic, phonotactic, and prosodic features to SLID. Spoken language detection using deep learning (DL) usually involves training recurrent neural network (RNN) or convolutional neural network (CNN) models on audio features such as spectrograms or Mel-frequency cepstral coefficients (MFCCs) to classify the language spoken in audio samples. More recent methodologies, such as transformers or CNN–RNN hybrids, can capture both spatial and temporal features for better performance. This paper presents a Multi-Class Spoken Language Detection using Artificial Intelligence with Fractal Al-Biruni Earth Radius Optimization (MCSLD-AIBER) technique, whose main aim is to identify multiple classes of spoken languages. In the MCSLD-AIBER technique, the Constant-$Q$ Transform (CQT) is applied to transform the speech signals. An Inception with Residual Network model then performs feature extraction, and the hyperparameters are tuned using the Al-Biruni Earth Radius (BER) approach. Finally, a long short-term memory (LSTM) network identifies the spoken language. A set of experiments was conducted to illustrate the performance of the MCSLD-AIBER technique. The simulation outcomes indicated that the MCSLD-AIBER method outperforms the other models considered.
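The abstract does not say how the CQT front end is implemented. As an illustration only, the following minimal sketch computes a naive constant-Q transform of a single frame in pure Python, using the standard definition in which bin $k$ is centred on $f_{\min} \cdot 2^{k/B}$ and every bin spans the same number of cycles $Q = 1/(2^{1/B} - 1)$. The function name, the parameter names, the Hamming window, and the 8 kHz test tone are all assumptions made for this example and are not taken from the paper. A practical system would more likely use an optimized library routine.

```python
import cmath
import math


def constant_q_transform(signal, sample_rate, f_min, bins_per_octave, n_bins):
    """Naive constant-Q transform of the start of `signal` (illustrative sketch).

    Returns one magnitude per bin; bin k is centred on
    f_min * 2 ** (k / bins_per_octave).
    """
    # Q is the ratio of centre frequency to bandwidth, identical for every bin.
    q = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)
    magnitudes = []
    for k in range(n_bins):
        f_k = f_min * 2.0 ** (k / bins_per_octave)
        # The window shrinks as frequency rises so each bin covers about Q cycles.
        n_k = math.ceil(q * sample_rate / f_k)
        if n_k > len(signal):
            raise ValueError("signal too short for the lowest-frequency bins")
        acc = 0j
        for n in range(n_k):
            # Periodic Hamming window to reduce leakage into neighbouring bins.
            window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / n_k)
            # Correlate the windowed samples with a complex tone at f_k.
            acc += window * signal[n] * cmath.exp(-2j * math.pi * f_k * n / sample_rate)
        magnitudes.append(abs(acc) / n_k)
    return magnitudes


# Usage: a pure 440 Hz tone sampled at 8 kHz, analysed over three octaves from 110 Hz.
tone = [math.sin(2.0 * math.pi * 440.0 * n / 8000) for n in range(2000)]
spectrum = constant_q_transform(tone, 8000, 110.0, 12, 36)
peak_bin = spectrum.index(max(spectrum))
print("peak bin:", peak_bin, "centre frequency:", 110.0 * 2.0 ** (peak_bin / 12))
```

Because the bins are geometrically spaced, the frequency resolution follows a logarithmic scale, which is one common reason for preferring the CQT over a fixed-window spectrogram for speech and music. Repeating the computation over successive frames would give a time-frequency image of the kind that a convolutional feature extractor could consume.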

Vol. 32, No. 09n10

History

Received 22 June 2024

Accepted 24 August 2024

Published: December 16, 2024

Information

This is an Open Access article in the “Special Issue on Application of Brain-like Computing to the Modeling and Simulation of Complex Systems — Part I”, edited by Shadi Mahmoud Faleh AlZu’bi (Al-Zaytoonah University of Jordan, Jordan), Maysam Abbod (Brunel University London, UK) & Ashraf Darwish (Helwan University, Cairo, Egypt), published by World Scientific Publishing Company. It is distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 (CC BY-NC-ND) License, which permits use, distribution and reproduction, provided that the original work is properly cited, the use is non-commercial and no modifications or adaptations are made.
