The rapid growth of technology has enabled the easy and extensive dissemination of information on various topics, including business, marketing, news, and viewpoints on geopolitical situations. Although the number of studies on opinion mining is increasing quickly, most of them focus on resource-rich languages. Resource-poor languages such as Roman Urdu have long been neglected, despite their vast research potential and the fact that Roman Urdu represents almost 500 million people. Research on Roman Urdu has relied on machine learning (ML) methods because no large, standard corpus is available. Owing to these research gaps in Roman Urdu sentiment analysis, the few publicly available corpora are neither large enough nor of sufficient quality to yield promising results with Deep Learning (DL) methods. The key contributions of this work are the enhancement of the existing Roman Urdu corpus and the application of a fine-tuned hybrid Convolutional Neural Network-Bidirectional LSTM (CNN-BiLSTM) model to the enhanced corpus. To enhance the corpus, reviews are collected from multiple online sources in domains such as politics, sports, entertainment, and food. The enhanced corpus is annotated manually by two annotators, A and B, following the guidelines given by researchers, and the annotation is statistically validated by computing Cohen’s Kappa score, which indicates moderate agreement. Reviews on which A and B disagree are validated by a third annotator, C. Finally, experiments are performed in binary-class and multi-class settings using hybrid DL methods as well as ML-based models. The experiments show that the hybrid CNN-BiLSTM model outperforms the existing Recurrent Convolutional Neural Network (RCNN), RNN, LSTM, SVM, CRF, and rule-based models on the evaluation measures, reaching an accuracy of 0.774 for binary classification and 0.721 for multi-class classification on the enhanced Roman Urdu corpus.
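The agreement check described in the abstract can be illustrated with a minimal sketch of Cohen's Kappa for two annotators. The function names and the sample labels below are hypothetical and are not taken from the paper's corpus or code; the sketch only shows the standard formula, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's Kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty set of items")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's marginal label frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[label] * count_b[label]
                   for label in set(count_a) | set(count_b)) / (n * n)
    if expected == 1.0:
        # Both annotators used one identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


def conflicted_indices(labels_a, labels_b):
    """Indices of items where the two annotators disagree."""
    return [i for i, (a, b) in enumerate(zip(labels_a, labels_b)) if a != b]


# Hypothetical labels, invented purely for illustration.
annotator_a = ["pos", "pos", "neg", "neg", "neu", "pos", "neg", "neu"]
annotator_b = ["pos", "neg", "neg", "neg", "neu", "pos", "pos", "neu"]

kappa = cohens_kappa(annotator_a, annotator_b)
conflicts = conflicted_indices(annotator_a, annotator_b)
print(f"kappa = {kappa:.3f}, items needing a third opinion: {conflicts}")
```

In this sketch the disagreeing items are simply listed by index, which is one plausible way to hand them to a third annotator; the paper does not state how its conflicted reviews were routed to annotator C.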
