World Scientific

A Hybrid CNN-BiLSTM-Based Approach for Roman Urdu Sentiment Analysis Using Enhanced Roman Urdu Corpus

    https://doi.org/10.1142/S2717554524500097

    The rapid growth of technology has enabled the easy and extensive dissemination of information on various topics, including business, marketing, news, and viewpoints on geopolitical situations. Although the number of studies on opinion mining is increasing quickly, most of them focus on resource-rich languages. Resource-poor languages such as Roman Urdu have long been neglected, despite their vast potential for research; Roman Urdu alone represents almost 500 million people. Existing research on Roman Urdu has relied on machine learning (ML) methods because no large, standard corpus is available. Owing to research gaps in Roman Urdu sentiment analysis, only a few corpora are publicly available for research, and they are neither large enough nor of sufficient quality to yield promising results with deep learning (DL) methods. The key contributions of this work are the enhancement of an existing Roman Urdu corpus and the application of a fine-tuned hybrid Convolutional Neural Network-Bidirectional LSTM (CNN-BiLSTM) model to the enhanced corpus. To enhance the corpus, reviews were collected from multiple online sources covering domains such as politics, sports, entertainment, and food. The enhanced corpus was then manually annotated by two annotators, A and B, following guidelines from previous researchers, and the annotations were statistically validated by computing Cohen’s kappa, which indicated moderate agreement. Reviews on which A and B disagreed were resolved by a third annotator, C. Finally, binary and multi-class experiments were performed using hybrid DL methods as well as ML-based models. The results show that the hybrid CNN-BiLSTM model outperforms the existing Recurrent Convolutional Neural Network (RCNN), RNN, LSTM, SVM, CRF, and rule-based models on the evaluation metrics, achieving an accuracy of 0.774 for binary classification and 0.721 for multi-class classification on the enhanced Roman Urdu corpus.
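The annotation check described above can be illustrated with a short sketch. Cohen's kappa compares the observed agreement p_o between two annotators with the agreement p_e expected by chance from their label frequencies, as kappa = (p_o - p_e) / (1 - p_e). This is a minimal illustration, not the authors' code. The label names (pos, neg, neu) and toy annotations are hypothetical. The verbal band assumes the commonly used Landis and Koch (1977) scale, on which 0.41 to 0.60 reads as "moderate". The helper `adjudicate` is also hypothetical and assumes a disputed item simply takes annotator C's label.

```python
from collections import Counter
from typing import List, Sequence


def cohen_kappa(labels_a: Sequence[str], labels_b: Sequence[str]) -> float:
    """Unweighted Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("annotators must label the same, non-empty set of items")
    n = len(labels_a)
    # Observed agreement: fraction of items where A and B chose the same label.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        # Both annotators used one identical label everywhere; kappa is undefined.
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


def landis_koch(kappa: float) -> str:
    """Verbal band for kappa, assuming the Landis and Koch (1977) scale."""
    if kappa < 0:
        return "poor"
    bands = [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"), (0.80, "substantial")]
    for upper, band in bands:
        if kappa <= upper:
            return band
    return "almost perfect"


def adjudicate(labels_a: Sequence[str], labels_b: Sequence[str],
               labels_c: Sequence[str]) -> List[str]:
    """Hypothetical rule: keep agreed labels, take annotator C's label on conflicts."""
    return [a if a == b else c for a, b, c in zip(labels_a, labels_b, labels_c)]


if __name__ == "__main__":
    # Toy annotations for ten hypothetical reviews (not data from the paper).
    a = ["pos", "pos", "neg", "neu", "neg", "pos", "neu", "neg", "pos", "neg"]
    b = ["pos", "neg", "neg", "neu", "neg", "pos", "pos", "neg", "pos", "neu"]
    c = ["pos", "neg", "neg", "neu", "neg", "pos", "neu", "neg", "pos", "neg"]
    kappa = cohen_kappa(a, b)
    print(round(kappa, 3), landis_koch(kappa))  # prints: 0.531 moderate
    print(adjudicate(a, b, c))
```

On the toy data A and B agree on 7 of 10 items (p_o = 0.70) while chance agreement is p_e = 0.36, so kappa = 0.34 / 0.64 ≈ 0.531, which falls in the "moderate" band. For real corpora, scikit-learn's `sklearn.metrics.cohen_kappa_score` computes the same unweighted statistic.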