Language identification framework in code-mixed social media text based on quantum LSTM — the word belongs to which language?
Abstract
Machine learning (ML) architectures based on neural models have garnered considerable attention in the field of language classification. Code-mixing, the practice of mixing two or more languages within a single text, is a common phenomenon on social networking sites when users express opinions on a topic. This paper describes the application of a code-mixing index to Indian social media texts and compares the complexity of identifying the language at the word level using a Bi-directional Long Short-Term Memory (BiLSTM) model. The major contribution of this work is a technique for identifying the language of Hindi–English code-mixed data drawn from three social media platforms, namely Facebook, Twitter and WhatsApp. We demonstrate that a special class of quantum LSTM network model is capable of learning and accurately predicting the languages used in social media texts. Our work paves the way for future applications of quantum machine learning methods to language identification in code-mixed text.
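To make the word-level task concrete, the following is a minimal sketch of a BiLSTM tagger that assigns a language label to each token of a code-mixed sentence. The tiny vocabulary, the tag set (hi/en/other), and all hyperparameters are illustrative assumptions for demonstration only and are not taken from the paper's actual quantum LSTM architecture or data.

```python
# Minimal sketch (illustrative only): word-level language tagging with a BiLSTM.
# The toy vocabulary, tag set, and hyperparameters are assumptions, not the paper's setup.
import torch
import torch.nn as nn

TAGS = ["hi", "en", "other"]  # hypothetical word-level language labels


class BiLSTMTagger(nn.Module):
    def __init__(self, vocab_size, embed_dim=64, hidden_dim=64, num_tags=len(TAGS)):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden_dim, num_tags)  # 2x for the two LSTM directions

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) -> per-token language logits (batch, seq_len, num_tags)
        out, _ = self.lstm(self.embed(token_ids))
        return self.fc(out)


# Toy usage: tag one code-mixed Hindi-English sentence with untrained (random) weights.
vocab = {"<unk>": 0, "mujhe": 1, "ye": 2, "song": 3, "bahut": 4, "pasand": 5, "hai": 6}
sentence = ["mujhe", "ye", "song", "bahut", "pasand", "hai"]
ids = torch.tensor([[vocab.get(w, 0) for w in sentence]])

model = BiLSTMTagger(vocab_size=len(vocab))
pred = model(ids).argmax(dim=-1)  # predicted tag index for each word
print([TAGS[i] for i in pred[0].tolist()])
```

In practice such a model would be trained on word-level annotated code-mixed corpora; the sketch only illustrates the input/output shape of the tagging formulation discussed above.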