Query Execution Time Analysis Using Apache Spark Framework for Big Data: A CRM Approach

    https://doi.org/10.1142/S0219649222500502 · Cited by: 3 (Source: Crossref)

    Customer Relationship Management (CRM) is a systematic way of working with current and prospective customers to manage long-term relationships and interactions between a company and its customers. Recently, Big Data has become a buzzword. It refers to huge data repositories containing information collected from online and offline sources, which are hard to process with traditional data processing tools and techniques. The presented research work explores the potential of Big Data to create, optimise and transform an insightful customer relationship management system by analysing large datasets to enhance customer life-cycle profitability. In this research work, the "Book Crossing" dataset is used for Big Data processing and execution time analysis of simple and complex SQL queries. The research analyses the impact of data size on query execution time for one of the widely used Big Data frameworks, Apache Spark, a recently developed in-memory Big Data processing framework whose Spark SQL module supports efficient SQL query execution. It has been found that Apache Spark gives better results on large datasets compared to small ones and fares better than Hadoop, another widely used Big Data framework (based on qualitative analysis).
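
    The abstract describes timing simple and complex SQL queries over the Book Crossing dataset with Spark SQL. Below is a minimal sketch, not the paper's code, of how such a timing experiment could be set up in PySpark; the file name, separator, column names and the example query are illustrative assumptions.

```python
import time
from pyspark.sql import SparkSession

# Start a local Spark session for the timing experiment.
spark = SparkSession.builder.appName("QueryTimeAnalysis").getOrCreate()

# Load the Book-Crossing ratings file (path, separator and header are assumptions).
ratings = (spark.read
           .option("header", True)
           .option("sep", ";")
           .csv("BX-Book-Ratings.csv"))
ratings.createOrReplaceTempView("ratings")

# An example "simple" query: number of ratings per book.
query = "SELECT ISBN, COUNT(*) AS n_ratings FROM ratings GROUP BY ISBN"

start = time.time()
spark.sql(query).collect()   # collect() forces full execution of the lazy plan
elapsed = time.time() - start
print(f"Query executed in {elapsed:.2f} s")

spark.stop()
```

    The same pattern can be repeated for complex queries (joins, nested aggregations) and for input files of different sizes to study how execution time scales with data volume, which is the comparison the paper reports.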