BIBLIOMETRIC ANALYSIS ON BIG DATA APPLICATIONS IN INSURANCE SECTOR: PAST, PRESENT, AND FUTURE RESEARCH DIRECTIONS
Abstract
This study identifies the key areas and current trends in big data applications in the insurance industry and suggests directions for future research. Using bibliometric analysis on a sample of 191 articles retrieved from Scopus and published between 1976 and 2021, we identified the most prominent authors, journals, organizations, and countries based on their total publications and citations, showing their significance within the network. VOSviewer and R-Biblioshiny were used to generate the bibliometric output for the retrieved papers. The findings show that, although authors from many parts of the world contributed to the literature on big data applications in the insurance industry during this period, most research papers list the United States, India, or China as the affiliated country. Yearly output was one or two publications, with some discontinuity, from 1976 to 2011; since 2012 it has increased, exhibiting an exponential growth tendency. The three journals “Risks,” “Applied Stochastic Models in Business and Industry,” and “Expert Systems with Applications” published the largest number of papers on big data technologies in the insurance sector. Each of the top 10 authors in this field published two research papers during these 46 years. Research papers on big data applications in the insurance business focused mainly on seven areas: fraud detection and prevention, risk assessment, pricing and rate making, technology utilization, risk management, claim processing and prediction, and digitalization. Topics such as human-centered AI system development, adoption of wearable technology, and personalization were found to have received very little attention, and researchers may direct future work toward them.
This study is the first of its kind in the domain of insurance, although a few such studies are available on the broader concept of finance.
1. Introduction
Insurance firms create a fund by collecting money from various customers, known as policyholders, to safeguard them and assist them when necessary. These businesses promise to pay policyholders a certain amount of money when they need it. Like many other industries, the insurance sector has moved towards digital platforms (Churchill 2007), and its use of technology has grown exponentially. Advanced technology and digital platforms have allowed insurance firms to test new methods of tracking, assessing, and reducing risk. The Internet of Things, artificial intelligence, blockchain, machine learning, big data analytics, and insurance management platforms are some of the important technologies being deployed for insurance analytics and insurance technology (Eckert & Osterrieder 2020). The insurance sector has long used data analytics successfully to target clients. Accident statistics, policyholder personal information, and outside sources make it possible to classify persons into risk groups, prevent fraud losses, and optimize costs (Carfora et al. 2019). The move to digital platforms has created new information sources that may be used to precisely identify a customer’s segment and understand that customer’s intricate behavioral patterns. In the insurance industry, the term big data describes the use of unstructured and/or structured data to inform underwriting, rating, pricing, forms, marketing, and claims management. Big data applications have already begun helping insurance businesses.
Big data analysis and its business impact are currently prominent topics. Applications using big data, artificial intelligence, and machine learning produce decisions and business processes tailored to each person’s wants and expectations, supporting the expansion and effectiveness of commercial operations. Over the past couple of decades, big data analytics has been used extensively across industries (Giannakis 2019, Villars & Olofson 2011, Ogbuokiri et al. 2015). Big data is produced by frequent interactions with and among users in the digital world and by the evolution of technology (Hussain & Cambria 2018). Huge amounts of structured and unstructured data are being produced because of technological advancements in the internet, mobile devices, cloud computing, wearable technology, embedded sensors, etc. Companies that can extract value from these data will have a competitive advantage (Giannakis 2019, Villars & Olofson 2011). The majority of sectors now use data-driven decision-making, and big data and artificial intelligence (AI) are the two main tools for this process (Liang & Liu 2018). The recent rapid development of big data and the availability of vast amounts of digitized data have drawn attention from scholars, businesses, governments, and practitioners (Zhang 2018). Researchers are eager to investigate how data pervade every element of human life (Williams & Burnap 2017, Williamson 2015). Previous studies have shown that big data analytics helps firms extract value from enormous amounts of unstructured data, boosts productivity, creativity, and competitiveness, increases customer loyalty, and generally improves business decision-making (Gandomi & Haider 2015, Mishra et al. 2016, Vecchio 2017, Verma & Bhattacharyya 2017a). Every discipline has seen an increase in big data research (Özköse et al. 2015).
Since the inception of big data and AI, a plethora of research articles on their theory, technology, and approaches has been published. Owing to the vast amount of data generated by mobile usage, publications on applications of big data and AI have also increased in several disciplines (Eastin et al. 2016, Liao et al. 2015, Paul et al. 2017).
Big data offers strategy-focused findings, so decision-makers across sectors use it (Labrinidis 2015; Jagadish 2015). Big data makes it possible to transform business processes (Mishra et al. 2016). By streamlining all areas of operations, big data, AI, and machine learning have radically changed the insurance sector, transforming its present and future. Organizations process and analyze huge amounts of data in order to extract value for their consumers and businesses (George et al. 2014). Several studies have shown that businesses with data-centric strategies outperform their rivals in terms of productivity (Ernest 2011). Big data analytics not only makes it possible to revolutionize business operations but also to successfully address significant business difficulties (Wamba et al. 2015).
Big data, AI, and machine learning are employed in the insurance industry in a variety of contexts for diverse business decisions. Few studies have highlighted the application of these methodologies in insurance consumer studies, particularly in tracking and predicting the behavior of insurance customers and segmenting them according to similar behavior (Carfora et al. 2019, Meyers & Hoyweghen 2020, Zhang & Banerji 2017, Zhang et al. 2019). Every insurer must make some typical decisions regarding the anticipation and processing of claims, for example the overall amount of a policyholder’s claims or losses, which claims should be denied, and the level of risk involved with an insurance policy. Big data, AI, and machine learning approaches are well suited to answering these questions (Ding et al. 2020, Jain et al. 2019, Johnson et al. 2021). The insurance sector has greatly benefited from digitalization and technical development in terms of its capacity to manage massive amounts of data, data privacy, data warehousing, risk management, etc. (Eckert & Osterrieder 2020, Marabelli et al. 2017, Nayak et al. 2019). Fraud is a major concern for the insurance business since it causes enormous financial losses. Several researchers have advocated big data analytics, AI, and machine learning as cutting-edge methods to identify and anticipate insurance fraud (Dua & Bais 2014, Major & Riedinger 2002, Mall et al. 2018, Song et al. 2019, Wang & Xu 2018). The premium is an important decision for an insurer, since customers must pay a premium for the policies they buy. Pricing, or rate making, is a key decision-making process in every insurance firm, and its main goal is to charge the proper premium for the correct kind of coverage.
Some past studies have shown that big data, artificial intelligence, and machine learning approaches can be used to support insurance rate making (Barry & Charpentier 2020, Christmann et al. 2007, Huang & Meng 2019).
Previous researchers such as Boyd & Crawford (2012), Chen & Zhang (2014), and Hashem et al. (2015) have worked on the theoretical development of big data analytics. Another strand of research has examined how big data analytics can enhance a firm’s capabilities and performance by optimizing its resources (Davenport et al. 2012, Ernest 2011, Murdoch & Detsky 2013, Sharma et al. 2014). It is noteworthy that few researchers have focused on the management transition to big data analytics (Chen et al. 2012, Davenport et al. 2012, George et al. 2014, Manyika et al. 2017). Nevertheless, no bibliometric analysis of the application of big data technology in the insurance industry has yet been conducted.
In Table 1, certain research papers concentrating on big data applications across the financial industry are contrasted with our study, which is primarily focused on big data applications within the insurance business.
Basis of comparison | Santoso et al. (2022) | Hasan et al. (2020) | Nobanee et al. (2021) | Altaf (2021) | Our study |
---|---|---|---|---|---|
Title | Insurance Underwriting and Technology Relationship: A Bibliometric Analysis | Current landscape and influence of big data on finance | Big Data Applications the Banking Sector: A Bibliometric Analysis Approach | Two Decades of Big Data in Finance: Systematic Literature Review and Future Research Agenda | Bibliometric analysis on big data applications in insurance sector: Past, present, and future research directions |
Source | Journal of Theoretical and Applied Information Technology | Journal of Big Data | SAGE Open | (Book) Big Data Analytics for Internet of Things. | |
Time Period | 1987–2021 | Not Defined | 2012–2020 | 2000–2019 | 1976–2021 |
Keywords | Underwriting, Insurance, premium estimation, premium calculation, risk assessment, machine learning, classification, technology, chatbot, artificial intelligence, big data, internet of things, blockchain, cloud computing, mobile computing. | Big data finance, Big data in financial services, Big data in risk management, Data management. | Big data and Banking | Financial markets, internet finance, financial services, Big data, Internet of things, Financial technology, Fintech and Financial Analytics. | Big Data, Artificial Intelligence, Deep Learning, Machine Learning, Insurance Plan, Insurance Policy, structured claim data, unstructured claim data, Insurance Sector, Insurance company, Life Insurance, Health Insurance, Mediclaim, General Insurance. |
Focus of the study | To identify the goals and direction of research literature conducted on insurance underwriting and how it relates to the technology field. | Examines the research on the impact of big data on different financial markets and institutions, as well as its interactions with online credit services, internet finance, financial management, fraud detection, risk analysis, and financial application management. | Highlighted big data’s significance, application, and role in the banking and finance industry. Also examined the potential areas for future research in big data analytics for the banking sector. | To examine the literature on big data in finance, identify knowledge gaps, and discuss potential future study fields. | To identify influence, usage, and the benefits of using big data analytics in insurance sector. |
Methodology | Cluster analysis of Authors Keywords and Citation Analysis. | Bibliometric analysis, citation analysis, and keyword mapping analysis. | Bibliometric Analysis and Thematic Analysis. | Bibliometric Analysis | Bibliometric Analysis and Thematic Analysis. |
Our study differs from the existing ones in many ways. The aim of this research is to identify the influence, usage, and benefits of big data analytics in the insurance sector. Considering the major clusters, the evolving applicability of big data tools in insurance, and potential future research fields, this study pinpoints the use of big data technology in the insurance business. We determine the publishing trends and the conceptual and intellectual structure of this field through a bibliometric investigation. This paper offers suggestions for future research topics and identifies some of the important dynamics of big data applications in the insurance industry. To the best of our knowledge, this study is the first to use bibliometric analysis to summarize big data’s use in the insurance industry. We address the following six research questions (RQs):
RQ1: What is the current publishing trend in the domain of big data applications in the insurance industry?
The amount of output is a crucial sign for identifying the direction of the research area’s growth (Fuad et al. 2020, Fusco et al. 2020).
RQ2: Who are the major authors, publications, and organizations that have made substantial contributions to the literature on applications of big data technology in the insurance industry?
It is important for scholars to know who the most active authors in a field are, as this facilitates future collaborations and publications (Rey-Martí et al. 2016, Van Eck & Waltman 2014). This is determined using the number of publications per entity, the number of citations obtained, and bibliographic coupling (Baker et al. 2020, Patel et al. 2022, Khanra et al. 2020). In addition, prominent authors are the best candidates to contact for developing policies as well as for conducting further studies in that specific area.
RQ3: What are the most important pieces of literature in this field?
In a bibliometric study, it is crucial to know which documents are the most widely acknowledged among academics in the selected field (e.g. Baker et al. 2020, Khanra et al. 2020). Researchers can find several study directions by using the well-known and popular documents (Bahoo et al. 2020). These are identified using citation analysis based on the number of citations a document has received (Caviggioli & Ughetto 2019, Khanra et al. 2020), citations per document (Patel et al. 2022), and bibliographic coupling (Khanra et al. 2020) based on its relationship with other papers.
RQ4: Which are the most influential countries, and what is the present state of research collaboration between authors from these countries?
By using map charts and network visualization to convey the data, the study of country collaboration provides an unbiased view of the representativeness and development of global research on any given issue. It helps researchers increase the effectiveness of their work and gives detailed insight into the connections between nations/regions and the kinds of research accomplishments they have made (Li et al. 2019). The results may give readers insight into how big data technology applications have evolved over time and across countries, as reflected in the published literature.
RQ5: What are the prevailing themes in the body of published literature in the domain of big data applications in the insurance sector?
Discovering the most prevalent topics and areas of study for academics working in this area is the major goal of this research inquiry. This is accomplished utilizing the following methods: theme map, keyword analysis, and bibliographic coupling (Ferreira 2018, Karakus et al. 2019, Aria & Cuccurullo 2017, Cobo et al. 2011).
RQ6: What would the plan of action be for further study in this area?
Bibliographic coupling results in discrete clusters that represent different topics (Karakus et al. 2019). The identification of research gaps results from a thorough content analysis of the articles within each cluster. By presenting plans for future research, this offers a chance for this study field to progress (Khanra et al. 2020, Kumar et al. 2020).
We have answered our RQs and evaluated the progress of research in this domain. Our bibliometric analysis covers 46 years of data, from 1976 to 2021. Our study also discusses future research directions in this domain and thereby aims to inspire innovative research in this field. The study combines a systematic literature review (SLR), bibliometric analysis, and content analysis to address these research questions. The preferred reporting items for systematic reviews and meta-analyses (PRISMA) recommendations defined by Moher et al. (2009) guide the use of SLR to retrieve suitable literature for further analysis. In addition to the choice of keywords, PRISMA specifies inclusion and exclusion criteria so that pertinent documents may be extracted (Kumar et al. 2020). Following that, bibliometric analysis is used to map the existing literature and group documents. The potential of bibliometric analysis to statistically synthesize the body of information relevant to a study topic is widely established (Bhatt et al. 2020, Goyal et al. 2021). The literature relevant to the selected research subject varies widely in vocabulary. In such cases, bibliometric analysis, as opposed to a standard literature review, helps detect and analyze relationships more effectively through visualization (Shome et al. 2023). Within each cluster, a content analysis of key papers is also offered, which helped determine the theme of each cluster (Rodrigues & Mendes 2018). In the social sciences, content analysis is a frequently used method for conducting systematic reviews of the body of current knowledge (Gaur & Kumar 2018, Goyal & Kumar 2021). Along with text analysis, bibliometric analysis helps uncover research gaps and suggest future research areas (Paltrinieri et al. 2019).
By examining the research questions raised by this study, we found that discussion of technological applications in the insurance business, particularly by researchers from the USA, India, and China, has grown considerably since 2008. Wang, Y., Major, J. A., and Zhang, J. are among the most prominent authors in this field. The three most cited publications are Gerson & Star (1986), Wang & Xu (2018), and Riikkinen et al. (2018). The terms that most often co-occur with the searched keywords (machine learning, big data, insurance, artificial intelligence, and deep learning) include classification, data mining, fraud detection, health insurance, and insurtech. The research works examined in this study can be classified into seven categories: fraud detection and prevention, pricing and ratemaking, technology use, customization, risk assessment, claim processing and prediction, and digitalization.
The remainder of the paper is organized as follows. Section 2 explains the research methodology of the study. Section 3 presents the bibliometric analysis performed to answer the RQs, along with a summary of the findings. Section 4 presents the conclusions and contributions of this study.
2. Data and Methodology
A total of 191 research articles on applications of big data in the insurance sector were retrieved from the Scopus bibliometric database. We formulated six RQs to understand the current and future research trends in this domain, and these are answered using bibliometric analysis. Bibliometric investigation quantitatively analyzes the existing research in a given field to identify its historical and current trends (Bhatt et al. 2020, Verma & Bhattacharyya 2017b). The Biblioshiny tool in RStudio and VOSviewer are used to analyze these 191 research articles.
Bibliometric analysis uses a scientific mapping to review and classify the existing literature on any topic and summarize the research development in any research domain (Bartolini et al. 2019). It provides the summary of most influential authors, articles, journals, countries, keywords, etc. on any topic under study (Bhukya et al. 2022). This study uses a combination of bibliometric analysis and content analysis to decipher the structure of the chosen study field. Content analysis helps in understanding the current intellectual research structure that shapes the future direction of research (Baker et al. 2020).
Scopus is often chosen over other bibliometric databases for bibliometric reviews (Pranckut 2021, Sweileh et al. 2016). Consistent quality and continuous improvement in the standard of research articles are the central focus of the journals indexed in Scopus. The Scopus database is more inclusive and comprehensive than the Web of Science, PubMed, and Dimensions (Bartol & Budimir 2013). Further, earlier studies indicate that, for bibliometric studies of research articles published after 1995, it is better to retrieve the data from Scopus, as Scopus has stated that documents published before 1996 lack complete citation information in its database (Jacso 2011, Vieira & Gomes 2009, Worthington & Higgs 2006). Thus, supported by various prior bibliometric studies, this research paper relies on the bibliometric data of the Scopus database.
In this study, bibliometric analysis along with the content analysis is performed on the existing literature pertaining to big data application in the insurance sector. Figure 1 shows the methodological framework of this study:
![Fig. 1.](/cms/10.1142/S2282717X23300015/asset/images/medium/s2282717x23300015figf1.png)
Fig. 1. Methodological framework used in our study.
2.1. Method of analysis
Research defines the structure of a scientific field (Ronda-Pupo 2017). We have performed bibliometric analysis to identify the structure of research on applications of big data in the insurance sector (Castriotta et al. 2018). Kalantari et al. (2017) highlighted the value of the bibliometric approach and discussed the latest research trends in big data in different domains. Bibliometric analysis is used to unveil research collaborations amongst researchers on big data applications (Xian & Madhavan 2014) and to identify current trends and future research scope (Li et al. 2017).
2.2. Bibliometric tool selection
Bibliometric analysis can be performed using CRExplorer, Publish or Perish, ScientoPyUI, BibExcel, BiblioMaps, and many other software tools. Prior bibliometric studies in various domains were conducted mostly with VOSviewer and Biblioshiny, followed by BibExcel and Gephi (Hafeez et al. 2019). In this study, we have used VOSviewer and the Biblioshiny interface of R (Bibliometrix 3.0) to analyze the articles retrieved from the Scopus bibliometric database. Among the tools considered, Bibliometrix has a larger collection of methods and is accessible to practitioners via Biblioshiny. VOSviewer can import and export data from a variety of sources and offers excellent visualization (Moral-Muñoz et al. 2020). VOSviewer is used to find the top contributors and influencers in terms of authors, sources, organizations, countries, etc. Biblioshiny is used to obtain the overall summary of the dataset, research trends, and topic clustering based on author keywords.
Biblioshiny, a web-based tool accessible through RStudio, requires little to no coding experience. Visualization analysis has also been done using VOSviewer, a tool for bibliometric analysis. VOSviewer’s UI is very user-friendly, and the default layout options are adequate (Donthu et al. 2021). In its low-dimensional visualization, VOSviewer places objects so that the distance between any two objects reflects how similar they are. Each map can be viewed with a density view or an overlay view in addition to the standard network view (van Eck & Waltman 2007, p. 299).
The Biblioshiny tool of the Bibliometrix R package supports a wide range of statistical approaches and visualizations, in contrast to programs like VOSviewer, which often specialize in a few visualization types (network graphs). Bibliometrix provides a number of methods for performing temporal or longitudinal analysis (Mougenot & Doussoulin 2022). Its features include simple line graphs to display frequency variations, historiographic visualization, thematic map development, and Reference Publication Year Spectroscopy (Moral-Muñoz et al. 2020, Saikia et al. 2020).
2.3. Bibliometric database selection
Scopus detects slightly more citations than the ISI’s Web of Science (WoS) databases, indexes around 8000 more journals, and may be a more reliable database for business and economics research than WoS. Scopus is a more useful tool than WoS for the many publications that are not judged deserving of an ISI ranking (Levine-Clark & Gil 2008). In a sample test of both databases, non-ISI publications were found to receive more citations than the bottom five ISI business and economics journals (Levine-Clark & Gil 2008).
In every field of research, Google Scholar (GS), a free scholarly resource, yields considerably more citations, as it indexes not only the peer-reviewed journals that make up the content of the other sources but also many titles that WoS and Scopus do not. This latter aspect is crucial since mentions of articles in widely circulated newsletters or numerous course syllabi appear to be more indicative of their relevance than citations in the scholarly literature. Although GS citation counts are often higher, comparing a work’s count with those of other works in the same area obtained through GS would be a useful way to gauge impact. For scholars without access to the relatively pricey WoS or Scopus, GS might be a handy tool (Levine-Clark & Gil 2008). Among the three databases, however, GS does not provide a well-connected core citation network.
WoS essentially reflects the well-connected core citation network of basic research, whereas Scopus also captures some transfer from the core to the applied research periphery. WoS has a restrictive indexation strategy, whereas Scopus has a selective indexation policy. In addition to better metadata quality, Dimensions’ lax indexation approach sends a similar, if less forceful, message to Scopus regarding coverage (Stahlschmidt & Stephen 2022). Even though each database has its own pros and cons, we primarily used the Scopus database since it indexes applied research journals more thoroughly (in comparison to WoS) and because papers published in Scopus-indexed journals also undergo a rigorous peer review procedure (in comparison to GS). Furthermore, because our research focuses on the application of big data in the insurance sector and analyzes application-based research papers in the area, the Scopus database is the most suitable option.
2.4. Search string of keywords
The bibliometric data were retrieved from the Scopus bibliometric database using the keyword string ((“Big Data” OR “Artificial Intelligence” OR “AI” OR “Deep Learning” OR “Machine Learning”) AND (“insurance Plan” OR “Insurance Policy” OR “structured claim data” OR “unstructured claim data” OR “Insurance Sector” OR “Insurance company” OR “Life Insurance” OR “Health Insurance” OR “Mediclaim” OR “General Insurance”)). The initial search retrieved a total of 1729 documents, including book chapters, conference papers, and documents in all languages. To obtain relevant research papers, the results were filtered to English-language articles published in journals in the subject areas of Computer Science, Mathematics, Economics, Decision Science, and Business and Management. After screening and filtering, the final number of documents studied in this paper is 191. The search and screening process, with the inclusion and exclusion of research documents, is depicted in Fig. 2 using the PRISMA technique.
![Fig. 2.](/cms/10.1142/S2282717X23300015/asset/images/medium/s2282717x23300015figf2.png)
Fig. 2. Documents selection.
Initial Search String on Scopus Citation Database.
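The structure of the search string can be sketched in code. The following is a minimal illustration, not the authors' actual procedure: the two term lists are copied from the reported string, while the helper names `or_group` and `build_query` are hypothetical, and straight quotes replace the typographic quotes printed in the text. The Scopus field restriction and the language, document type, and subject area filters are not reproduced here because the text does not give their exact syntax.

```python
# Term lists copied from the search string reported in the text.
TECH_TERMS = [
    "Big Data", "Artificial Intelligence", "AI",
    "Deep Learning", "Machine Learning",
]
INSURANCE_TERMS = [
    "insurance Plan", "Insurance Policy", "structured claim data",
    "unstructured claim data", "Insurance Sector", "Insurance company",
    "Life Insurance", "Health Insurance", "Mediclaim", "General Insurance",
]


def or_group(terms):
    """Quote each term, join the terms with OR, and wrap the group in parentheses."""
    return "(" + " OR ".join(f'"{term}"' for term in terms) + ")"


def build_query(group_a, group_b):
    """Combine two OR-groups with AND (hypothetical helper for illustration)."""
    return f"({or_group(group_a)} AND {or_group(group_b)})"


query = build_query(TECH_TERMS, INSURANCE_TERMS)
print(query)
```

Keeping the two concept groups as separate lists makes it easy to see that a document must match at least one technology term and at least one insurance term to be retrieved.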
A quantitative method called bibliometrics evaluates publications’ number and quality using relational, evaluative, and descriptive methods. Simple descriptions of bibliographic data are produced by descriptive techniques, whereas evaluations of the effects of publications are produced by evaluative methods (McBurney & Novak 2002). When examining the relationships between units, such as authors, documents, sources, organizations, and nations, as well as when evaluating the structure of a study topic, relational approaches are used. The following bibliometric techniques are used in this study to answer the RQs:
(1) Bibliographic Coupling,
(2) Citation Analysis,
(3) Co-authorship and Collaboration Analysis,
(4) Co-word Analysis.
Bibliographic coupling and citation analysis are seen as indicators of influence and resemblance (Niñerola et al. 2019, Zupic & Čater 2015). Co-word analysis uses keywords to show the relationships between different concepts, while co-authorship and collaboration analysis evaluates the collaboration between scholars, organizations, and countries (Hui & Fong 2004, Wang et al. 2012). These techniques have been widely used in past bibliometric studies (Liu et al. 2005, Xu et al. 2018, Zamore et al. 2018). The study first provides the descriptive statistics for the bibliometric data. The dynamics of the subject field are then examined using bibliographic coupling, citation analysis, co-author analysis, and co-word analysis. The research structure of our study is displayed in Fig. 3, which explains the analytical framework of our study.
![Fig. 3.](/cms/10.1142/S2282717X23300015/asset/images/medium/s2282717x23300015figf3.png)
Fig. 3. Analytical framework of our study.
In our study, we have performed bibliometric analysis on the literature of big data applications in the insurance sector using citation analysis, co-authorship analysis, keyword and co-occurrence analysis, bibliographic coupling, etc. to answer the RQs (Castriotta et al. 2018, Korom 2019, Xu & Yu 2019).
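Two of the techniques listed above reduce to simple counting, which a short sketch can make concrete. The example below is an illustration under stated assumptions rather than the computation performed by VOSviewer or Biblioshiny: the three records, their reference identifiers, and their keywords are made up, coupling strength is taken as the raw number of shared references, and keyword co-occurrence is counted once per document.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy records; document IDs, references, and keywords are made up.
records = {
    "doc_A": {"refs": {"r1", "r2", "r3"},
              "keywords": {"fraud detection", "machine learning"}},
    "doc_B": {"refs": {"r2", "r3", "r4"},
              "keywords": {"machine learning", "health insurance"}},
    "doc_C": {"refs": {"r5"},
              "keywords": {"fraud detection", "machine learning"}},
}


def coupling_strength(recs):
    """Bibliographic coupling: count the references shared by each document pair."""
    strengths = {}
    for a, b in combinations(sorted(recs), 2):
        shared = len(recs[a]["refs"] & recs[b]["refs"])
        if shared:  # keep only pairs that share at least one reference
            strengths[(a, b)] = shared
    return strengths


def keyword_cooccurrence(recs):
    """Co-word analysis: count the documents in which each keyword pair co-occurs."""
    counts = Counter()
    for rec in recs.values():
        for pair in combinations(sorted(rec["keywords"]), 2):
            counts[pair] += 1
    return counts


coupling = coupling_strength(records)
cowords = keyword_cooccurrence(records)
print(coupling)
print(cowords.most_common(3))
```

Dedicated tools typically normalize these raw counts before clustering and layout; the raw counts are shown here only to convey what the underlying networks are built from.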
3. Analysis and Findings
RQ1 of our study asks: what is the current publication trend in the domain of big data applications in the insurance sector? To answer RQ1, we considered several parameters of the publication trend, such as publications by year, author, journal, organization, and country. The data for this analysis were collected from the Scopus database, and we performed bibliometric analysis to draw relevant insights.
3.1. Data summary
Figure 4 presents the descriptive statistics for the 191 research articles, whose annual output has grown at a rate of 8.98% per annum since 1976. A total of 550 authors have contributed to this research domain, of whom 28 are authors of single-authored documents. The average number of co-authors per document is 3.04, and the international co-authorship percentage is 18.32%. The average number of citations per document is 10.73, and the average document age is 6.04 years. The total number of references in these articles is 6954.
![Fig. 4. Fig. 4.](/cms/10.1142/S2282717X23300015/asset/images/medium/s2282717x23300015figf4.png)
Fig. 4. Descriptive statistics.
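An annual growth rate of this kind is usually a compound rate. The following is a minimal sketch, assuming the rate is derived from the publication counts of the first and last years of the window; the exact convention used by the bibliometric tool is not stated in this study. The function name and the counts are hypothetical placeholders, not the study's data.

```python
def annual_growth_rate(first_count: float, last_count: float, periods: int) -> float:
    """Compound annual growth rate, in percent.

    Assumption: growth is measured between the first and last yearly
    publication counts over `periods` yearly intervals.
    """
    if first_count <= 0 or periods <= 0:
        raise ValueError("first_count and periods must be positive")
    return ((last_count / first_count) ** (1 / periods) - 1) * 100


# Hypothetical counts: 1 paper in the first year, 48 in the last year,
# separated by 45 yearly intervals (e.g. 1976 to 2021).
rate = annual_growth_rate(first_count=1, last_count=48, periods=45)
print(f"{rate:.2f}% per annum")
```

A compound rate summarizes only the two endpoints of the series, so it should be read alongside the year-by-year counts rather than instead of them.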
3.2. Publication by year
The number of publications on the topic from 1976 to 2021 is presented in Fig. 5.
![Fig. 5. Fig. 5.](/cms/10.1142/S2282717X23300015/asset/images/medium/s2282717x23300015figf5.png)
Fig. 5. Publication by year.
It is evident from the figure that there is a sharp increase in publications after 2016. Advanced technology, innovation, and digitalization have transformed work processes in every sector, including the insurance sector (Manyika et al. 2017). This has broadened the scope of research on this topic.
3.3. Publication by country
The application of big data in the insurance sector is an emerging research area. It has attracted many researchers to explore its benefits and challenges, and to examine how big data and AI can be applied in different contexts of insurance industry decisions. Table 2 shows the top publishing countries in this domain based on both the number of documents and the number of citations. The top three countries by number of documents are the United States, India, and China, respectively, whereas the top three by citations are, in sequence, the United States, China, and India.
Rank | Country | No of articles | Rank | Country | Citations |
---|---|---|---|---|---|
1 | United States | 52 | 1 | United States | 576 |
2 | India | 24 | 2 | China | 345 |
3 | China | 16 | 3 | India | 199 |
4 | Germany | 16 | 4 | Germany | 188 |
5 | Taiwan | 12 | 5 | Turkey | 111 |
6 | Italy | 9 | 6 | Australia | 111 |
7 | United Kingdom | 9 | 7 | Spain | 60 |
8 | Spain | 6 | 8 | Italy | 59 |
9 | France | 5 | 9 | Taiwan | 57 |
10 | Belgium | 5 | 10 | Slovenia | 54 |
3.4. Publications by journal
The 191 articles appeared in 148 journals. The journals with the most articles on big data applications in the insurance sector are listed in Table 3. The themes of these journals are in line with the topic considered. It is evident from Table 3 that the leading journals publishing articles in this domain are Risks, Applied Stochastic Models in Business and Industry, Expert Systems with Applications, Big Data and Society, International Journal of Advanced Computer Science and Applications, etc.
Rank | Sources | Publisher | Articles | SJR score | Q rating |
---|---|---|---|---|---|
1 | Risks | MDPI | 9 | 0.4 | Q2 |
2 | Applied Stochastic Models in Business and Industry | Wiley-Blackwell Publishing | 5 | 0.46 | Q2 |
3 | Expert Systems with Applications | Elsevier | 4 | 2.07 | Q1 |
4 | Big Data and Society | Sage Journals | 3 | 2.04 | Q1 |
5 | International Journal of Advanced Computer Science and Applications | Science and Information Organization | 3 | 0.28 | Q3 |
6 | Lecture Notes in Computer Science | Springer Science | 3 | 0.41 | Q2 |
7 | Decision Support Systems | Elsevier | 2 | 1.97 | Q1 |
8 | Journal of Big Data | Springer Open | 2 | 2.59 | Q1 |
9 | Applied Soft Computing Journal | Elsevier | 2 | 1.96 | Q2 |
10 | Journal of Ambient Intelligence & Humanized Computing | Springer | 2 | 0.91 | Q1 |
3.5. Publication by author and organization
According to the database extracted from Scopus, 628 authors from 280 organizations have contributed research papers in this domain. The researchers who have contributed high-impact research work and the organizations that have published articles on big data applications in the insurance sector are listed in Table 4, where TP denotes total publications and TC denotes total citations. The researchers and organizations are sorted by citation count. The citation count of the listed authors varies from 29 to 121, whereas the citation count for organizations varies from 2 to 181.
Rank | Author | TP | TC | Organization | TP | TC |
---|---|---|---|---|---|---|
1 | Wang Y. | 2 | 121 | School of Statistics and Mathematics, Zhejiang Gongshang University, China | 2 | 3 |
2 | Zhang J. | 2 | 81 | Department of Business & Management, Webster Vienna Private University, Austria | 1 | 2 |
3 | Major J. A. | 2 | 67 | Department of Economics and Business, Saint Anselm College, Manchester, United States | 1 | 2 |
4 | Riedinger D. R. | 2 | 67 | Department of Tourism, Faculty of Economic Sciences, Ionian University, Greece | 1 | 2 |
5 | Alcañiz M. | 2 | 35 | Research Institute of Energy Management and Planning, University of Tehran, Iran | 1 | 2 |
6 | Guillen M. | 2 | 35 | CMR Institute of Technology Bangalore, India | 1 | 1 |
7 | Khoshgoftaar T. M. | 2 | 34 | Visveswaraya Technological University, India | 1 | 1 |
8 | Brockett P. L. | 2 | 33 | Department Of Business And Management, University Of Sussex, United Kingdom | 1 | 16 |
9 | Bhattacharyya S. S. | 2 | 29 | Information And Process Management Department, Bentley University, United States | 1 | 16 |
10 | Krishnamoorthy B. | 2 | 29 | Saunders College Of Business, Rochester Institute of Technology, United States | 1 | 16 |
Each of the top 10 ranked researchers has published only two documents on the topic considered. Wang contributed two documents, which together have the highest citation score of 121. The first paper focused on analyzing textual information in claims to detect fraud. More precisely, it discussed a process for detecting automobile insurance fraud by using text mining methods. The second paper discussed predicting driving risk through in-depth analysis using machine learning techniques. The output of this research will help insurers to decide the premium accordingly. The research work of Zhang has the second highest citation score of 81. The paper proposed a novel hybrid model for customer relationship management (CRM) in the insurance industry. The proposed model will help insurance companies to identify clusters of similar customers by processing linguistic terms and crisp number data. Major and Riedinger have a citation score of 67 each. Their research paper discussed the detection of healthcare provider fraud. Their study provides a machine learning-based process to detect healthcare provider fraud by integrating expert knowledge and statistical information.
The most cited organizations in this domain are the Tremont Research Institute, USA; the School of Information, Renmin University of China; the Smart City Research Centre, China; and the Capital Markets Cooperative Research Centre, Australia, with one document each and 181, 120, 120, and 105 citations, respectively. The co-authorship network of authors and countries is addressed by our RQ4 in the later sections of our study.
3.6. Citation network analysis
The second RQ of our study is to identify the most influential articles in the domain of big data applications in the insurance sector. We performed citation network analysis of 216 articles to answer this RQ. We used R Biblioshiny for global and local citations and VOSviewer for the citation network map.
A citation is the decision of an author to link his or her document to another author’s work at a particular point (Kampis et al. 2009). Citation analysis is one of the most relevant ways to measure the impact of a research article and to build intellectual linkages (Appio et al. 2014, Ding & Cronin 2011). An article’s impact depends on the citations made by other works. Niñerola et al. (2019) remarked that citations measure the influence and degree of recognition of an author, an article, or a journal.
The top 10 research articles by both global citation and local citation are displayed in Table 5. Local citation indicates an article’s impact within the collection of articles considered on this topic. Global citation signifies the number of times an article in the database is cited by other works and research disciplines. Table 5 shows that Gerson & Star (1986) has the highest global citation count of 181, followed by Wang & Xu (2018) with 120 and Srinivasan & Arunasalam (2013) with 105 global citations, respectively. Riikkinen et al. (2018) has the highest local citation count with five, followed by Wang & Xu (2018), Ince & Aktan (2009), and Kose et al. (2015), each with three local citations.
Sl. No. | Article | Authors | Source | Publication Year | Global Citation | Local Citation |
---|---|---|---|---|---|---|
1 | Analyzing Due Process in the Workplace | Gerson & Star | ACM Transactions on Information Systems | 1986 | 181 | 0 |
2 | Leveraging Deep Learning with LDA-Based Text Analytics to Detect Automobile Insurance Fraud | Wang & Xu | Decision Support Systems | 2018 | 120 | 3 |
3 | Leveraging Big Data Analytics to Reduce Healthcare Costs | Srinivasan & Arunasalam | IT professional | 2013 | 105 | 1 |
4 | Body Area Network BAN–A Key Infrastructure Element for Patient-Centered Medical Applications | Schmidt, Norgall, Mörsdorf, Bernhard & von der Grün T | Biomedizinische Technik | 2002 | 100 | 0 |
5 | Lightweight RFID Protocol for Medical Privacy Protection in IoT | Fan, Jiang, Li & Yang Y | IEEE Transactions on Industrial Informatics | 2018 | 85 | 0 |
6 | Predictive Modeling of Hospital Readmissions Using Metaheuristics and Data Mining | Zheng, Zhang, Yoon, Lam, Khasawneh & Poranki | Expert Systems with Applications | 2015 | 81 | 0 |
7 | Online Clinical Decision Support System Using Optimal Deep Neural Networks | Lakshmanaprabu, Mohanty & Krishnamoorthy S | Applied Soft Computing | 2019 | 55 | 0 |
8 | Using Artificial Intelligence to Create Value in Insurance | Riikkinen M, Saarijärvi H, Sarlin P & Lähteenmäki I | International Journal of Bank Marketing | 2018 | 53 | 5 |
9 | A Comparison of Data Mining Techniques for Credit Scoring in Banking: A Managerial Perspective | Ince H & Aktan B | Journal of Business Economics and Management | 2009 | 52 | 3 |
10 | An Interactive Machine-Learning-Based Electronic Fraud and Abuse Detection System in Healthcare Insurance | Kose I, Gokturk M & Kilic K | Applied Soft Computing | 2015 | 51 | 3 |
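The distinction between local and global citations can be made concrete with a small sketch. It assumes each record carries the list of works it cites; local citations are then counted inside the collection, while global counts come directly from the database and cannot be derived this way. The document ids and reference lists below are hypothetical.

```python
from collections import Counter

# Hypothetical mini-collection: document id -> ids of the works it cites.
# Ids starting with "X" stand for works outside the collection.
collection = {
    "A": ["B", "X1", "X2"],
    "B": ["X1"],
    "C": ["A", "B", "X3"],
    "D": ["B", "C"],
}


def local_citations(refs_by_doc):
    """Count, for each document in the collection, how many other
    documents in the same collection cite it."""
    counts = Counter({doc: 0 for doc in refs_by_doc})
    for citing, refs in refs_by_doc.items():
        for cited in set(refs):  # ignore duplicate reference entries
            if cited in refs_by_doc and cited != citing:
                counts[cited] += 1
    return dict(counts)


local = local_citations(collection)
print(local)
```

In this toy collection, document "B" is cited by three of the other documents, while "D" is cited by none, regardless of how often either is cited outside the collection.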
3.7. Keyword and co-occurrence analysis
The premise of relatedness needs to be established for publication clustering. In bibliometrics, relatedness is frequently calculated using either word relationships or citation relationships (Van Eck & Waltman 2017). Authors’ Keywords, Index Keywords, or Keywords Plus are considered indications of relatedness when the relationship is determined based on words (Boyack & Klavans 2010). Authors’ Keywords are a collection of terms that, in the authors’ opinion, best capture the essence of their work and draw attention to any overarching themes, whereas Publisher’s Keywords are used by the publisher to index the documents. The database-generated extended keywords or phrases known as “Keywords Plus” are drawn from the references of papers rather than from the titles or keywords of those publications (Tripathi et al. 2018, Zhang et al. 2015). Citation-based relationships are further categorized into direct citation relationships, co-citation relationships, and bibliographic coupling relationships (Klavans & Boyack 2017).
For assessing the relatedness of publications, a combined approach often considers both citation linkages and word relations (Boyack & Klavans 2010). Co-word analysis and bibliographic coupling of documents are used in this study to highlight themes and group publications together, since they can reveal current topics (Chang et al. 2015). Citation analysis and co-citation analysis may also be used for mapping the intellectual structure; however, these techniques cannot identify emergent ideas.
Co-word analysis is a technique for examining significant word co-occurrences and for identifying relationships and interactions between research themes and contemporary research trends. Keywords are the words and phrases that authors regularly employ in the papers’ titles, abstracts, and body. Co-word analysis is used in this paper to illustrate the interactions that take place throughout different phases of the innovation process and to show whether fundamental or applied research is the primary force (Callon et al. 1991). Two keywords, i and j, are said to co-occur when they appear together in the summary of a single document. Simply counting the co-occurrences of words, however, is not enough to reveal the strength of the links between them: when the papers in the file being evaluated are indexed, terms that are used often, indeed almost systematically, gain an advantage over words that are used less frequently (Callon et al. 1991).
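The point about raw counts favoring frequent terms can be illustrated with a short sketch. It counts keyword co-occurrences per document and then normalizes them with the equivalence index, e_ij = c_ij^2 / (c_i * c_j), a normalization commonly associated with co-word analysis. This study does not state which normalization was applied, so the choice here is an assumption, and the keyword lists are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical keyword lists, one per document.
docs = [
    ["machine learning", "fraud detection", "insurance"],
    ["machine learning", "insurance"],
    ["machine learning", "big data"],
    ["fraud detection", "insurance"],
]

occurrences = Counter()    # c_i: number of documents containing keyword i
cooccurrences = Counter()  # c_ij: number of documents containing both i and j

for keywords in docs:
    unique = sorted(set(keywords))  # sorted so each pair has one canonical order
    occurrences.update(unique)
    cooccurrences.update(combinations(unique, 2))


def equivalence_index(i, j):
    """e_ij = c_ij^2 / (c_i * c_j); the value lies between 0 and 1."""
    pair = tuple(sorted((i, j)))
    return cooccurrences[pair] ** 2 / (occurrences[i] * occurrences[j])


for a, b in [("fraud detection", "insurance"), ("insurance", "machine learning")]:
    print(a, "|", b, "| raw:", cooccurrences[(a, b)],
          "| normalized:", round(equivalence_index(a, b), 3))
```

Both pairs co-occur in two documents, yet the pair involving the more frequent keyword receives the lower normalized score, which is the correction the raw count lacks.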
Co-word analysis uses the interactions between groups of keywords to map the relationships between objects in textual data and literary ideas (Wang et al. 2012). It is predicated on the idea that keywords reflect a document’s main ideas and give a quick overview of the material already available on a given subject (Khanra et al. 2020). This approach is frequently used to understand trends and hot topics in a scientific field. According to Garfield (1990), Keywords Plus can capture a document’s content in depth, although few studies have demonstrated this (Zhang et al. 2015). Additionally, authors’ keywords and index keywords place greater emphasis on identifying a document with a particular concept or topic, whereas the terms in Keywords Plus place more stress on research methodologies, tools, and approaches (Garfield 1990, Garfield & Sher 1993, Zhang et al. 2015). This study analyzes authors’ keywords and index keywords, together referred to as keyword information. The keyword data reveal which terms are often used in the articles. According to the bibliometric data, the authors of the publications included in this study contributed a total of 457 keywords, while the publishers indexed the materials using 409 keywords. The first stage was the standardization of keywords, which was achieved by adhering to the principle of simplicity (Valderrama-Zurian et al. 2017).
For instance, full forms and acronyms (“Artificial Intelligence” and “AI”) as well as singular and plural forms (“Decision Support System” and “Decision Support Systems”) were standardized. This was done manually for the top 100 most frequently occurring terms. Thus, a total of 685 author keywords and 1614 index keywords were produced. Table 6 lists the top 20 authors’ keywords by frequency of occurrence. It shows that the keywords authors used most are machine learning, big data, artificial intelligence, and insurance. A better understanding of these authors’ keywords can be gained from Fig. 6. Authors have highlighted “machine learning”, “supervised learning”, and “artificial neural network” used for the “classification” of insurance customers. Similarly, they have used “support vector machine”, “neural network”, and “deep neural network” to boost “customer relationship” in insurance companies. The frequency of the authors’ keywords validates the search term used to select documents. The perspective of indexers, however, seems to be broader, covering the insurance industry, information technology, and financial technology in insurance management. Authors’ keywords have been used for further analysis since they are carefully selected by the writers and are thought to be the most precise description of an article’s content (Song et al. 2019, Zhang et al. 2015).
Rank | Words | Frequency | Rank | Words | Frequency |
---|---|---|---|---|---|
1 | Machine Learning | 44 | 11 | Big Data Analytics | 6 |
2 | Big Data | 30 | 12 | Clinical Decision Support System | 6 |
3 | Artificial Intelligence | 13 | 13 | Natural Language Processing | 5 |
4 | Insurance | 13 | 14 | Chronic Disease | 4 |
5 | Neural Network | 13 | 15 | Clustering | 4 |
6 | Data Mining | 12 | 16 | Decision Support System | 4 |
7 | Classification | 11 | 17 | Insurtech | 4 |
8 | Deep Learning | 11 | 18 | Medicare | 4 |
9 | Health Insurance | 9 | 19 | Prediction | 4 |
10 | Fraud Detection | 7 | 20 | Privacy | 4 |
![Fig. 6. Fig. 6.](/cms/10.1142/S2282717X23300015/asset/images/medium/s2282717x23300015figf6.png)
Fig. 6. Keyword co-occurrence network map.
RQ3 of our study asks which themes are the most popular in the domain of big data applications in the insurance sector. We have answered this RQ by referring to the keyword and co-occurrence analysis performed in R Biblioshiny. This section highlights the most frequently used authors’ keywords in the relevant publications. Keyword analysis helps to identify prominent research topics in this area. An article’s content can be represented by a keyword (Comerio & Strozzi 2019). Keyword co-occurrence signifies the link between two keywords that appear together in an article, which explains the relationship between those two keywords.
The keyword analysis output is displayed in Table 6, and the keyword co-occurrence network map, produced with the VOSviewer network visualization, is displayed in Fig. 6; together they characterize the literature on big data applications in the insurance sector. The top 10 most frequently used keywords are machine learning, big data, insurance, artificial intelligence, deep learning, classification, data mining, fraud detection, health insurance, and insurtech. This further indicates that research on big data applications in the insurance sector is mostly centered on insurance claims, health insurance, fraud detection, and technological innovations. The minimum number of occurrences of a keyword was set to 2 to build the keyword co-occurrence network. Out of 685 authors’ keywords, 100 met this threshold.
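The keyword standardization and the occurrence threshold described above can be sketched as follows. This is an illustrative approximation only: the study standardized terms manually, so the synonym map, the function names, and the sample keyword lists are all hypothetical.

```python
from collections import Counter

# Hypothetical synonym map: variant -> preferred form (all lower case).
SYNONYMS = {
    "ai": "artificial intelligence",
    "decision support systems": "decision support system",
}


def standardize(keyword: str) -> str:
    """Lower-case, collapse whitespace, and map known variants to one form."""
    cleaned = " ".join(keyword.lower().split())
    return SYNONYMS.get(cleaned, cleaned)


def keyword_frequencies(docs, min_occurrences=2):
    """Count in how many documents each standardized keyword appears and
    keep those meeting the threshold, most frequent first."""
    counts = Counter()
    for keywords in docs:
        counts.update({standardize(k) for k in keywords})
    kept = [(k, n) for k, n in counts.items() if n >= min_occurrences]
    return sorted(kept, key=lambda item: (-item[1], item[0]))


docs = [
    ["AI", "Insurance", "Decision Support Systems"],
    ["Artificial Intelligence", "insurance "],
    ["Decision Support System", "Telematics"],
]
print(keyword_frequencies(docs))
```

Without the standardization step, "AI" and "Artificial Intelligence" would each be counted once and both would fall below the threshold of 2, which is why the merging is done before counting.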
It is also evident from Fig. 6 that insurtech, risk, digitalization, fraud detection, and health care have the most prominent nodes, indicating their relative importance in this research domain. The co-occurrence of authors’ keywords shows that AI technology and IoT are applied especially to autonomous or automated vehicles, to understanding personal data and the risk profile of the insurance buyer, and to the innovation of insurance products by insurance companies. Techniques such as artificial neural networks, Bayesian networks, decision support systems, decision trees, random forests, logistic regression, support vector machines, and spatial analysis are used by insurance companies to analyze the health records of health insurance buyers. Product feature selection, the analysis of health care disparities and health insurance claims, and decisions on variable annuities or premiums are based on ensemble learning, ensemble modeling, and machine learning. In automobile insurance and health insurance, big data analytics, predictive analytics, and topic modeling are used for fraud detection and preventive care.
In Fig. 7, it is observed that ensemble learning, artificial neural network, support vector regression, imbalanced data, and supervised learning are the new tools under big data analytics used by insurance companies in the most recent research publications. Activities such as risk assessment, reinsurance, and innovation are the fields of research emerging in the most recent research documents.
![Fig. 7. Fig. 7.](/cms/10.1142/S2282717X23300015/asset/images/medium/s2282717x23300015figf7.png)
Fig. 7. Keyword co-occurrence overlay visualization map.
3.8. Country co-authorship analysis
In this section, we have answered our RQ4 which is defined as follows:
RQ4: Which are the most influential countries, and what is the present state of collaboration between the authors belonging to these countries? In scientific research, collaboration amongst researchers brings intellectual association (Cisneros et al. 2018). The literature on a research topic is influenced by the referencing of certain publications by the network of co-authors (Caviggioli & Ughetto 2019, Racherla & Hu 2010). Song et al. (2019) documented that the collaboration between individuals, organizations, and countries is explained by their social network. We identified the most influential countries and the network of collaboration of authors from these countries by analyzing the current state and extent of collaboration amongst them.
The strength of association between the countries is displayed in Fig. 8. We set a minimum of five documents as the criterion, and fourteen countries met this threshold. We found that the US has the maximum number of documents, with 58 documents and 577 citations, followed by China with 26 documents and 203 citations and India with 24 documents and 348 citations. It is evident from Fig. 8 that the most significant and frequent collaboration is found amongst the scholars of the US, China, and India. A decent degree of collaboration is seen between authors from Germany, Taiwan, the UK, and Italy. It is also found that Italy has co-authorship with the US and the UK, and Taiwan has co-authorship with the US, China, and India, whereas Germany has co-authorship only with the US.
![Fig. 8. Fig. 8.](/cms/10.1142/S2282717X23300015/asset/images/medium/s2282717x23300015figf8.png)
Fig. 8. Country co-authorship network map.
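A country co-authorship network of this kind can be approximated with a short sketch. It assumes full counting, where each document adds one to the link between every pair of countries appearing on it, and it applies a minimum-documents threshold like the one described above. The counting method used in the study is not stated, and the records below are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical records: the set of affiliated countries for each document.
docs = [
    {"United States", "China"},
    {"United States", "India"},
    {"United States", "China"},
    {"India"},
    {"Germany", "United States"},
]


def country_network(docs, min_documents=1):
    """Return (documents per country, link strength per country pair),
    restricted to countries with at least `min_documents` documents."""
    doc_counts = Counter()
    for countries in docs:
        doc_counts.update(countries)
    kept = {c for c, n in doc_counts.items() if n >= min_documents}

    links = Counter()
    for countries in docs:
        present = sorted(countries & kept)  # canonical order for each pair
        links.update(combinations(present, 2))
    return {c: doc_counts[c] for c in kept}, dict(links)


counts, links = country_network(docs, min_documents=2)
print(counts)
print(links)
```

Raising the threshold removes countries with few documents along with all of their links, so the resulting map shows only collaboration among the more productive countries.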
3.9. Bibliometric mapping and research themes
In our study, RQ5 is framed as “What is the intellectual structure of current research in the domain of big data applications in the insurance sector?” Bibliometric mapping has been widely used in various research domains across disciplines, and it summarizes the commonalities in the content of research documents (Donthu et al. 2020, Homrich et al. 2018, Lee et al. 2014, Zhang & Banerji 2017). In our study, bibliographic coupling of documents is used to develop the bibliometric network (Fig. 9). In this network diagram, the articles are represented by nodes and the links signify the connections between the articles. Bibliographic coupling of documents yields various clusters of documents. The minimum number of citations per document was set to 6. Out of 216 documents, 74 met the threshold.
![Fig. 9. Fig. 9.](/cms/10.1142/S2282717X23300015/asset/images/medium/s2282717x23300015figf9.png)
Fig. 9. Network map of bibliographic coupling of documents.
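The idea behind bibliographic coupling can be sketched briefly: two documents are coupled when they cite the same references, and the number of shared references gives the strength of the link. The sketch below applies a minimum-citations filter like the one described above. The record layout, function name, and data are hypothetical, and the clustering step performed by the mapping software is not reproduced.

```python
from itertools import combinations

# Hypothetical records: document id -> (citation count, set of cited references).
docs = {
    "D1": (120, {"r1", "r2", "r3"}),
    "D2": (26, {"r2", "r3", "r4"}),
    "D3": (4, {"r1", "r2"}),  # below the citation threshold
    "D4": (11, {"r5"}),
}


def coupling_links(docs, min_citations=6):
    """Coupling strength of two documents = number of references they share.
    Only documents with at least `min_citations` citations are considered."""
    kept = {d: refs for d, (cites, refs) in docs.items() if cites >= min_citations}
    links = {}
    for a, b in combinations(sorted(kept), 2):
        shared = len(kept[a] & kept[b])
        if shared:
            links[(a, b)] = shared
    return links


print(coupling_links(docs))
```

Unlike citation counts, coupling strength is fixed at publication time because it depends only on each document's own reference list, which is why coupling can group recent papers that have not yet accumulated citations.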
In our study, seven clusters are formed, comprising 37 articles. Table 7 shows the number of publications in each of these seven clusters from 1992 to April 2022.
Year | Cluster 1 | Cluster 2 | Cluster 3 | Cluster 4 | Cluster 5 | Cluster 6 | Cluster 7 | Grand total |
---|---|---|---|---|---|---|---|---|
1986 | 0 | |||||||
1992 | 1 | 1 | ||||||
1997 | 0 | |||||||
2002 | 1 | 1 | ||||||
2007 | 1 | 1 | ||||||
2008 | 0 | |||||||
2009 | 1 | 1 | ||||||
2010 | 1 | 1 | ||||||
2012 | 0 | |||||||
2013 | 0 | |||||||
2014 | 1 | 1 | ||||||
2015 | 1 | 1 | 2 | |||||
2016 | 0 | |||||||
2017 | 1 | 1 | 2 | |||||
2018 | 1 | 1 | 2 | |||||
2019 | 5 | 2 | 1 | 3 | 4 | 15 | ||
2020 | 2 | 1 | 3 | 6 | ||||
2021 | 2 | 1 | 1 | 4 | ||||
Total documents of the cluster | 10 | 7 | 5 | 5 | 4 | 4 | 2 | 37 |
Total citations of the cluster | 231 | 141 | 233 | 118 | 83 | 71 | 67 | 944 |
Total links of the cluster | 84 | 28 | 13 | 18 | 15 | 44 | 30 | 232 |
In this section, we discuss the research themes identified from these seven clusters, referring to the most relevant articles included in them. Table 8 lists the articles in these clusters along with their RQs and future implications. These articles represent various applications of big data, AI, and machine learning in the insurance sector. It is also evident from Table 8 that big data analytics plays a vital role in insurance data handling, business decisions, transformation, and growth. These themes will enable researchers to identify gaps and to explore future research in this domain.
Cluster number | Title of the article | Author | Year | Name of the journal | Research question addressed | TC | TLS | Theme |
---|---|---|---|---|---|---|---|---|
1 | EFD: A hybrid knowledge/Statistical-based system for detection of Fraud (Major & Riedinger 2002) | John A. Major and Dan R. Riedinger | 1992 | International Journal of Intelligent Systems | How Electronic Fraud Detection (EFD) assists Investigative Consultants in the Managed Care and Employee Benefits Security Unit of The Travelers Insurance Companies in the detection and pre-investigative analysis of healthcare provider fraud. | 20 | 15 | Fraud Detection and Prevention |
1 | Supervised learning methods for fraud detection in healthcare insurance (Dua & Bais 2014) | Prerna Dua and Sonali Bais | 2014 | Intelligent Systems Reference Library | How to detect healthcare fraud by using supervised machine learning techniques. | 11 | 2 | Fraud Detection and Prevention |
1 | Application of machine learning and data visualization techniques for decision support in the insurance sector (Rawat et al. 2021) | Seema Rawat, Aakankshu Rawat, Deepak Kumar, A. Sai Sabitha | 2021 | International Journal of Information Management Data Insights | How can the claim analysis help to understand the client strata in a systematic manner and helps in identifying fraud claims and genuine claims by using machine learning algorithm. | 11 | 1 | Fraud Detection and Prevention |
1 | Development of a medical big-data mining process using topic modelling (Song et al. 2019) | Chang-Woo Song, Hoill Jung & Kyungyong Chung | 2019 | Cluster Computing | How to detect fraudulent and abnormal cases in healthcare services by using machine learning techniques. | 40 | 1 | Fraud Detection and Prevention |
1 | Leveraging deep learning with LDA-based text analytics to detect automobile insurance fraud (Wang & Xu 2018) | Yibo Wang, Wei Xu | 2018 | Decision Support Systems | What process can be followed to analyze textual information in the claims to detect insurance fraud? What process can be used to detect automobile insurance fraud by using text mining methods where the experience of human experts is hidden? | 120 | 4 | Fraud Detection and Prevention |
1 | Medicare fraud detection using neural networks (Johnson & Khoshgoftaar 2019) | Justin M. Johnson & Taghi M. Khoshgoftaar | 2019 | Journal of Big Data | How the Medicare fraud, waste and abuse can be detected using neural networks? | 26 | 8 | Fraud Detection and Prevention |
2 | On a strategy to develop robust and simple tariffs from motor vehicle insurance data (Christmann et al. 2007) | Andreas Christmann | 2007 | Acta Mathematicae Applicatae Sinica | How data mining techniques can be used to determine the actual premium to be charged to the customer. | 8 | 5 | Pricing and Ratemaking |
2 | Personalization as a promise: Can Big Data change the practice of insurance? (Barry & Charpentier 2020) | Laurence Barry & Arthur Charpentier | 2020 | Big Data and Society | What is the impact of big data technologies for insurance ratemaking? Does telematics removed or not risk apprehensions and pricing in motor insurance? | 13 | 27 | Pricing and Ratemaking |
2 | Automobile insurance classification ratemaking based on telematics driving data (Huang & Meng 2019) | Yifan Huang & Shengwang Meng | 2019 | Decision Support Systems | How to decide on classification of ratemaking for usage-based insurance (UBI) product. How to predict the risk probability and claim frequency of an insured vehicle? | 30 | 17 | Pricing and Ratemaking |
3 | Democratizing health insurance services; accelerating social inclusion through technology policy of health insurance firms (Nayak et al. 2019) | Bishwajit Nayak, Som Sekhar Bhattacharyya & Bala Krishnamoorthy | 2019b | Business Strategy and Development | Which factors should a health insurance firm consider developing its technology policy? How can the technology policy of a health insurance firm enhance social inclusivity? | 13 | 8 | Technology Utilization |
3 | Integrating wearable technology products and big data analytics in business strategy: A study of health insurance firms | Bishwajit Nayak, Som Sekhar Bhattacharyya & Bala Krishnamoorthy | 2019a | Journal of Systems and Information Technology | What are the key dynamic capabilities that health insurance firms should build to manage big data generated by wearable technology to attain a competitive advantage? What is the impact of the adoption of wearable technology products for Indian health insurance firms? | 16 | 5 | Technology Utilization |
3 | DeepReco: Deep learning based health recommender system using collaborative filtering (Sahoo et al. 2019) | Abhaya Kumar Sahoo, Chittaranjan Pradhan, Rabindra Kumar Barik and Harishchandra Dubey | 2019 | Computation | How big data analytics can be used for the implementation of an effective health recommender system/engine in health care industry. | 34 | 1 | Technology Utilization |
3 | The light and dark side of the black box: Sensor-based technology in the automotive industry (Marabelli et al. 2017) | Marco Marabelli, Sean Hansen, Sue Newell, Chiara Frigerio | 2017 | Communications of the Association for Information Systems | What are the uses of sensor-based technologies in the automotive insurance industry. | 16 | 5 | Technology Utilization |
3 | Smart services in healthcare: A risk-benefit-analysis of pay-as-you-live services from customer perspective in Germany (Wiegard & Breitner 2019) | Rouven-B. Wiegard & Michael H. Breitner | 2019 | Electronic Markets | What are the significant determinants of an insured’s intention to use wearable devices in pay-as-you-live services by comparing perceived privacy risks and perceived benefits? | 20 | 1 | Technology Utilization |
4 | ‘Happy failures’: Experimentation with behavior-based personalization in car insurance (Meyers & Hoyweghen 2020) | Gert Meyers and Ine Van Hoyweghen | 2020 | Big Data and Society | What is the role of experimentation for the making of big data enabled personalization in insurance market? | 11 | 13 | Personalization |
4 | A Novel Hybrid Correlation Measure for Probabilistic Linguistic Term Sets and Crisp Numbers and Its Application in Customer Relationship Management (Zhang et al. 2019) | Xiaofang Zhang, Zeshui Xu, Peijia Ren | 2019 | International Journal of Information Technology and Decision Making | How to segment similar customers from massive insurance customer data using correlation measures and clustering algorithm. | 12 | 1 | Personalization |
4 | A “pay-how-you-drive” car insurance approach through cluster analysis (Carfora et al. 2019) | Maria Francesca Carfora, Fabio Martinelli, Francesco Mercaldo, Vittoria Nardone & Albina Orlando | 2019 | Soft Computing | How to identify the driver’s behavior and segment them using unsupervised machine learning techniques. | 27 | 8 | Personalization |
5 | Assessing risk in life insurance using ensemble learning (Jain et al. 2019) | Rachna Jain, Jafar A. Alzubi, Nikita Jain and Pawan Joshi | 2019 | Journal of Intelligent and Fuzzy Systems | How to evaluate the risk associated with an insurance policy applicant by using ensemble learning. | 9 | 3 | Risk Management |
6 | Predicting motor insurance claims using telematics data — XGboost versus logistic regression (Pesantez-Narvaez et al. 2019) | Jessica Pesantez-Narvaez, Montserrat Guillen and Manuela Alcañiz | 2019 | Risks | How to predict the occurrence of an accident claim using machine learning techniques. | 35 | 9 | Claim Processing and Prediction |
6 | Machine learning improves accounting estimates: Evidence from insurance payments (Ding et al. 2020) | Kexing Ding, Baruch Lev, Xuan Peng & Ting Sun | 2020 | Review of Accounting Studies | Is the loss estimates generated by machine learning superior to actual managerial estimates? How to predict the total loses or claims by policyholders. | 15 | 3 | Claim Processing and Prediction |
6 | Covariate selection from telematics car driving data (Wüthrich 2017) | Mario V. Wüthrich | 2017 | European Actuarial Journal | How to estimate the driving habits and generate pattern in driving style by using telematics data and using machine learning techniques. | 24 | 3 | Claim Processing and Prediction |
6 | Responsible Artificial Intelligence in Healthcare: Predicting and Preventing Insurance Claim Denials for Economic and Social Wellbeing (Johnson et al. 2021) | Marina Johnson, Abdullah Albizri & Antoine Harfouche | 2021 | Information Systems Frontiers | How to identify potentially denied claims by using a responsible artificial intelligence approach. | 7 | 2 | Claim Processing and Prediction |
7 | How digitalization affects insurance companies: Overview and use cases of digital technologies (Eckert & Osterrieder 2020) | Christian Eckert & Katrin Osterrieder | 2020 | Zeitschrift fur die gesamte Versicherungswissenschaft | Which digital technologies have high strategic relevance for the digital transformation of insurance companies. What is the impact of digital technologies on the insurer’s information technology (IT) system. | 8 | 3 | Digitalization |
Theme 1 concerns fraud detection and prevention. Fraud has been one of the biggest challenges across the globe. Fraud is an act to obtain gains or benefits illegally on false grounds, and it harms individual morals, the law and society, and the economic growth of a country (Alexopoulos et al. 2007). Gill (2016) remarked that insurance fraud can be defined as a fictitious or overstated claim, made either individually or in a group, with the motive of gaining more than the entitled amount. Fraud is therefore a crime that causes huge financial losses. It is evident from Table 8 that the articles related to this theme mainly explain insurance fraud detection using machine learning techniques. Healthcare provider fraud and health insurance fraud can be effectively identified by using machine learning algorithms (Dua & Bais 2014, Johnson & Khoshgoftaar 2019, Major & Riedinger 2002, Song et al. 2019). Claim analysis using machine learning algorithms is used to detect insurance fraud and to classify claims as genuine or fraudulent (Rawat et al. 2021, Wang & Xu 2018). Because fraud can damage an insurance business through huge financial losses, it needs to be prevented.
Pricing, or ratemaking, is another theme. Ratemaking is the process of determining the prices or rates an insurance company must charge. Bigdata and data mining techniques are used for insurance ratemaking and help to determine the right amount of premium to charge customers (Barry & Charpentier 2020, Christmann et al. 2007). Huang & Meng (2019) discussed classification ratemaking for usage-based insurance products.
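To make the idea of ratemaking concrete, the following minimal sketch uses the standard frequency-severity decomposition of the pure premium. It is an illustration only and is not taken from any of the cited studies; the function names, the expense ratio, and all figures are hypothetical.

```python
# Illustrative frequency-severity ratemaking; all figures below are made up.

def pure_premium(claim_count: int, exposure_years: float, total_losses: float) -> float:
    """Expected loss per unit of exposure = claim frequency * claim severity."""
    if claim_count == 0:
        return 0.0
    frequency = claim_count / exposure_years   # claims per policy-year
    severity = total_losses / claim_count      # average cost per claim
    return frequency * severity


def gross_premium(pure: float, expense_ratio: float) -> float:
    """Load the pure premium so that expenses take `expense_ratio` of the gross rate."""
    return pure / (1.0 - expense_ratio)


# Hypothetical risk class: 50 claims over 1,000 policy-years, 200,000 in losses.
pp = pure_premium(claim_count=50, exposure_years=1000.0, total_losses=200_000.0)
rate = gross_premium(pp, expense_ratio=0.20)
print(f"pure premium: {pp:.2f}, gross premium: {rate:.2f}")
```

In practice, bigdata techniques refine the frequency and severity estimates per risk class (for example, from telematics data) rather than change this basic arithmetic.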
Technology utilization is the next theme. Advanced technology and digital platforms have reshaped the insurance landscape. Artificial intelligence, the internet of things, blockchain, machine learning, and bigdata analytics are some of the technologies that have enriched the insurance industry by providing tech solutions to track, measure, and control risk. Nayak et al. (2019) identified a few technology factors that are essential for health insurance companies in risk management, data warehousing, and data privacy. Another strand of research documented the impact of adopting wearable technology products for Indian health insurance firms (Nayak et al. 2019, Wiegard & Breitner 2019). Trust, privacy, and risk are the concerns associated with technology usage. Sensor-based technologies are helping insurance companies gain competitive advantage in risk assessment and behavior-based pricing (Marabelli et al. 2017, Zarifi et al. 2018).
The next theme is personalization. In the insurance sector, personalization is defined as developing a strong understanding of customers, simplifying customer interactions, and providing the right kind of services according to customer needs. Digitalization has made it easier for insurers, reinsurers, and insurance brokers to drive personalization. To gain competitive advantage, insurance companies use bigdata-enabled personalization to offer personalized insurance prices, services, and products (Meyers & Hoyweghen 2020). Understanding customers plays a vital role in business decisions, and bigdata analytics is used to track customer behavior (Carfora et al. 2019, Zhang 2018).
Risk assessment is the next theme. One of the key objectives of insurance companies is to diversify risk. Insurers verify customer information to assess risks and, based on their behavior, segment customers into different risk classes. Bigdata analytics has improved the efficiency of the risk assessment process in the insurance industry. The insurer should offer the right kind of policy with the right amount of premium, and to decide the premium it needs to evaluate the risks associated with the insurance policy. Jain et al. (2019) proposed a method that uses ensemble learning to evaluate the risk associated with an insurance policy applicant.
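As a rough illustration of what ensemble learning means in this setting, the sketch below combines three simple base learners by majority vote. It is a toy example, not the method of Jain et al. (2019); the applicant features, thresholds, and function names are all hypothetical.

```python
from collections import Counter

# Hypothetical base learners: each maps an applicant record to a risk class.
# The features and thresholds are invented for illustration only.
def rule_age(applicant):
    return "high" if applicant["age"] >= 60 else "low"


def rule_bmi(applicant):
    return "high" if applicant["bmi"] >= 30 else "low"


def rule_smoker(applicant):
    return "high" if applicant["smoker"] else "low"


BASE_LEARNERS = [rule_age, rule_bmi, rule_smoker]


def ensemble_predict(applicant, learners=BASE_LEARNERS):
    """Majority vote: return the risk class predicted by most base learners."""
    votes = Counter(learner(applicant) for learner in learners)
    return votes.most_common(1)[0][0]


applicant = {"age": 65, "bmi": 24.0, "smoker": True}
print(ensemble_predict(applicant))
```

With an odd number of learners and two classes, the vote cannot tie. Real ensembles typically replace these hand-written rules with trained models such as decision trees.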
The next theme identified is claim processing and prediction. When a claimant files a claim on a policy, the insurer checks the adequacy of the information supplied, validates the authenticity of the claim, and reimburses the money in part or in whole accordingly. Bigdata and machine learning algorithms have made it easier to handle huge volumes of insurance claim data. By using bigdata analytics to predict loss estimates, authentic claims, and potentially denied claims, insurance companies can reduce operational cost, increase the efficiency of the claim process, and gain competitive advantage (Ding et al. 2020, Pesantez-Narvaez et al. 2019).
Digitalization is another theme. Digitalization is driving significant changes in the insurance sector. It enables insurance companies to use different digital channels and advanced bigdata analytics for two-way interaction with customers. Insurance operations such as claim processing and prediction, risk assessment, and pricing are handled effectively because of digital transformation in the insurance industry. Eckert & Osterrieder (2020) highlighted the benefits and opportunities that digital technology offers insurance companies for various operational and strategic decisions.
3.10. Development of an integrative framework
We have developed an integrative conceptual framework summarizing our cluster analysis. The main objective of this bibliometric analysis is to understand the trend and structure of research in the domain of bigdata applications in the insurance sector. The conceptual framework displayed in Fig. 10 shows different bigdata tools and techniques and their application in insurance decisions.
![Fig. 10.](/cms/10.1142/S2282717X23300015/asset/images/medium/s2282717x23300015figf10.png)
Fig. 10. Conceptual framework of bigdata applications in insurance sector.
The documents in the clusters highlighted two broad segments of bigdata analytics, namely (a) tools and algorithms and (b) infrastructure. The algorithms used in the extracted papers are clustering, pattern recognition, classification, and regression. Bigdata infrastructure supports bigdata management and processing, including data collection, data storage, data transfer, and data backup (Tozzi et al. 2019).
This research presents an integrated conceptual model that describes the bigdata tools and techniques in insurance business decisions (Fig. 10).
The insurance sector has experienced significant changes due to the evolution of technology. Large amounts of insurance data generated from various sources such as web servers, sensors, health care data, telematics, and wearable technology are managed and processed effectively by bigdata tools and techniques. The documents included in the clusters also highlighted a few applications of bigdata tools, algorithms, and infrastructure in the insurance sector. Our bibliometric study revealed that bigdata tools, techniques, and infrastructure are widely used for various insurance business decisions such as fraud detection and prevention, tracking consumer behavior, technology utilization, pricing and ratemaking, claim processing and prediction, risk management, handling digital platforms, and personalization, which ultimately help insurance companies in broad activities like marketing, operations, and strategy building.
4. Findings and Conclusions
This section discusses the overall research output and suggests directions for future research. We also identify some impediments that researchers face while working in this domain. The descriptive analysis answers RQ1 and depicts the current research trends in the domain of bigdata applications in the insurance sector. The bibliometric analysis shows an increasing trend of publication on this topic since 2008, with a markedly steeper upward trend since 2019. A possible reason is the global pandemic, during which organizations across sectors had to innovate digitally and adopt data-driven, cloud-based business models, giving academicians and practitioners wider scope to pursue research on various related aspects. We found that authors and organizations across the globe have contributed to the literature on bigdata applications in the insurance sector. The leading countries contributing to the literature on the sample topic are the USA, India, and China. The citation network analysis answers RQ2 and shows that the study by Gerson & Star (1986) has the highest global citation count, 181, followed by Wang & Xu (2018) with 120 citations. Riikkinen et al. (2018) received the highest local citation count, followed by Wang & Xu (2018).
The prominent keywords are identified, and RQ3 is answered, through keyword and co-occurrence analysis. The prominent keywords in the sample domain are machine learning, bigdata, insurance, artificial intelligence, deep learning, classification, data mining, fraud detection, health insurance, and insurtech. The studies have focused on insurance claim analysis and prediction, data handling, technology, innovation, etc. RQ4 sought to identify the most influential countries and the current state of collaboration between authors from these countries. Our results show that the most frequent collaborations on the topic under study are among scholars from the US, China, and India, and that the US has the most citations, followed by China and India, respectively. The current intellectual structure of the sample topic (RQ5) is addressed by the bibliographic coupling analysis. From the clusters formed by this analysis, we identified a few research themes: fraud detection and prevention, pricing and ratemaking, technology utilization, personalization, risk assessment, claim processing and prediction, and digitalization.
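The two counts that underlie the keyword co-occurrence and coupling analyses referred to above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the procedure implemented in VOSviewer or R-Biblioshiny; the function names and the toy records are hypothetical and do not come from the 191-article sample.

```python
from collections import Counter
from itertools import combinations


def keyword_cooccurrence(docs):
    """Count how many documents list each unordered keyword pair."""
    pairs = Counter()
    for keywords in docs:
        # sorted(set(...)) removes duplicates and gives each pair a fixed order
        for a, b in combinations(sorted(set(keywords)), 2):
            pairs[(a, b)] += 1
    return pairs


def coupling_strength(refs_a, refs_b):
    """Bibliographic coupling strength = number of cited references two documents share."""
    return len(set(refs_a) & set(refs_b))


# Toy records (hypothetical keyword lists and reference identifiers).
docs = [
    ["machine learning", "fraud detection", "health insurance"],
    ["machine learning", "fraud detection"],
    ["bigdata", "insurtech", "machine learning"],
]
cooc = keyword_cooccurrence(docs)
print(cooc[("fraud detection", "machine learning")])
print(coupling_strength(["R1", "R2", "R3"], ["R2", "R3", "R4"]))
```

Keyword pairs with high counts form the links of a co-occurrence map, while document pairs with high coupling strength tend to fall into the same thematic cluster.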
The insurance sector is moving towards digitalization thanks to big data, machine learning, artificial intelligence (AI), and neural networks. Insurance companies are embracing these technologies to undertake digital transformation, improve sales, eliminate fraud, and settle insurance claims on time. The bibliographic coupling of research publications (RQ5) also shows the applications of machine learning in fraud detection and prevention and in precise risk analysis for premium calculation through actuarial services. These services have an impact by boosting sales and customer satisfaction, accelerating manual tasks, enhancing the path to purchase, streamlining procedures and ensuring system dependability, analyzing financial performance, and managing the growth of insurance companies. Despite these ground-breaking service transformations, the insurance business, as part of the finance industry, still faces a number of serious big data problems. One of the most important and pressing issues with big data services is privacy and data protection, as observed in the co-word analysis. Even though every financial service and product depends entirely on data and generates data every second, big data research in insurance services has not yet reached its pinnacle. In this light, it makes sense to set out the next research directions on the basis of this work.
4.1. Impediments to current research
A systematic review of the literature suggests that the growth of research in this domain is still limited. The factors that impede the research growth are summarized as follows:
(a) Lack of data availability: In most developing countries, it is difficult for independent researchers to access insurance data, which restricts the volume and quality of research in this domain.
(b) Lack of theory development: Most of the studies in this domain are empirical. Therefore, more conceptual research relating to a theoretical framework should be encouraged.
(c) Lack of academic collaboration: More collaborations are needed across the globe to develop better research frameworks.
4.2. Research gaps and future areas of research
The analysis shows that most of the research in the sample domain is from the USA, India, China, etc. (RQ4). However, the authors of each document mostly belong to the same country. Insurance sectors across the globe differ with respect to their technology adoption, digitalization, innovation, and overall operations. Therefore, cross-country studies would provide better insights and would open up vast research scope in this domain.
Most of the research works discussing the application of bigdata are empirical in nature. Conceptual research papers with robust frameworks are needed in this domain so that researchers can build on them. From the thorough literature analysis, we found that most of the studies have proposed solutions to complex insurance business decisions such as claim processing, data warehousing, fraud detection and prediction, risk assessment, and ratemaking (RQ5). However, very few studies address personalization, digitalization, human-centered AI system development, adoption of wearable technology, and telematics data usage for insurance decisions, which the authors of the studied research papers identify as avenues for future research.
The implementation of big data technology that supports the insurance industry’s business in numerous ways was the main theme of every research paper we studied. Nevertheless, none of the research documents focuses on the benefits for customers, such as how personalized insurance services may increase customer satisfaction and wellbeing, or how customers can safeguard themselves from the mis-selling tactics of insurance agents and from being duped by insurance companies. Future research in this field may also include analyzing consumer engagement and experiences with AI-enabled insurance services. Therefore, we recommend that future research be more customer oriented.
The reviewed literature revealed that the sensors and internet of things (IoT) frameworks that are part of wearable technology have not been extensively researched in relation to a number of health management services (HMS), which have a significant impact on ratemaking and pricing for health insurance and life insurance products. In particular, wearable devices used in HMS operations can enable effective and smart diagnosis, supervision, and treatment of diseases and disorders, which may lower the risk borne by insurance companies. Research on data quality, ethics, and privacy concerns connected to big data technologies in insurance businesses is hard to come by, possibly because of implementation hurdles (cost and user adoption) and regulatory rules that prohibit the deployment of blockchain in HMS. Future digital transformation will make it possible for clients and insurers to work together on loss prevention, benefiting both sides. By investing in the avoidance of detrimental events rather than concentrating on managing claims after the damage has occurred, insurers may gain a significant competitive edge in the market and lower their costs. Analytics based on real data and on devices that generate large amounts of data, such as mobile phones, wearables, and telematics devices, offer significant opportunities to prevent unwanted events, for example by warning the insured of danger or by highlighting the advantages of changing their behavior and lifestyle.
The next step in personalization will be to pinpoint policyholders’ future demands in light of their prospective life stages. Insurance firms strive to be their clients’ enduring partners and allies. People are using the Internet and social networks more often and, in doing so, produce an enormous amount of unstructured data. By examining this data, insurance providers may develop marketing efforts aimed specifically at attracting new customers. Surveys are a significantly less reliable source of information about user demands than the online activity of the insured. Therefore, researchers interested in insurance technology can in future concentrate more on wearable technology and customization to maximize the benefits to both insurance companies and policyholders.
4.3. Theoretical contribution
Our bibliometric study on bigdata applications in the insurance sector should help future researchers to identify themes on bigdata applications in the insurance sector and other related domains for research and collaboration. Our study has structured the current research in this domain and discussed the major contributions based on authors, journals, documents, universities, countries, etc.
Earlier studies have discussed the impact of bigdata analytics on enterprises (Khanra et al. 2020), bigdata applications in the banking sector (Nobanee et al. 2021), the use of bigdata in product development (Zhan et al. 2018), the usefulness of bigdata analysis in the healthcare industry for effective diagnosis and treatment (Mahajan & Mehta 1984), and bigdata applications in the stock market (Hasan et al. 2020). Our study extends the existing research to bigdata applications in the insurance sector.
We have proposed a few emerging themes and a conceptual framework that explains the scope and direction for future research and indicates different application areas in the insurance sector.
4.4. Practical implications
Our study provides an overview of various emerging themes studied earlier in insurance sector research using bigdata analytics. The outcomes of this bibliometric study should help researchers to explore more insurance application areas using alternative bigdata tools and technologies. This study should also motivate practitioners and firms to invest more in bigdata tools and technologies to identify and resolve complex business problems with robust solutions, gain competitive advantage, and ensure consistent growth of the firm.