
  Bestsellers

  • Article (Open Access)

    Design of Intelligent Integrated Testing Platform for Old Residential Areas in Nantong City

    Old residential areas commonly develop various types of cracks that may pose safety risks to the buildings, so real-time monitoring is particularly important for giving early warning and taking corresponding safety measures. This paper proposes using steel rulers, feeler gauges and acoustic detectors to measure the length, width and depth of cracks, and establishing an intelligent, integrated big data monitoring platform that monitors cracks in old residential areas in real time and gives early warning to ensure that the houses remain in a safe state. Through the big data system, the platform addresses the lag of traditional manual inspection, inefficient resource allocation and extensive management, and lays the foundation for subsequent housing safety identification and crack treatment.

  • Article (Open Access)

    Research on Real Estate Market Prediction Models Based on Big Data and Artificial Intelligence

    The intersection of big data and artificial intelligence has significantly transformed computational techniques for predicting real estate market trends, aligning closely with the thematic scope of “Frontiers in Computer Science.” Traditional real estate forecasting models often fail to capture the intricate spatial, temporal, and economic dependencies essential for robust prediction. Challenges such as data sparsity, nonlinearity, and market volatility remain inadequately addressed, limiting their scalability and adaptability in dynamic market conditions. To address these gaps, we propose a novel framework comprising the Dynamic Relational Price Network (DRPN) and the Market Adaptive Optimization Strategy (MAOS). DRPN integrates graph-based reasoning for spatial dependencies, temporal forecasting with recurrent networks, and hierarchical feature learning to improve interpretability and predictive accuracy. Meanwhile, MAOS dynamically adjusts model parameters and regularization strategies based on real-time market conditions, ensuring robust generalization across diverse scenarios. Experimental results demonstrate the superior performance of our approach in predictive accuracy, stability, and scalability compared to conventional methods, providing actionable insights into market dynamics. This research offers a scalable and adaptive solution to real estate forecasting, contributing to the broader applications of AI in computational market analysis.

  • Article (Open Access)

    The effect of theme on the number of celebrity active fans under China Weibo data

    Online social media platforms have emerged as integral channels for facilitating social interactions, with celebrities utilizing these platforms to engage with their fan base and cultivate a substantial following. The group of engaged fans, commonly referred to as “active fans”, represents individuals who actively communicate with celebrities and actively participate in discussions pertaining to the celebrities’ endeavors. For celebrities, the task of retaining and augmenting the count of active fans holds immense significance, as it significantly amplifies their social impact and commercial value. Here, we construct dynamic weighted active fan networks by leveraging data from 2021 on Sina Weibo, which stands as China’s largest social media platform. Through a comparative analysis encompassing the network’s structure, the growth rate and the duration of active fans, we delve into the influence wielded by six distinct thematic categories, endorsement, variety, public welfare, sports, music and national affairs. This analysis covers a cohort of nine celebrities spanning five diverse domains, including actors, singers, online influencers, anchors and athletes. The growth trajectory and life cycle exhibited by celebrity fans exhibit notable variations, both within and across the aforementioned themes. These dynamics are further influenced by the inherent structural attributes of the personal fan network belonging to each celebrity. Employing the K-Shape time series clustering algorithm, we have undertaken an in-depth exploration of outburst growth patterns observed in active fans and determined the optimal value of the number of clusters to be k=4 through comparative analysis. Our findings underscore that the themes of endorsement and public welfare exhibit all four growth patterns, namely Double-Peak, Oscillatory, Single-Peak and Continuous Growth Patterns. 
    In contrast, when all themes are considered together, they demonstrate a single-peaked decaying growth pattern. The insights gleaned from this study not only serve as a valuable reference and guide for celebrities across diverse domains who aspire to bolster their social influence but also contribute to the burgeoning fan economy. Moreover, this research introduces novel perspectives for scrutinizing patterns of fan growth and their corresponding dynamics.
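
K-Shape, as named in the abstract above, groups time series by shape rather than by amplitude or timing. As a rough illustration only, the sketch below computes a shape-based distance built on normalized cross-correlation, the dissimilarity measure K-Shape is generally described as using. The toy series and function names are hypothetical, the clustering loop is omitted, and the z-normalization usually applied beforehand is skipped for brevity.

```python
import math


def cross_correlation(x, y):
    """Sliding dot products of x and y for every shift (zero padded, equal lengths)."""
    n = len(x)
    values = []
    for shift in range(-(n - 1), n):
        total = 0.0
        for i in range(n):
            j = i + shift
            if 0 <= j < n:
                total += x[i] * y[j]
        values.append(total)
    return values


def shape_based_distance(x, y):
    """1 minus the largest normalized cross-correlation (0 = same shape up to a shift)."""
    norm = math.sqrt(sum(v * v for v in x)) * math.sqrt(sum(v * v for v in y))
    if norm == 0:
        return 1.0
    return 1.0 - max(cross_correlation(x, y)) / norm


# Hypothetical daily counts of newly active fans (not data from the study).
early_peak = [0, 1, 4, 1, 0, 0, 0, 0]
late_peak = [0, 0, 0, 0, 1, 4, 1, 0]
steady_growth = [1, 2, 3, 4, 5, 6, 7, 8]

same_shape = shape_based_distance(early_peak, late_peak)
different_shape = shape_based_distance(early_peak, steady_growth)
print(same_shape, different_shape)
```

The two single-peak series come out essentially identical under this distance even though their peaks occur on different days, while the steadily growing series is clearly farther away; that shift invariance is what lets burst patterns be compared across events.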

  • Article (Open Access)

    CUSTOMER PREDICTIVE ANALYTICS USING ARTIFICIAL INTELLIGENCE

    This conceptual paper exclusively focused on how artificial intelligence (AI) serves as a means to identify a target audience. Focusing on the marketing context, a structured discussion of how AI can identify the target customers precisely despite their different behaviors was presented in this paper. The applications of AI in customer targeting and the projected effectiveness throughout the different phases of customer lifecycle were also discussed. Through the historical analysis, behavioral insights of individual customers can be retrieved in a more reliable and efficient way. The review of the literature confirmed the use of technology-driven AI in revolutionizing marketing, where data can be processed at scale via supervised or unsupervised (machine) learning.

  • Article (Open Access)

    Ensemble Deep Learning on Large, Mixed-Site fMRI Datasets in Autism and Other Tasks

    Deep learning models for MRI classification face two recurring problems: they are typically limited by low sample size, and are abstracted by their own complexity (the “black box problem”). In this paper, we train a convolutional neural network (CNN) with the largest multi-source, functional MRI (fMRI) connectomic dataset ever compiled, consisting of 43,858 datapoints. We apply this model to a cross-sectional comparison of autism spectrum disorder (ASD) versus typically developing (TD) controls that has proved difficult to characterize with inferential statistics. To contextualize these findings, we additionally perform classifications of gender and task versus rest. Employing class-balancing to build a training set, we trained 3×300 modified CNNs in an ensemble model to classify fMRI connectivity matrices with overall AUROCs of 0.6774, 0.7680, and 0.9222 for ASD versus TD, gender, and task versus rest, respectively. Additionally, we aim to address the black box problem in this context using two visualization methods. First, class activation maps show which functional connections of the brain our models focus on when performing classification. Second, by analyzing maximal activations of the hidden layers, we were also able to explore how the model organizes a large and mixed-center dataset, finding that it dedicates specific areas of its hidden layers to processing different covariates of data (depending on the independent variable analyzed), and other areas to mix data from different sources. Our study finds that deep learning models that distinguish ASD from TD controls focus broadly on temporal and cerebellar connections, with a particularly high focus on the right caudate nucleus and paracentral sulcus.

  • Article (Free Access)

    A big data analytics framework for determining the travel destination preferences of Indian tourists

    The growth of technology and social media websites has increased the potential to explore different products and places around the globe online. While websites are primarily responsible for generating large amounts of data, this big data may benefit other users provided that the proper decision patterns can be analyzed. This work focuses on big data from social media to determine the travel destination preferences of Indian tourists. The analysis of online tourism reviews is beneficial to both tourists and businesses in tourist countries. Tourists can analyze all the required aspects prior to traveling, and businesses in the destination country can enhance their products. The study aims to analyze online tourist reviews using supervised machine learning methods (decision tree, k-nearest neighbor, back propagation neural networks and support vector machine) and ensemble learning in order to ascertain the travel preferences of Indian tourists visiting other countries. For the research experiments, significant travel data histories of tourists for five destinations (Dubai, Indonesia, Malaysia, Thailand and Singapore) are extracted from TripAdvisor, a popular worldwide tourism website that lets consumers share their travel experiences. For the five selected destinations, the preferences of Indian tourists are analyzed in terms of travel and destination comfort, hotel facilities, food quality and attractions of the place. The analysis results indicate that the proposed recommendation system produces precise suggestions for Indian tourists traveling to other countries.

  • Article (Free Access)

    Hybrid COOT-Reverse Cognitive Fruit Fly Optimization-Based Big Data Services and Virtual Machine Allocation for Cloud Storage System

    In recent years, cloud computing technologies have developed rapidly to provide convenient on-demand network access all over the world. A cloud service provider offers numerous types of cloud services to the user. The most significant issue, however, is how to attain optimal virtual machine (VM) allocation for the user and design an efficient big data storage platform, thereby satisfying the requirements of both the cloud service provider and the user. Therefore, this paper presents two novel strategies for optimizing VM resource allocation and cloud storage. An optimized cloud cluster storage service is introduced using a binarization based on modified fuzzy c-means clustering (BMFCM) algorithm to overcome the negative issues caused by the repetitive nature of big data traffic. The BMFCM algorithm can be implemented transparently and can also address problems associated with massive data storage. VM selection is optimized using a hybrid COOT-reverse cognitive fruit fly (RCFF) optimization algorithm. The main aim of this algorithm is to improve the massive big data traffic and storage locality. CPU utilization, VM power, memory dimension and network bandwidth are taken as the fitness function of the hybrid COOT-RCFF algorithm. When implemented in CloudSim and Hadoop, the proposed methodology offers improvements in terms of completion time, overall energy consumption, makespan, user provider satisfaction and load ratio. The results show that the proposed methodology improves execution time and data retrieval efficiency by up to 32% and 6.3%, respectively, compared with existing techniques.

  • Article (Free Access)

    An Internet of Medical Things-Based Mental Disorder Prediction System Using EEG Sensor and Big Data Mining

    In the colloquy concerning human rights, equality, and human health, mental illness and therapy regarding mental health have been condoned. Mental disorder is a behavioral motif that catalyzes the significant anguish or affliction of personal functioning. The symptoms of a mental disorder may be tenacious, degenerative, or transpire as a single episode. Brain sickness is often interpreted as a combination of how a person thinks, perceives, contemplates and reacts. This may be analogous to a specific region or workings of the brain frequently in a social context. Anxiety disorders, psychotic disorders, personality disorders, mood disorders, eating disorders, and many more are examples of mental disorders, while complications include social problems, suicides, and cognitive impairment. These days, mental disorders are quotidian worldwide, and clinically consequential levels of derangement rise adversely. The purpose of this paper is to aid in prognosis of the type of mental disorder by analyzing the brainwaves such as Alpha (α), Beta (β), Gamma (γ), Theta (𝜃), Delta (δ) with the help of big data analysis and the Internet of Medical Things (IoMT). IoMT helps in gathering the required data and data transmission, while big data analysis helps in predicting the type of disorder.

  • Article (Open Access)

    THE USE OF NEURAL NETWORK IN DEFENSE AUDIT NONLINEAR DYNAMIC PROCESSING UNDER THE BACKGROUND OF BIG DATA

    Fractals, 03 Feb 2022

    The purpose is to further explore the application effect of the neural network algorithm in defense audit and improve the user information security performance. Based on the relevant theoretical basis of neural network in machine learning, the back propagation neural network (BPNN) algorithm model is constructed and optimized. Moreover, by comparing with the classification and prediction effect of the decision tree method, the application effect of BPNN is further clarified. Through statistical analysis, a total of six risk users are screened out. The test data are classified into non-risk user group and risk user group to study the prediction of classification. The specific results are as follows. The prediction accuracy of non-risk group is 99% by using the BPNN algorithm and that is improved to 99.5% by using the optimized BPNN; for risk group, the prediction accuracy of BPNN is only 50% and that of optimized BPNN is 83.3%. Meanwhile, the prediction error rate of the BPNN algorithm is significantly lower than that of the decision tree algorithm, which further verifies the good application effect of the BPNN algorithm. This study can provide scientific and effective reference for the follow-up research of defense audit.

  • Article (Free Access)

    Using Random Undersampling and Ensemble Feature Selection for IoT Attack Prediction

    One consequence of the widespread use of Internet of Things (IoT) devices is an increase in the volume of attacks on IoT networks. In this study, we focus on the Bot-IoT dataset, with the aim of classifying its four types of attacks: Denial-of-Service (DoS), Distributed Denial-of-Service (DDoS), Reconnaissance, and Information Theft. Our contribution is based on the evaluation of the Random Undersampling (RUS) technique and ensemble Feature Selection Techniques (FSTs). Our results indicate that RUS has a positive impact on overall classification performance. Furthermore, our results show that the FSTs are beneficial for DoS, Reconnaissance, and Information Theft classification but not for DDoS classification. Finally, we note that the ensemble classifiers generally outperformed the non-ensemble classifiers in our study.
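
For readers unfamiliar with Random Undersampling, the following is a minimal sketch of the general idea: instances are dropped at random from the larger classes until the classes are balanced. The 1:1 target ratio, the function name, and the toy labels are assumptions for illustration; the abstract does not state which sampling ratios or tooling the authors used.

```python
import random
from collections import Counter


def random_undersample(rows, labels, seed=0):
    """Randomly keep only as many rows per class as the smallest class has."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    # Assumed target: every class is reduced to the size of the smallest class.
    target = min(len(members) for members in by_class.values())
    sampled_rows, sampled_labels = [], []
    for label, members in by_class.items():
        for row in rng.sample(members, target):
            sampled_rows.append(row)
            sampled_labels.append(label)
    return sampled_rows, sampled_labels


# Hypothetical toy data: 95 majority-class records and 5 minority-class records.
rows = list(range(100))
labels = ["majority"] * 95 + ["minority"] * 5
balanced_rows, balanced_labels = random_undersample(rows, labels)
print(Counter(balanced_labels))
```

Only the training split would normally be resampled this way; evaluating on undersampled test data would misrepresent the class distribution the classifier faces in practice.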

  • Article (Free Access)

    A Study on the Prediction of COVID-19 Confirmed Cases Using Deep Learning and AdaBoost-Bi-LSTM Model

    In this study, AdaBoost-Bi-LSTM ensemble models are developed to predict the number of COVID-19 confirmed cases by effectively learning volatile and unstable data using a nonparametric method. The performance of the developed models in terms of prediction accuracy is compared with those of existing deep learning models such as GRU, LSTM, and Bi-LSTM. The COVID-19 outbreak in 2019 has resulted in a global pandemic with a significant number of deaths worldwide. There have long been ongoing efforts to prevent the spread of infectious diseases, and a number of prediction models have been developed for the number of confirmed cases. However, there are many variables that continuously mutate the virus and therefore affect the number of confirmed cases, which makes it difficult to accurately predict the number of COVID-19 confirmed cases. The goal of this study is to develop a model with a lower error rate and higher predictive accuracy than existing models to more effectively monitor and handle endemic diseases. To this end, this study predicts COVID-19 confirmed cases from April to October 2022 based on the analysis of COVID-19 confirmed cases data from 16 December 2020 to 27 September 2022 using the developed models. As a result, the AdaBoost-Bi-LSTM model shows the best performance, even though the data from the period of high variability in the number of confirmed cases was used for model training. The AdaBoost-Bi-LSTM model achieved improved predictive power and shows an increased performance of 17.41% over the simple GRU/LSTM model and of 15.62% over the Bi-LSTM model.

  • Article (Free Access)

    Using Area Under the Precision Recall Curve to Assess the Effect of Random Undersampling in the Classification of Imbalanced Medicare Big Data

    In this paper, we investigate the impact of Random Undersampling (RUS) on a supervised Machine Learning task involving highly imbalanced Big Data. We present the results of experiments in Medicare Fraud detection. To the best of our knowledge, these experiments are conducted with the largest insurance claims datasets ever used for Medicare Fraud detection. We obtain two datasets from two Big Data repositories provided by the United States government’s Centers for Medicare and Medicaid Services. The larger of the two datasets contains nearly 174 million instances, with a minority to majority class ratio of approximately 0.0039. Our contribution is to show that RUS has a detrimental effect on a Medicare Fraud detection task when performed on large scale, imbalanced data. The effect of RUS is apparent in the Area Under the Precision Recall Curve (AUPRC) scores recorded from experimental outcomes. We use four popular, open-source classifiers in our experiments to confirm the negative impact of RUS on their AUPRC scores.
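
To make the metric concrete, here is a small sketch of one common way to estimate the area under the precision-recall curve, the step-wise "average precision" form, together with the positive-class prevalence implied by a minority-to-majority ratio of about 0.0039. The function names and toy scores are hypothetical, ties between scores are not handled specially, and the abstract does not say which AUPRC estimator the authors used.

```python
def average_precision(y_true, scores):
    """Step-wise area under the precision-recall curve for binary labels (1 = positive)."""
    positives = sum(y_true)
    if positives == 0:
        raise ValueError("need at least one positive label")
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision at the point where recall increases
    return total / positives


def prevalence_from_ratio(minority_to_majority):
    """Share of all instances that belong to the minority class."""
    return minority_to_majority / (1.0 + minority_to_majority)


# Hypothetical toy ranking: two positives among ten instances.
y_true = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]

print(average_precision(y_true, scores))
print(prevalence_from_ratio(0.0039))
```

A ranking no better than chance has a precision close to the positive-class prevalence, which is under 0.4% for the ratio quoted above, so AUPRC values on data this imbalanced are best read against that baseline rather than against 0.5.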

  • Article (Open Access)

    A Cross-Domain Comparative Study of Big Data Architectures

    Nowadays, a variety of Big Data architectures are emerging to organize the Big Data life cycle. While some of these architectures are proposed for general usage, many of them are proposed in a specific application domain such as smart cities, transportation, healthcare, and agriculture. There is, however, a lack of understanding of how and why Big Data architectures vary in different domains and how the Big Data architecture strategy in one domain may possibly advance other domains. Therefore, this paper surveys and compares the Big Data architectures in different application domains. It also chooses a representative architecture of each researched application domain to indicate which Big Data architecture from a given domain the researchers and practitioners may possibly start from. Next, a pairwise cross-domain comparison among the Big Data architectures is presented to outline the similarities and differences between the domain-specific architectures. Finally, the paper provides a set of practical guidelines for Big Data researchers and practitioners to build and improve Big Data architectures based on the knowledge gathered in this study.

  • Article (Free Access)

    Anti-Fraud Analysis during the COVID-19 Pandemic: A Global Perspective

    The ongoing coronavirus disease 2019 (COVID-19) pandemic has brought unexpected economic downturns and accelerated digital transformation, leading to stronger financial fraud motives and more complicated fraud schemes. Although scholars, practitioners, and regulators have begun to focus on the new characteristics of financial fraud, a systematic and effective anti-fraud strategy during the pandemic still needs to be explored. This paper comprehensively analyzes the lessons of anti-fraud that we should learn from the COVID-19 pandemic. By exploring the complex motives and schemes of fraud, we summarize the characteristics of financial fraud activities and further analyze the regulatory challenges posed by financial fraud during the outbreak. To better cope with the fraudulent activities during the pandemic, policy proposals on how to improve the supervision of financial fraud activities are put forward. In particular, the panoramic data and graph-based techniques are powerful tools for future fraud detection.

  • Article (Open Access)

    Compression for population genetic data through finite-state entropy

    We improve the efficiency of population genetic file formats and GWAS computation by leveraging the distribution of samples in population-level genetic data. We identify conditional exchangeability of these data, recommending finite state entropy algorithms as an arithmetic code naturally suited for compression of population genetic data. We show between 10% and 40% speed and size improvements over modern dictionary compression methods that are often used for population genetic data such as Zstd and Zlib in computation and decompression tasks. We provide open source prototype software for multi-phenotype GWAS with finite state entropy compression demonstrating significant space saving and speed comparable to the state-of-the-art.
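
The sketch below does not implement finite state entropy coding. It only illustrates, under assumed data, why a skewed distribution of genotype values leaves room for an entropy coder: it computes the Shannon entropy of a column of genotype calls, which is the average bits-per-symbol bound that entropy coders aim to approach. The genotype counts and names are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def shannon_entropy_bits(symbols):
    """Average information content per symbol, in bits."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# Hypothetical genotype calls for one variant across 100 samples
# (0, 1 or 2 copies of the alternate allele).
genotypes = [0] * 80 + [1] * 18 + [2] * 2

bits_per_call = shannon_entropy_bits(genotypes)
fixed_width_bits = 2  # a naive fixed-width encoding of three possible values
print(bits_per_call, fixed_width_bits)
```

For this toy column the entropy is well under the two bits per call that a fixed-width encoding would spend, which is the kind of gap an entropy coder can exploit when most samples share the common genotype.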

  • Article (Free Access)

    A Study of Digital Museum Collection Recommendation Algorithm Based on Improved Fuzzy Clustering Algorithm

    With the rapid advancement of internet technology, various industries have accumulated vast amounts of data, including on user behavior and personal preferences. Traditional museums can leverage this user data to uncover individual preferences and offer personalized services to their visitors. However, the exponential growth of information has also led to the problem of information overload, making it challenging for users to find relevant information within the vast data landscape. Consequently, the utilization rate of available information decreases. By harnessing the power of cloud computing, big data analytics, and recommendation systems, museums can enhance visitors’ touring experiences by helping them discover collections aligned with their interests and connecting with like-minded individuals. To address this objective, the research focuses on optimizing the initial clustering centers of the fuzzy clustering algorithm and parallelizing the optimized algorithm using MapReduce, resulting in the development of a novel MapReduce-based k-prototype fuzzy c-means (MRKPFCM) algorithm. Subsequently, the MRKPFCM algorithm is combined with the classical collaborative filtering algorithm to create a hybrid and parallelized collaborative filtering recommendation algorithm, incorporating elements such as MRKPFCM, audience, and collection. This hybrid algorithm is further supplemented by a content-based recommendation approach to generate comprehensive and refined recommendation results. Experimental findings demonstrate that the predictive scoring errors, as measured by RMSE and MAE, exhibited a downward trend when the number of nearest neighbors for target users fell within the range of 10–20. For instance, the studied algorithm’s MAE value decreased from 0.7512 to 0.7179, surpassing the corresponding figures for the two comparison algorithms. 
    Moreover, with an increase in the number of nearest neighbors within the same range, all three algorithms experienced improved accuracy in prediction results. In particular, the accuracy rate rose from 17.84% to 18.82%, outperforming the two comparison algorithms. In summary, the enhanced hybrid recommendation algorithm achieved through this study displays superior recommendation accuracy and holds significant practical value.
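
The two error measures named in the abstract above, MAE and RMSE, can be sketched as follows using their standard definitions. The ratings are made-up values for illustration, not results from the study.

```python
import math


def mae(actual, predicted):
    """Mean absolute error between actual and predicted ratings."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error; penalizes large misses more heavily than MAE."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Hypothetical ratings a visitor gave to four collection items, and a model's predictions.
actual = [4, 3, 5, 2]
predicted = [3.5, 3, 4, 3]

print(mae(actual, predicted))   # 0.625
print(rmse(actual, predicted))  # 0.75
```

Lower values of both measures indicate predictions closer to the observed ratings, so a drop in MAE such as the one reported above corresponds to a smaller average rating error.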

  • Article (Open Access)

    Implementing Gene Expression Programming in the Parallel Environment for Big Datasets’ Classification

    The paper investigates a Gene Expression Programming (GEP)-based ensemble classifier constructed using the stacked generalization concept. The classifier has been implemented with a view to enabling parallel processing with the use of Spark and SWIM, an open source genetic programming library. The classifier has been validated in computational experiments carried out on benchmark datasets. It has also been investigated how the results are influenced by some settings. The paper is an extension of a previous paper by the authors.

  • Article (Open Access)

    Integration of Relational and NoSQL Databases

    The analysis of relational and NoSQL databases leads to the conclusion that these data processing systems are to some extent complementary. In the current Big Data applications, especially where extensive analyses (so-called Big Analytics) are needed, it turns out that it is nontrivial to design an infrastructure involving data and software of both types. Unfortunately, the complementarity negatively influences integration possibilities of these data stores both at the data model and data processing levels. In terms of performance, it may be beneficial to use a polyglot persistence, a multimodel approach or multilevel modeling, or even to transform the SQL database schema into NoSQL and to perform data migration between the relational and NoSQL databases. Another possibility is to integrate a NoSQL database and relational database with the help of a third data model. The aim of the paper is to show these possibilities and present some new methods of designing such integrated database architectures.

  • Article (Open Access)

    Process, Analyze and Visualize Telecommunication Network Configuration Data in Graph Database

    In network telemetry systems, nodes produce a vast number of configuration files based on how they are configured. Steps were taken to process these files into databases so that developers, testers and customer support can focus on development and testing and can advise customers on how to configure the nodes. However, processing these data in a relational database management system is slow, the data are hard to query, and storage takes up a large amount of disk space.

    In this paper, we present a way to store the data produced by these nodes in a graph database, moving from a relational database to a NoSQL environment. With our approach, one can easily represent and visualize the network of machines.

    Finally, we compare insertion time, query time and storage size across different database management systems. The results could also be applied to other types of configuration data from other kinds of machines, to show the connections between them and query them easily.

  • Article (Free Access)

    Building Smart Cities in China: Problems and Countermeasures

    This paper argues that although the construction of smart cities in China has followed other countries closely, the popularization of concepts, the construction mechanisms and the achievements still lag behind those of the advanced countries. The pilot and demonstration pattern in the Chinese construction of smart cities has several advantages, such as wide coverage of cities, large investment funds, diversity of applications and good coordination among different agents. Utilizing this pattern is conducive to exploring a suitable way to construct China's smart cities and to narrowing the gap between China and the advanced countries. The paper summarizes eight barriers in the construction of smart cities in China and offers proposals for raising the level of construction in several respects, such as forming the construction pattern, expanding the application system, strengthening investment in R&D, providing industry support, and integrating pilot projects.