Port state control (PSC) inspection contributes substantially to improving maritime safety and protecting the marine environment. After selecting the ships coming to a port for inspection, one critical challenge faced by the PSC authorities is deciding which deficiency items should be inspected and in what sequence. To address this problem, two innovative and highly efficient PSC inspection schemes describing specific PSC inspection items and their sequence are proposed for the inspectors’ reference when time and resources are limited, especially when it is difficult to estimate the possible deficiencies in advance. Both schemes take the occurrence probability, inspection cost, and ignoring loss of each deficiency item into account. More specifically, the first inspection scheme is based on the occurrence probabilities of the deficiency items in the whole data set, while the second scheme further considers the correlations among the deficiency items extracted by association rules. The results of numerical experiments show that the efficiency of the two proposed inspection schemes is 1.5 times higher than that of the currently used inspection scheme. In addition, the second inspection scheme performs better than the first, especially when inspecting ships with no fewer than five deficiency items under limited inspection resources.
In order to mine association rules accurately and efficiently while preserving the privacy thereof, a novel privacy-preserving association rules mining method is proposed in this paper. Known as the partial randomized response based on probability matrix, or PRRPM, this method chooses different data transition strategies to find frequent 1-itemsets and k-itemsets (k > 1). The PRRPM algorithm is explored and its validity examined through theoretical analysis and experiments.
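The abstract does not spell out PRRPM itself, so the following is only a minimal sketch of the classic randomized-response idea that such methods build on, not the authors' algorithm. It assumes each 0/1 entry (item present or absent) is kept with a hypothetical probability `keep_prob` and flipped otherwise; the function names are illustrative. The support of a single item can then be estimated from the distorted data by inverting the flipping probability.

```python
import random

def randomize_bits(bits, keep_prob, rng):
    """Keep each 0/1 value with probability keep_prob, otherwise flip it."""
    return [b if rng.random() < keep_prob else 1 - b for b in bits]

def estimate_support(observed_fraction, keep_prob):
    """Invert the distortion: observed = true * (2p - 1) + (1 - p)."""
    if keep_prob == 0.5:
        raise ValueError("keep_prob = 0.5 destroys all information")
    return (observed_fraction - (1 - keep_prob)) / (2 * keep_prob - 1)

# Hypothetical data: one item that truly occurs in 30% of 20000 transactions.
rng = random.Random(0)
true_bits = [1] * 6000 + [0] * 14000
noisy = randomize_bits(true_bits, keep_prob=0.8, rng=rng)

# The miner only sees `noisy`, yet can recover an estimate of the true support.
estimate = estimate_support(sum(noisy) / len(noisy), keep_prob=0.8)
```

The estimate is unbiased but noisy; its variance grows as `keep_prob` approaches 0.5, which is the usual privacy versus accuracy trade-off in this family of methods.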
The FP-Growth algorithm is an association rule mining algorithm that does not generate candidate itemsets, so it has high practical value in the face of the rapid growth of data volume in intelligent medical treatment. Because FP-Growth is a memory-resident algorithm, it struggles when applied to massive data sets. This paper combines Hadoop with the FP-Growth algorithm and, through analysis of real traditional Chinese medicine (TCM) data, compares performance in stand-alone and distributed environments. The experimental results show that the FP-Growth algorithm has a great advantage in processing massive data once it is parallelized with the MapReduce model, which gives it better development prospects for intelligent medical treatment.
Associative classification (AC) is an important data mining approach which effectively integrates association rule mining and classification. Prediction of test data is a fundamental step in classification that impacts the outputted system accuracy. In this paper, we present three new prediction methods (Dominant Class Label, Highest Average Confidence per Class, Full Match Rule) and one rule pruning procedure (Partial Matching) in AC. Furthermore, we review current prediction methods in AC.
Experiments on large English and Arabic text categorisation data collections (Reuters, SPA) have been conducted using the proposed prediction methods and other popular classification algorithms (SVM, KNN, NB, BCAR, MCAR, C4.5, etc.). The bases of comparison in the experiments are the classification accuracy and Break-Even-Point (BEP) evaluation measures. The results reveal that our prediction methods are very competitive with respect to BEP when compared with known AC prediction approaches such as those of 2-PS, ARC-BC and BCAR. Moreover, the proposed prediction methods outperform existing methods from traditional classification approaches, such as decision trees and probabilistic classifiers, with regard to accuracy. Finally, the results indicate that using the proposed pruning procedure in AC improves the accuracy of the resulting classifier.
Because of the inherent computational complexity, mining the complete set of frequent itemsets in dense datasets remains a challenging task. Mining Maximal Frequent Itemsets (MFIs) is an alternative that addresses the problem. The Set-Enumeration Tree (SET) is a common data structure used in several MFI mining algorithms. For this kind of algorithm, the process of mining MFIs can also be viewed as a search in the set-enumeration tree. To reduce the search space, this paper proposes a new algorithm for mining MFIs, Index-MaxMiner, which employs a hybrid search strategy blending breadth-first and depth-first search. First, an index array is proposed, and a bitmap-based algorithm for computing it is presented. By adding a subsume index to frequent items, Index-MaxMiner discovers the candidate MFIs in a single breadth-first pass, which avoids first-level nodes that would not participate in the answer set and drastically reduces the number of candidate itemsets. Then, for the candidate MFIs, a depth-first search strategy is used to generate all MFIs. Thus, a jumping search in the SET is implemented, and the search space is reduced greatly. The experimental results show that the proposed algorithm is efficient, especially for dense datasets.
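To make the notion of a maximal frequent itemset concrete, here is a brute-force sketch on a hypothetical five-transaction dataset. It is not Index-MaxMiner: it enumerates every itemset and then keeps the frequent ones that have no frequent proper superset, which is only feasible for tiny inputs. All names and the data are illustrative assumptions.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_count):
    """Enumerate all itemsets and keep those in at least min_count transactions."""
    items = sorted({item for t in transactions for item in t})
    frequent = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            count = sum(1 for t in transactions if candidate <= t)
            if count >= min_count:
                frequent[candidate] = count
    return frequent

def maximal_itemsets(frequent):
    """A frequent itemset is maximal if no frequent itemset strictly contains it."""
    return [s for s in frequent if not any(s < other for other in frequent)]

# Hypothetical toy data.
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
frequent = frequent_itemsets(transactions, min_count=3)
maximal = maximal_itemsets(frequent)
```

Here six itemsets are frequent, but only the three pairs are maximal, since {a, b, c} occurs in just two transactions. The maximal sets implicitly describe all frequent itemsets, which is why MFI mining is attractive for dense data.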
Most pattern mining techniques focus almost exclusively on identifying frequent patterns, and very little attention has been paid to the generation of rare patterns. However, in several domains, recognizing less frequent but strongly related patterns is more advantageous than recognizing frequent ones. Identifying compelling and meaningful rare associations among such patterns may prove significant for air quality management, which has become an indispensable task in today’s world. The rare correlations between air pollutants and other parameters may aid in restricting air pollution to a manageable level. To this end, efficient and competent rare pattern mining techniques are needed that can generate the complete set of rare patterns and further identify significant rare association rules among them. Moreover, a notable issue with databases is their continuous update over time due to the addition of new records. The users’ requirements or behavior may change with the incremental update of databases, which makes it difficult to determine a suitable support threshold for the extraction of interesting rare association rules. This paper presents an efficient rare pattern mining technique to capture the complete set of rare patterns from a real environmental dataset. The proposed approach does not restart the entire mining process upon threshold update and generates the complete set of rare association rules in a single database scan. It can effectively perform incremental mining and also gives the user the flexibility to regulate the value of the support threshold for generating the rare patterns. Significant rare association rules representing correlations between air pollutants and other environmental parameters are further extracted from the generated rare patterns to identify the substantial causes of air pollution.
Performance analysis shows that the proposed method is more efficient than existing rare pattern mining approaches in providing significant directions to the domain experts for air pollution monitoring.
Data mining is the process of extracting desirable knowledge or interesting patterns from existing databases for specific purposes. Most conventional data-mining algorithms identify the relationships among transactions using binary values. Transactions with quantitative values are, however, commonly seen in real-world applications. We previously proposed a fuzzy mining algorithm in which each attribute used only the linguistic term with the maximum cardinality in the mining process. The number of items was thus the same as the number of original attributes, which reduced the processing time. The fuzzy association rules derived in this way are, however, not complete. This paper thus modifies that algorithm and proposes a new fuzzy data-mining algorithm for extracting interesting knowledge from transactions stored as quantitative values. The proposed algorithm can derive a more complete set of rules but requires more computation time than the earlier method. A trade-off thus exists between the computation time and the completeness of rules. Choosing an appropriate learning method thus depends on the requirements of the application domain.
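The "linguistic term with the maximum cardinality" step of the earlier method can be sketched as follows. This is only an illustration under stated assumptions: the three membership functions (`low`, `middle`, `high`) and their breakpoints are hypothetical, as is the sample attribute, and the abstract does not say how the new algorithm relaxes this step.

```python
# Hypothetical membership functions over purchased quantities.
def low(x):
    return min(1.0, max(0.0, (6 - x) / 5))

def middle(x):
    return max(0.0, 1 - abs(x - 6) / 5)

def high(x):
    return min(1.0, max(0.0, (x - 6) / 5))

TERMS = {"low": low, "middle": middle, "high": high}

def term_cardinalities(quantities):
    """Scalar cardinality of each term: the sum of membership values over all transactions."""
    return {name: sum(mu(q) for q in quantities) for name, mu in TERMS.items()}

def max_cardinality_term(quantities):
    """Keep only the term with the largest cardinality, so each attribute yields one item."""
    card = term_cardinalities(quantities)
    return max(card, key=card.get)

# Hypothetical quantities of one attribute across four transactions.
milk_quantities = [2, 5, 9, 10]
best = max_cardinality_term(milk_quantities)
```

Keeping a single term per attribute is what holds the number of items equal to the number of attributes. Rules involving the discarded terms (here `low` and `high`) can never be found, which is the incompleteness the abstract refers to.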
Medical diagnosis is mostly performed by experienced doctors. However, cases of wrong diagnosis and treatment are still reported. Patients are required to take a number of clinical tests for disease diagnosis, and in most cases not all of the tests contribute to an efficient diagnosis. Medical data are multidimensional and composed of thousands of independent features, so the multidimensional database needs to be analyzed and preprocessed to support decision making in medical diagnosis. The aim of this work is to accurately predict disease with a condensed number of attributes. In this approach, the raw input dataset is preprocessed using a common normalization approach. Association rules are used to find frequently used patterns and prune the dataset. Further, a base rule is applied to the pruned dataset. The Payoff and Heuristic rate are evaluated for risk analysis. Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA) are used for better feature selection. The classification result is obtained from the minimum and maximum residual support values. The experimental results show that the proposed scheme can perform better than existing algorithms in diagnosing disease.
Data mining techniques have recently been used to obtain diagnostic results, especially in medical fields such as kidney dialysis, skin cancer and breast cancer detection, and biological sequence classification. Because data mining can discover relationships and patterns in medical databases, early detection or prediction of pathological conditions has become feasible. This paper discusses a data mining approach for parametric evaluation to improve the treatment of kidney dialysis patients. The experimental results show that association mining achieves a classification accuracy in the range 50–97.7%, depending on the dialysis parameter combination. Such a decision-based approach helps the clinician decide the level of dialysis required for an individual patient.
Association rule mining is an important research topic in data mining. Association rule mining consists of two steps: finding frequent itemsets and then extracting interesting rules from the frequent itemsets. In the first step, efficiency is important since discovering frequent itemsets is computationally time consuming. In the second step, unbiased assessment is important for good decision making.
In this paper, we deal with both the efficiency of the mining algorithm and the measure of interest of the resulting rules. First, we present an algorithm for finding frequent itemsets that uses a vertical database. We also introduce a modified vertical data format to reduce the size of the database and an itemset reordering strategy to reduce the size of the intermediate tidsets. Second, we present a new measure to evaluate the interest of the resulting association rules. Our performance analysis shows that our proposed algorithm reduces the size of the intermediate tidsets that are generated during the mining process. The smaller tidsets make intersection operations faster. Using our interest-measuring test helps to avoid the discovery of misleading rules.
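The vertical format and tidset intersection mentioned above can be illustrated with a generic Eclat-style sketch. This is not the authors' algorithm: their modified vertical format and interest measure are not described in the abstract, and ordering items by ascending support is a common heuristic assumed here for illustration only. Function names and data are hypothetical.

```python
from collections import defaultdict

def to_vertical(transactions):
    """Map each item to its tidset: the ids of the transactions containing it."""
    vertical = defaultdict(set)
    for tid, items in enumerate(transactions):
        for item in items:
            vertical[item].add(tid)
    return dict(vertical)

def extend(prefix, candidates, min_count, out):
    """Depth-first growth of itemsets; support is the size of the intersected tidset."""
    for i, (item, tids) in enumerate(candidates):
        itemset = prefix + (item,)
        out[frozenset(itemset)] = len(tids)
        extensions = []
        for other, other_tids in candidates[i + 1:]:
            shared = tids & other_tids  # one intersection replaces a database scan
            if len(shared) >= min_count:
                extensions.append((other, shared))
        if extensions:
            extend(itemset, extensions, min_count, out)

def mine(transactions, min_count):
    vertical = to_vertical(transactions)
    frequent = [(i, t) for i, t in vertical.items() if len(t) >= min_count]
    # Assumed heuristic: process rarer items first to keep intermediate tidsets small.
    frequent.sort(key=lambda pair: (len(pair[1]), pair[0]))
    out = {}
    extend((), frequent, min_count, out)
    return out

# Hypothetical toy data.
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
supports = mine(transactions, min_count=3)
```

Because each recursive step intersects tidsets that are already filtered, the intermediate sets shrink as itemsets grow. This is why reducing tidset size translates directly into faster intersections.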
Associative classification is a branch of data mining that employs association rule discovery methods in classification problems. In this paper, we introduce a novel data mining method called Looking at the Class (LC), which can be utilised in the associative classification approach. Unlike known algorithms in associative classification such as Classification based on Association rule (CBA), which combine disjoint itemsets regardless of their class labels in the training phase, our method joins only itemsets with similar class labels. This avoids many unnecessary itemset combinations during the learning step, and consequently results in substantial savings in computational time and memory. Moreover, a new prediction method that utilises multiple rules to make the prediction decision is also developed in this paper. The experimental results on different UCI datasets reveal that the LC algorithm outperformed CBA with respect to classification accuracy, memory usage, and execution time on most of the datasets we consider.
As new generations of technology appear, legacy knowledge management solutions and applications become increasingly out of date, necessitating a paradigm shift. Machine learning presents an opportunity by forgoing the rule-based, knowledge-intensive systems inundating the marketplace. An extensive review of the machine learning literature was carried out, through which common machine learning algorithms were identified. This study analysed more than 200 papers extracted from the Scopus and IEEE databases. The bulk of the articles date from 2018 to 2021, while some date from 1959 to 2017. The research gap concerns the application of machine learning algorithms to knowledge management systems, specifically knowledge management attributes. By reviewing each algorithm extensively, its usability is identified, along with its advantages and disadvantages. These algorithms were then mapped to the areas of knowledge management in which they may be beneficial. The findings show how these algorithms are applicable in knowledge management and how they can further enhance knowledge management systems. Based on the findings, the paper aims to bridge the gap between the literature on knowledge management and that on machine learning. A knowledge management–machine learning framework is conceived based on the review of each algorithm. The framework highlights how machine learning algorithms can play a part in different areas of knowledge management, and shows practitioners how and where to implement machine learning in knowledge management.
The existing default discrimination models based on evaluation indicators struggle to achieve high credit risk identification performance for farmers’ default status when credit information is insufficient and the correlation between indicators and default risk is low. Those models also have difficulty identifying the fundamental causes of farmers’ default risk. A credit risk discrimination model based on credit features strongly associated with default status is established to evaluate farmers’ credit risk. Term frequency-inverse document frequency and a sentiment dictionary analysis method are used to quantify long text indicators, then the K-means method is used to convert the numerical data to Boolean form. The APRIORI algorithm is used to mine the credit features strongly associated with the default status. Finally, the default status of farmers is judged based on those credit features. The model is demonstrated using actual bank data from 2044 farmers in China. According to five evaluation criteria (AUC, F1-score, Type II error, balanced error rate and G-mean), the empirical results show that the discrimination ability of the credit risk model based on credit features is higher than that of the model based on evaluation indicators. This finding provides a new idea for commercial banks to measure the default risk of farmers, and provides a reference for the formulation of strategies to enhance farmers’ credit.
This chapter introduces the concept of association rules, a form of local-pattern discovery in an unsupervised learning system. Association rules are used to uncover relationships between inherently unrelated data items. The terminology, notation, and the processes used with association rules are discussed, and a brief overview of three basic rule inference algorithms is given.
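For reference, the two standard quantities behind most association rule notation can be written as follows. The symbols are the conventional ones and are an assumption here, not necessarily those used in the chapter: $D$ is the transaction database and $X$, $Y$ are disjoint itemsets.

```latex
\[
\mathrm{supp}(X) = \frac{\lvert \{\, t \in D : X \subseteq t \,\} \rvert}{\lvert D \rvert},
\qquad
\mathrm{conf}(X \Rightarrow Y) = \frac{\mathrm{supp}(X \cup Y)}{\mathrm{supp}(X)}.
\]
```

A rule $X \Rightarrow Y$ is typically reported only when both values meet user-chosen minimum thresholds.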
Query evaluation improvement and association rules are two interesting research topics in data query and management. In this paper, the relations are decomposed with respect to the mined association rules, and several basic data query expressions are rewritten into forms with relatively lower time cost. This is desirable for processing queries in an efficient manner.
Since medical insurance is relevant for everyone, it is very important to study how the system is formulated and implemented. In this paper, I propose a text association analysis method, TACMA, which is based on medical insurance policies and combines the text information of medical insurance policy with the characteristics of the semantic relations of the audit knowledge. Compared with the traditional TP-growth algorithm, the TACMA algorithm improves significantly in mining effectiveness and time efficiency in text knowledge discovery, and can be successfully applied to practical projects.
With the widespread and rapid development of wearable devices, our daily life and perception have changed greatly. Nowadays, the major types of wearable devices are products such as wristbands, watches and glasses. These devices can record personal health data such as sleep quality, exercise, blood pressure and blood glucose. As the amount of data keeps growing and becomes more complex, handling and analyzing the data becomes more important. Data mining and visualization technology can provide methods and paths for processing the data from wearable devices.
If these data can be sufficiently used, valuable information can be mined and then presented using visualization technology, so that the core value of the data is vividly shown. This article looks into the future of wearable devices and of data mining and visualization technology through an analysis of the current state of wearable devices, data mining technology and data visualization.
Association rule mining is an important topic in data mining. It aims to find rules in a transaction database that satisfy user-given minimum support and minimum confidence thresholds. The Apriori algorithm is one of the most influential algorithms for mining Boolean association rules, but it is inefficient because of the large number of candidates and the costly support calculation. In order to find all the frequent item sets in the transaction database efficiently and quickly, an improved Apriori algorithm is presented to solve the bottleneck problems of the traditional Apriori algorithm. First, a frequent item format is introduced to reduce memory occupancy during frequent item generation, and the storage structure of transaction records is then converted. Finally, the frequent 1-items are divided into different groups in order to implement parallel computing. The experimental results show that the improved Apriori algorithm effectively improves on the efficiency of the original algorithm.
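For context, the traditional level-wise Apriori procedure that the abstract sets out to improve can be sketched as below. This is the textbook baseline, not the improved algorithm, whose storage format and grouping scheme are not detailed in the abstract. The function names and toy data are illustrative assumptions.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Level-wise search: frequent k-itemsets are built from frequent (k-1)-itemsets."""
    n = len(transactions)

    def support(itemset):
        # Each call scans the whole database: the cost the abstract refers to.
        return sum(1 for t in transactions if itemset <= t) / n

    items = {item for t in transactions for item in t}
    level = {frozenset([i]): support(frozenset([i])) for i in items}
    level = {s: v for s, v in level.items() if v >= min_support}
    frequent = {}
    k = 1
    while level:
        frequent.update(level)
        k += 1
        prev = list(level)
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune candidates that have an infrequent (k-1)-subset.
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
        level = {c: support(c) for c in candidates}
        level = {c: v for c, v in level.items() if v >= min_support}
    return frequent

def rules(frequent, min_confidence):
    """Emit (antecedent, consequent, support, confidence) for each qualifying rule."""
    out = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), r):
                a = frozenset(antecedent)
                conf = sup / frequent[a]
                if conf >= min_confidence:
                    out.append((a, itemset - a, sup, conf))
    return out

# Hypothetical toy data.
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
frequent = apriori(transactions, min_support=0.6)
found = rules(frequent, min_confidence=0.7)
```

Even on this toy input, every candidate triggers a full pass over the transactions. Candidate explosion and repeated scanning are the two bottlenecks that improved variants usually target.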