A user click fraud detection method based on the Top-Rank-k frequent pattern mining algorithm is presented to address the click fraud problem in current online advertising. First, the method combines the click frequencies of event samples to calculate the real evaluation score of the click stream, derives the click stream density function and the evaluation score expression under multi-dimensional variables, and further obtains the time complexity of the next user's click fraud process. Second, a click fraud detection algorithm is designed around the Top-Rank-k frequent patterns, and the users committing click fraud are identified through analysis. The results show that the method is efficient and correct, and that it outperforms other similar algorithms.
Along with the fast development of information and communication technology (ICT), the job skills required by ICT industries are also evolving very rapidly. It becomes difficult for ICT students to assess the gap between their own skills and these evolving skills. Even though schools perform periodic curriculum evaluations, the time between evaluations causes the curriculum to go out of date easily, since it cannot keep up with the large and rapid changes occurring in the industry. We propose novel solutions that introduce measures and visualization tools to reveal this skills gap. Using evolutionary-based data mining, the skillsets mastered by students were collected from their study reports, while the frequent skillsets required by the industry were mined from job adverts; based on these skillsets, the skill coverage of the students was approximated. The proposed solutions were then tested on data obtained from an Indonesian higher education institution, since Indonesia implements a competence-based curriculum in its education system. Experiments show that the proposed approaches not only reveal and visualize the gap but also monitor changes in the skills requirements, which also helps school administrators when updating the curriculum.
Recently a new type of data source came into the focus of knowledge discovery from temporal data: interval sequences. In contrast to event sequences, interval sequences contain labeled events with a temporal extension. However, existing algorithms for mining patterns from interval sequences proved to be far from satisfying our needs. In brief, we missed an approach that, at the same time, defines support as the number of pattern instances, allows input data that consists of more than one sequence, implements time constraints on a pattern instance, and counts multiple instances of a pattern within one interval sequence. In this paper we propose a new support definition which incorporates these properties. We also describe FSMSet, an algorithm that employs the new support definition, and demonstrate its performance on field data from the automotive business.
When mining frequent patterns, the Apriori model is expensive because it reads the database repeatedly, while the FP-growth model consumes a large amount of memory because of its highly condensed data structure. To avoid the disadvantages of these models, this paper proposes a novel data mining model for discovering frequent patterns, based on embedded granular computing, which differs from both the Apriori model and the FP-growth model. The model adopts the divide-and-conquer strategy of granular computing, which can adaptively construct different hierarchical granules. To form the data mining model, an embedded granular computing model is proposed. When used to discover frequent patterns, the granular computing model has three effects. First, it avoids reading the database repeatedly by constructing the extended information granule, which also lessens the amount of support calculation. Second, it reduces memory requirements through the attribute granule: the search space compresses the memory taken by the data structure, which makes candidate generation relatively simple. Third, it divides an overly large computing task into several easy operations via the attribute granule; that is, the embedded granular computing model shrinks the search space from one super-state to several sub-states. All experimental results show that the data mining model based on embedded granular computing is more reasonable and efficient than the classical models for mining frequent patterns on different types of datasets. In addition, a further discussion describes the performance trend of the model through a group of experiments.
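To illustrate the repeated database reads that the abstract above attributes to Apriori, the following is a minimal textbook-style sketch of level-wise Apriori that counts its full passes over the transactions. It is not the embedded granular computing model proposed in the paper; the function name `apriori`, its parameters, and the toy data are assumptions made for illustration only.

```python
from itertools import combinations


def apriori(transactions, min_sup):
    """Level-wise Apriori sketch.

    Returns ({itemset: support}, passes), where passes is the number of
    full scans over the transactions used for support counting.
    """
    transactions = [frozenset(t) for t in transactions]
    # Candidate 1-itemsets (enumerating the items is not counted as a pass).
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent, passes, k = {}, 0, 1
    while candidates:
        passes += 1  # one full database scan per level
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        level = {c: n for c, n in counts.items() if n >= min_sup}
        frequent.update(level)
        k += 1
        # Join step: unions of two frequent (k-1)-itemsets that have size k.
        joined = {a | b for a, b in combinations(level, 2) if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent, passes


if __name__ == "__main__":
    db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
    patterns, scans = apriori(db, min_sup=3)
    print(len(patterns), "frequent itemsets found in", scans, "scans")
```

In this sketch every candidate level costs one more scan of the data, which is the overhead that single-pass or compressed-structure approaches try to avoid.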
Trajectory data are regarded as a rich source of hidden patterns that provide a deeper understanding of the underlying moving objects. Several studies focus on extracting repetitive, frequent and group patterns. Conventional algorithms defined for sequential pattern mining problems are not directly applicable to trajectory data. Space partitioning strategies have been proposed that capture space proximity first and time proximity second to discover the knowledge in the data. Our proposal addresses time proximity first by identifying trajectories that meet at a minimum of K consecutive time stamps. A novel tree structure is proposed to ease this process. Our method then investigates space proximity using the Mahalanobis distance (MD). We use the Manhattan distance to form prior knowledge that helps the supervised learning-based MD derive clusters of trajectories along the true spreads of the objects. With the help of a minsup threshold, clusters of frequent trajectories are found, which in sequence form sequential patterns of length K. Illustrative examples are provided to compare the MD metric with the Euclidean distance metric. A synthetic dataset is generated, and results are presented for various parameters such as the number of objects, minsup, the K value, the number of hops in a trajectory and the computational time. Experiments are also conducted on an available real-time taxi dataset. The sequential patterns prove to be valuable knowledge for understanding the dynamics of the moving objects and for recommending movements in constrained networks.
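The comparison between the Mahalanobis and Euclidean metrics mentioned above can be sketched for two-dimensional points. This is a minimal illustration of the standard formula d(p, q) = sqrt((p - q)^T S^{-1} (p - q)), not the authors' supervised learning-based procedure; the function names and the covariance matrix used here are assumptions chosen for the example.

```python
import math


def euclidean(p, q):
    """Plain Euclidean distance between two 2-D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def mahalanobis_2d(p, q, cov):
    """Mahalanobis distance between two 2-D points for a 2x2 covariance matrix."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix must be invertible")
    dx, dy = p[0] - q[0], p[1] - q[1]
    # inverse(cov) = (1 / det) * [[d, -b], [-c, a]]
    quad = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return math.sqrt(quad)


if __name__ == "__main__":
    origin = (0.0, 0.0)
    # Hypothetical spread: variance 4 along x, variance 1 along y.
    cov = [[4.0, 0.0], [0.0, 1.0]]
    for point in [(2.0, 0.0), (0.0, 2.0)]:
        print(point, euclidean(origin, point), mahalanobis_2d(origin, point, cov))
```

Both points are equally far from the origin under the Euclidean metric, but the Mahalanobis distance treats the point along the high-variance x axis as closer, which is why it can follow the spread of the objects when clustering.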
Periodic frequent patterns are frequent patterns which occur at periodic intervals in databases. They are useful in decision making where event occurrence intervals are vital. Traditional algorithms for discovering periodic frequent patterns, however, often report a large number of such patterns, most of which are often redundant as their periodic occurrences can be derived from other periodic frequent patterns. Using such redundant periodic frequent patterns in decision making would often be detrimental, if not trivial. This paper addresses the challenge of eliminating redundant periodic frequent patterns by employing the concept of deduction rules in mining and reporting only the set of non-redundant periodic frequent patterns. It subsequently proposes and develops a Non-redundant Periodic Frequent Pattern Miner (NPFPM) to achieve this purpose. Experimental analysis on benchmark datasets shows that NPFPM is efficient and can effectively prune the set of redundant periodic frequent patterns.
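As a minimal sketch of what "periodic frequent" means, the code below assumes the commonly used definition in which a pattern must reach a minimum support and the largest gap between its consecutive occurrences (including the gap before the first and after the last transaction) must not exceed a maximum period. It does not implement NPFPM or its deduction rules; the function names, parameters, and toy database are hypothetical.

```python
def max_periodicity(tids, db_size):
    """Largest gap between consecutive occurrences, with 0 and db_size as borders."""
    points = [0] + sorted(tids) + [db_size]
    return max(later - earlier for earlier, later in zip(points, points[1:]))


def is_periodic_frequent(pattern, transactions, min_sup, max_per):
    """Check support and periodicity of an itemset over 1-indexed transactions."""
    pattern = set(pattern)
    tids = [i for i, t in enumerate(transactions, start=1) if pattern <= set(t)]
    if len(tids) < min_sup:
        return False
    return max_periodicity(tids, len(transactions)) <= max_per


if __name__ == "__main__":
    db = [{"a", "b"}, {"a"}, {"a", "b"}, {"c"}, {"a", "b"}, {"a", "b", "c"}]
    print(is_periodic_frequent({"a", "b"}, db, min_sup=3, max_per=2))
    print(is_periodic_frequent({"c"}, db, min_sup=2, max_per=2))
```

Here the itemset {a, b} occurs in transactions 1, 3, 5 and 6, so its largest gap is 2, while {c} first appears only at transaction 4 and therefore has a largest gap of 4.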