This paper demonstrates how knowledge can be extracted from evolving spiking neural networks with rank order population coding. Knowledge discovery is a very important feature of intelligent systems. Yet, a disproportionately small amount of research is centered on the issue of knowledge extraction from spiking neural networks, which are considered to be the third generation of artificial neural networks. The lack of knowledge representation compatibility is becoming a major detriment to end users of these networks. We show that high-level knowledge can be obtained from evolving spiking neural networks. More specifically, we propose a method for fuzzy rule extraction from an evolving spiking neural network with rank order population coding. The proposed method was used for knowledge discovery on two benchmark taste recognition problems, where the knowledge learnt by an evolving spiking neural network was extracted in the form of zero-order Takagi-Sugeno fuzzy IF-THEN rules.
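As a rough illustration of rank order population coding of the kind assumed by such networks (not the authors' exact encoder), the sketch below maps a real-valued input onto Gaussian receptive fields and modulates each field's contribution by mod**rank, so earlier-firing fields carry more weight; all parameter names and values are illustrative.

```python
import numpy as np

def rank_order_encode(x, n_fields=6, beta=1.5, mod=0.9, x_min=0.0, x_max=1.0):
    """Encode a real value with Gaussian receptive fields and rank order coding.

    Each receptive field fires earlier the more strongly it is excited; the
    contribution of a field is then modulated by mod**rank, so earlier
    (higher-ranked) fields carry more weight.  Parameters are illustrative.
    """
    centers = np.linspace(x_min, x_max, n_fields)
    width = (x_max - x_min) / (beta * (n_fields - 1))
    activation = np.exp(-0.5 * ((x - centers) / width) ** 2)
    ranks = np.argsort(np.argsort(-activation))   # 0 = earliest-firing field
    return mod ** ranks                           # rank-order modulated input

print(rank_order_encode(0.37))
```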
This paper presents an efficient algorithm for discovering exception rules from a data set without domain-specific information. An exception rule, defined as a deviational pattern to a strong rule, exhibits unexpectedness and is sometimes extremely useful. Previous discovery approaches for this type of knowledge can be classified into a directed approach, which obtains exception rules each of which deviates from a set of user-prespecified strong rules, and an undirected approach, which typically discovers a set of rule pairs, each representing an exception rule and its corresponding strong rule. It has been pointed out that unexpectedness is often related to interestingness. In this sense, an undirected approach is promising, since its discovery outcome is free from human prejudice and thus tends to be highly unexpected. However, this approach can be prohibitive due to the extra search for strong rules as well as unreliable patterns in the output. In order to circumvent these difficulties, we propose a method based on sound pruning and probabilistic estimation. The sound pruning reduces search time to a reasonable amount and enables an exhaustive search for rule pairs. Normal approximations of the multinomial distributions are employed to evaluate the reliability of a rule pair. Our method has been validated using two medical data sets under the supervision of a physician and two benchmark data sets from the machine learning community.
This paper addresses the problem of using domain generalization graphs to generalize temporal data extracted from relational databases. A domain generalization graph associated with an attribute defines a partial order which represents a set of generalization relations for the attribute. We propose formal specifications for domain generalization graphs associated with calendar (date and time) attributes. These graphs are reusable (i.e. can be used to generalize any calendar attributes), adaptable (i.e. can be extended or restricted as appropriate for particular applications), and transportable (i.e. can be used with any database containing a calendar attribute).
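A minimal sketch of what such a domain generalization graph might look like for a date attribute, assuming a hypothetical set of calendar granularities and one-step generalization functions; the formal specifications in the paper are considerably richer.

```python
from datetime import datetime

# Hypothetical calendar domain generalization graph: each node is a granularity
# and each edge maps a value to a more general one.  The structure is a DAG,
# so a date can be generalized along several paths (day -> month -> year, or
# day -> day-of-week, etc.).
CALENDAR_DGG = {
    "date":        {"month": lambda d: d.strftime("%Y-%m"),
                    "week":  lambda d: d.strftime("%Y-W%W"),
                    "day_of_week": lambda d: d.strftime("%A")},
    "month":       {"quarter": lambda m: m[:4] + "-Q" + str((int(m[5:]) - 1) // 3 + 1),
                    "year":    lambda m: m[:4]},
    "quarter":     {"year": lambda q: q[:4]},
    "week":        {"year": lambda w: w[:4]},
    "day_of_week": {"any": lambda d: "ANY"},
    "year":        {"any": lambda y: "ANY"},
}

def generalize(value, level, target):
    """Follow edges of the graph (depth-first) from `level` toward `target`."""
    if level == target:
        return value
    for nxt, f in CALENDAR_DGG.get(level, {}).items():
        result = generalize(f(value), nxt, target)
        if result is not None:
            return result
    return None

d = datetime(2024, 5, 17)
print(generalize(d, "date", "quarter"))       # 2024-Q2
print(generalize(d, "date", "day_of_week"))
```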
The prevention and control of communicable diseases such as COVID-19 is a worldwide problem, especially in terms of mining latent spreading paths. Although some communication models have been proposed from the perspective of spreading mechanisms, it remains hard to describe the spreading mechanism at all times, because real-world disease-spreading scenarios are dynamic and cannot be described by time-invariant model parameters. To remedy this gap, this paper explores the use of big data analysis in this area, so as to replace mechanism-driven methods with big data-driven methods. In a modern society with a high level of digitization, the growing amount of data in various fields also provides much convenience for this purpose. Therefore, this paper proposes an intelligent knowledge discovery method for critical spreading paths based on epidemic big data. As the major roadmap, a directed acyclic graph of epidemic spread is constructed with each province and city in mainland China as nodes; the features of each node are dimension-reduced, and a composite score is computed for each city per day from the features after principal component analysis. Then, the XGBoost machine learning model performs feature importance ranking to discriminate latent candidate spreading paths. Finally, a shortest path algorithm is used to find the critical path of epidemic spreading between two nodes. In addition, simulation experiments are conducted using realistic social network data.
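A schematic sketch of the described pipeline on synthetic placeholder data; the composite-score weighting, graph construction, and importance threshold below are assumptions for illustration, not the paper's actual settings.

```python
import numpy as np
import networkx as nx
from sklearn.decomposition import PCA
from xgboost import XGBRegressor

# Illustrative stand-ins: `features[city*day, :]` holds raw indicators of one
# city on one day, and `target` is whatever spread indicator is predicted.
rng = np.random.default_rng(0)
n_cities, n_days, n_raw = 30, 60, 12
features = rng.random((n_cities * n_days, n_raw))
target = rng.random(n_cities * n_days)

# 1) Dimension reduction, then a composite score per city per day.
reduced = PCA(n_components=3).fit_transform(features)
composite_score = reduced @ np.array([0.6, 0.3, 0.1])   # assumed weighting

# 2) XGBoost feature-importance ranking to keep only influential indicators.
model = XGBRegressor(n_estimators=50, max_depth=3).fit(features, target)
important = np.argsort(model.feature_importances_)[::-1][:5]
print("top indicators:", important)

# 3) Shortest path on a weighted directed graph of candidate spreading links.
G = nx.DiGraph()
G.add_nodes_from(range(n_cities))
for i in range(n_cities):
    for j in range(n_cities):
        if i != j and rng.random() < 0.1:                # candidate link
            G.add_edge(i, j, weight=float(rng.random()))
if nx.has_path(G, 0, 5):
    print(nx.shortest_path(G, 0, 5, weight="weight"))
```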
Ethnic minority regions are rich in historical resources and culture. Under the impact of modern information technology, the development of minority resources and the inheritance of ethnic culture face many challenges, and school education currently lags behind in exploring the ecological resources of minority groups, so the integration of creative education with minority culture has encountered a bottleneck. In response to this situation, we make full use of the platform of creative education to actively explore the traditional skills embedded in the lives of ethnic minorities. The evaluation of creative education should focus not only on students' works, but also on the improvement, over the whole learning process, of students' knowledge of creative tools, their ability to apply knowledge from multiple subjects, and their hands-on, problem-solving, and creative abilities. Based on this, this paper proposes a back propagation neural network (BPNN)-based quality evaluation method for creative education that evaluates its quality along multiple dimensions. Experiments and comparisons show that the proposed BPNN-based evaluation method better evaluates the whole process of creative education and helps the further development of creative education in minority regions.
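A minimal sketch of the kind of multi-dimensional BPNN evaluator described above, using scikit-learn's backprop-trained multilayer perceptron; the five evaluation dimensions and all data are synthetic assumptions for illustration.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Hypothetical evaluation dimensions (tool knowledge, cross-subject knowledge,
# hands-on ability, problem solving, creativity) scored 0..1, with an overall
# quality score as the training target.
rng = np.random.default_rng(42)
X = rng.random((200, 5))
y = X @ np.array([0.25, 0.2, 0.2, 0.2, 0.15]) + rng.normal(0, 0.02, 200)

# A back-propagation network (multi-layer perceptron trained with backprop).
bpnn = MLPRegressor(hidden_layer_sizes=(10,), max_iter=2000, random_state=0)
bpnn.fit(X, y)
print(bpnn.predict(X[:3]))
```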
Marine pollution incidents (MPI) are often dynamic processes in which time and space interact. Currently, MPI monitoring basically relies on manual analysis driven by expert experience. This working mode has an obvious time lag and does not support timely response. As a result, intelligent algorithms that can quickly discover MPI from massive monitoring data remain a practical need in this field. Considering that monitoring elements generally have multi-dimensional characteristics and spatiotemporal causal relationships, this work develops a spatiotemporal deep learning-based smart discovery approach for MPI from the data-driven perspective. In particular, a systematic preprocessing workflow is developed for the spatiotemporal monitoring data, which facilitates subsequent feature extraction. Then, a spatiotemporal convolutional neural network structure is developed to extract features from the original spatiotemporal monitoring data, and on this basis the MPI discovery results are output via neural computing structures. Taking the marine oil spill incident in the Bohai Sea in eastern China as a case study, this work carries out a simulation and analyzes its results, which indicate the sound performance of the proposed approach.
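A toy sketch of a spatiotemporal convolutional network of the general kind described, assuming gridded monitoring variables over time; the layer sizes, input shapes, and the binary incident/no-incident head are illustrative and not the paper's architecture.

```python
import torch
import torch.nn as nn

class SpatioTemporalCNN(nn.Module):
    """Toy spatiotemporal CNN: 3-D convolutions over (time, lat, lon) grids of
    monitoring variables, followed by a small head that flags a possible
    pollution incident.  Shapes and layer sizes are illustrative."""
    def __init__(self, n_vars=4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(n_vars, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),
        )
        self.head = nn.Sequential(nn.Flatten(), nn.Linear(32, 2))

    def forward(self, x):                # x: (batch, vars, time, height, width)
        return self.head(self.features(x))

x = torch.randn(8, 4, 12, 32, 32)        # 8 samples, 4 variables, 12 time steps
print(SpatioTemporalCNN()(x).shape)       # torch.Size([8, 2])
```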
Data mining techniques provide people with new power to research and manipulate existing large volumes of data. A data mining process discovers interesting information hidden in the data, which can be used for future prediction and/or for intelligently summarizing the details of the data. Data mining techniques have been applied successfully to various areas such as marketing, medicine, and finance, although few such applications can currently be seen in the software engineering domain. In this paper, a proposed data mining application in the software engineering domain is explained and evaluated experimentally. The empirical results demonstrate the capability of data mining techniques in the software engineering domain and the potential benefits of applying data mining to this area.
"Knowledge discovery in data bases" (KDD) for software engineering is a process for finding useful information in the large volumes of data that are a byproduct of software development, such as data bases for configuration management and for problem reporting. This paper presents guidelines for extracting innovative process metrics from these commonly available data bases. This paper also adapts the Classification And Regression Trees algorithm, CART, to the KDD process for software engineering data. To our knowledge, this algorithm has not been used previously for empirical software quality modeling. In particular, we present an innovative way to control the balance between misclassification rates. A KDD case study of a very large legacy telecommunications software system found that variables derived from source code, configuration management transactions, and problem reporting transactions can be useful predictors of software quality. The KDD process discovered that for this software development environment, out of forty software attributes, only a few of the predictor variables were significant. This resulted in a model that predicts whether modules are likely to have faults discovered by customers. Software developers need such predictions early in development to target software enhancement techniques to the modules that need improvement the most.
Hierarchical conceptual clustering has proven to be a useful, although greatly under-explored, data mining technique. A graph-based representation of structural information combined with a substructure discovery technique has been shown to be successful in knowledge discovery. The SUBDUE substructure discovery system provides the advantages of both approaches. This work presents SUBDUE and the development of its clustering functionalities. Several examples are used to illustrate the validity of the approach in both structured and unstructured domains, as well as to compare SUBDUE with earlier clustering algorithms. Results show that SUBDUE successfully discovers hierarchical clusterings in both structured and unstructured data.
Extensive amounts of knowledge and data stored in medical databases require the development of specialized tools for storing, accessing, analyzing, and effectively using the stored knowledge and data. Intelligent methods such as neural networks, fuzzy sets, decision trees, and expert systems are, slowly but steadily, being applied in medical fields. Recently, rough set theory, a new intelligent technique, has been used for the discovery of data dependencies, data reduction, approximate set classification, and rule induction from databases.
In this paper, we present a rough set method for generating classification rules from a set of 360 observed samples of breast cancer data. The attributes are selected and normalized, and the rough set dependency rules are generated directly from the real-valued attribute vectors. The rough set reduction technique is then applied to find all reducts of the data, i.e., the minimal subsets of attributes that are associated with a class label for classification. Experimental results from applying the rough set analysis to the set of data samples are given and evaluated. In addition, the generated rules are compared to those of the well-known ID3 classifier algorithm. The study shows that rough set theory is a useful tool for inductive learning and a valuable aid for building expert systems.
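A minimal sketch of the rough set machinery involved (indiscernibility classes, positive region, and a greedy search for a single reduct), applied to a tiny made-up decision table rather than the breast cancer data; enumerating all reducts, as the paper does, is more involved.

```python
from itertools import groupby

def partition(rows, attrs):
    """Equivalence classes of the indiscernibility relation on `attrs`."""
    key = lambda i: tuple(rows[i][a] for a in attrs)
    idx = sorted(range(len(rows)), key=key)
    return [set(g) for _, g in groupby(idx, key=key)]

def dependency(rows, attrs, decision):
    """Fraction of objects in the positive region of `decision` w.r.t. `attrs`."""
    dec_classes = partition(rows, [decision])
    pos = sum(len(b) for b in partition(rows, attrs)
              if any(b <= d for d in dec_classes))
    return pos / len(rows)

def greedy_reduct(rows, attrs, decision):
    """Add attributes until the full dependency degree is reached (a heuristic,
    not guaranteed to find every reduct)."""
    full, reduct = dependency(rows, attrs, decision), []
    while dependency(rows, reduct, decision) < full:
        best = max(attrs, key=lambda a: dependency(rows, reduct + [a], decision))
        reduct.append(best)
    return reduct

# Tiny illustrative decision table (not the breast cancer data).
rows = [{"a": 1, "b": 0, "c": 1, "cls": "benign"},
        {"a": 1, "b": 1, "c": 1, "cls": "malignant"},
        {"a": 0, "b": 1, "c": 0, "cls": "malignant"},
        {"a": 0, "b": 0, "c": 0, "cls": "benign"}]
print(greedy_reduct(rows, ["a", "b", "c"], "cls"))   # ['b']
```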
This paper describes an automated query discovery system for retrieving common characteristic knowledge from a database in a distributed computing environment. The paper centers in particular on the problem of discovering the common characteristics that are shared by a set of objects in a database. Such commonalities can be useful for finding a typical profile of the given object set or outstanding features of a group of objects in a database. In our approach, commonalities within a set of objects are described by database queries that compute the given set of objects. We use genetic programming as the main search engine to discover such queries. The paper discusses the architecture and the techniques used in our system, and presents some experimental results to evaluate the system. In addition, to improve performance, we built a distributed computing environment for our system with clustered computers using the Common Object Request Broker Architecture (CORBA). The paper briefly discusses our clustered computer architecture and the implementation of the distributed computing environment, and shows the overall performance improvement.
This work focuses on learning the structure of Markov networks from data. Markov networks are parametric models for compactly representing complex probability distributions. These models are composed of a structure and numerical weights, where the structure describes the independences that hold in the distribution. Depending on the goal of structure learning, learning algorithms can be divided into density estimation algorithms, where the structure is learned for answering inference queries, and knowledge discovery algorithms, where the structure is learned for describing independences qualitatively. The latter algorithms present an important limitation in describing independences because they use a single graph, a coarse-grained structure representation that cannot represent flexible independences; for instance, context-specific independences cannot be described by a single graph. To overcome this limitation, this work proposes a new alternative representation named the canonical model, as well as the CSPC algorithm, a novel knowledge discovery algorithm for learning canonical models by using context-specific independences as constraints. In an extensive empirical evaluation, CSPC learns more accurate structures than state-of-the-art density estimation and knowledge discovery algorithms. Moreover, for answering inference queries, our approach obtains competitive results against density estimation algorithms, significantly outperforming knowledge discovery algorithms.
We present the three-step GRG approach for learning decision rules from large relational databases. In the first step, an attribute-oriented concept tree ascension technique is applied to generalize an information system. This step loses some information but substantially improves the efficiency of the subsequent steps. In the second step, a reduction technique is applied to generate a minimalized information system called a reduct, which contains a minimal subset of the generalized attributes and the smallest number of distinct tuples for those attributes. Finally, a set of maximally general rules is derived directly from the reduct. These rules can be used to interpret and understand the active mechanisms underlying the database.
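A toy sketch of the first GRG step, attribute-oriented concept tree ascension, assuming hypothetical one-level concept hierarchies; real hierarchies are deeper and are climbed until attribute thresholds are met.

```python
from collections import Counter

# Hypothetical one-level concept hierarchies for two attributes.
HIERARCHY = {
    "age":  lambda v: "young" if v < 35 else "middle" if v < 60 else "senior",
    "city": lambda v: {"Regina": "Canada", "Calgary": "Canada",
                       "Boston": "USA"}.get(v, "other"),
}

def generalize(tuples, attrs):
    """Climb each attribute one level and merge identical generalized tuples."""
    lifted = [tuple(HIERARCHY[a](row[i]) for i, a in enumerate(attrs))
              for row in tuples]
    return Counter(lifted)                     # generalized tuple -> vote count

data = [(23, "Regina"), (29, "Calgary"), (64, "Boston"), (61, "Boston")]
print(generalize(data, ["age", "city"]))
# Counter({('young', 'Canada'): 2, ('senior', 'USA'): 2})
```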
Updating the single- and multi-level association rules discovered in large databases is inherently costly. The straightforward approach of re-running the discovery algorithm on the entire updated database to re-discover the association rules is not cost-effective. An incremental algorithm, FUP, has been proposed for updating discovered single-level association rules. In this study, we show that the incremental technique in FUP can be generalized to other data mining systems, and we propose an efficient algorithm, MLUp, for updating discovered multi-level association rules. Our performance study shows that MLUp has superior performance over representative mining algorithms such as ML-T2 in updating discovered multi-level association rules.
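A simplified sketch of the incremental idea: reuse the support counts of previously discovered frequent itemsets and scan only the increment database to decide which remain frequent. The full FUP/MLUp algorithms also handle itemsets that become frequent only after the update; item names, counts, and the threshold below are illustrative.

```python
def count_support(transactions, itemsets):
    """Support counts of the given itemsets in one batch of transactions."""
    counts = {s: 0 for s in itemsets}
    for t in transactions:
        for s in itemsets:
            if s <= t:
                counts[s] += 1
    return counts

# Supports of previously discovered frequent itemsets in the original database
# (kept from the first mining run), plus the increment database db_plus.
old_counts = {frozenset({"milk"}): 40, frozenset({"milk", "bread"}): 25}
old_size, minsup = 100, 0.2
db_plus = [frozenset(t) for t in (["milk", "bread"], ["milk"], ["beer"])]

new_counts = count_support(db_plus, old_counts.keys())
total = old_size + len(db_plus)
still_frequent = {s for s in old_counts
                  if (old_counts[s] + new_counts[s]) / total >= minsup}
print(still_frequent)
```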
We propose the share-confidence framework for knowledge discovery from databases, which addresses the problem of mining characterized association rules from market basket data (i.e., itemsets). Our goal is not only to discover the buying patterns of customers, but also to discover customer profiles by partitioning customers into distinct classes. We present a new algorithm for classifying itemsets based upon characteristic attributes extracted from census or lifestyle data. Our algorithm combines the Apriori algorithm for discovering association rules between items in large databases and the AOG algorithm for attribute-oriented generalization in large databases. We show how characterized itemsets can be generalized according to concept hierarchies associated with the characteristic attributes. Finally, we present experimental results that demonstrate the utility of the share-confidence framework.
The explosive growth in the generation and collection of data has generated an urgent need for a new generation of techniques and tools that can assist in transforming these data intelligently and automatically into useful knowledge. Knowledge discovery is an emerging multidisciplinary field that attempts to fulfill this need. Knowledge discovery is a large process that includes data selection, cleaning, preprocessing, integration, transformation and reduction, data mining, model selection, evaluation and interpretation, and finally consolidation and use of the extracted knowledge. This paper addresses the issues of data cleaning and integration for knowledge discovery by proposing a systematic approach for resolving semantic conflicts that are encountered during the integration of data from multiple sources. Illustrated with examples derived from military databases, the paper presents a heuristics-based algorithm for identifying and resolving semantic conflicts at different levels of information granularity.
Intelligent and Cooperative Information Systems (ICIS) will have large numbers of distributed, heterogeneous agents interacting and cooperating to solve problems regardless of location, original mission, or platform. The agents in an ICIS will adapt to new and possibly surprising situations, preferably without human intervention. These systems will not only control a domain, but also will improve their own performance over time, that is, they will learn.
This paper describes five heterogeneous learning agents and how they are integrated into an Integrated Learning System (ILS) in which some of the agents cooperate to improve performance. The issues involved include coordinating distributed, cooperating, heterogeneous problem-solvers, combining various learning paradigms, and integrating different reasoning techniques. ILS also includes a central controller, called The Learning Coordinator (TLC), that manages the flow of control and communication among the agents using a high-level communication protocol. To demonstrate the generality of the ILS architecture, we implemented an application which, through its own experience, learns how to control the traffic in a telephone network, and we show the results for one set of experiments. Options for enhancing the ILS architecture are also discussed.
This paper deals with the extraction of default rules from a database of examples. The proposed approach is based on a special kind of probability distributions, called "big-stepped probabilities", which are known to provide a semantics for non-monotonic reasoning. The rules that are learnt are genuine default rules, which can be used (under some conditions) in a non-monotonic reasoning system and encoded in possibilistic logic.
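For reference, a big-stepped distribution is usually taken to be one in which each probability exceeds the sum of all smaller ones; a small check under that assumption:

```python
def is_big_stepped(probabilities):
    """A distribution p1 > p2 > ... > pn is 'big-stepped' when every
    probability exceeds the sum of all the smaller ones."""
    p = sorted(probabilities, reverse=True)
    return all(p[i] > sum(p[i + 1:]) for i in range(len(p) - 1))

print(is_big_stepped([0.6, 0.3, 0.08, 0.02]))   # True
print(is_big_stepped([0.4, 0.35, 0.25]))        # False
```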
Many knowledge discovery tasks consist in mining databases. Nevertheless, there are cases in which a user is not allowed to access the database and can deal only with a provided fraction of the knowledge, yet still hopes to find new interesting relationships. Surprisingly, even a small number of patterns can be augmented into so much new knowledge that its analysis may become infeasible. In this article, we offer a method for inferring a concise, lossless, and sound representation of association rules, in the form of maximal covering rules, from a concise lossless representation of all derivable patterns. The respective algorithm is presented as well.
In our study we present an effective method for clustering Web pages. From flat HTML files we extract keywords, form feature vectors as representations of the Web pages, and apply them to a clustering method, the Fuzzy C-Means (FCM) algorithm. We demonstrate an organized and systematic manner of data collection: various categories of Web pages were retrieved from the ODP (Open Directory Project) to create our datasets. The clustering results show that the method performs well for all datasets. Finally, we present a comprehensive experimental study examining the behavior of the algorithm for different input parameters, the internal structure of the datasets, and classification experiments.
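A compact NumPy sketch of the Fuzzy C-Means procedure applied to keyword feature vectors; the toy vectors below stand in for the keyword weights extracted from the HTML files, and the parameters are illustrative.

```python
import numpy as np

def fuzzy_c_means(X, c=3, m=2.0, n_iter=100, seed=0):
    """Plain NumPy Fuzzy C-Means: returns cluster centers and the fuzzy
    membership matrix U (documents x clusters)."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)                 # memberships sum to 1
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]  # weighted cluster means
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        U = 1.0 / (d ** (2 / (m - 1)))
        U /= U.sum(axis=1, keepdims=True)             # re-normalize memberships
    return centers, U

X = np.random.default_rng(1).random((20, 5))          # toy keyword vectors
centers, U = fuzzy_c_means(X, c=3)
print(U.argmax(axis=1))                               # hard cluster assignment
```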