Considering the degraded performance of robotic systems under actuator constraints, this paper derives the rules governing the dynamic control process and extracts knowledge for optimizing the robotic controller's output from the analysis of a Lyapunov function. Fuzzy rules are then used to express this knowledge and are embedded in the controller to guide the dynamic control process. Under this controller, the demand for high actuator outputs is limited to a great extent, while the convergence performance is optimized by a fuzzy-sets method, thereby realizing saturated fault-tolerant control of robotic manipulators. Simulation results on a 2-DOF robot validate the effectiveness of the proposed controller.
There are rich data resources residing in available materials websites, and most of these data resources are presented in the form of HTML tables. However, the semi-structured nature of HTML tables makes it difficult to distinguish attributes from values, so identifying attributes in HTML tables is the key issue for information acquisition. In this paper, a method for materials knowledge extraction from HTML tables based on sibling comparison is proposed, which consists of three steps: acquiring sibling tables, identifying the table pattern, and extracting table data. We show how to use the F-measure to find appropriate thresholds for matching tables from materials websites when acquiring sibling tables. Further, we propose a strategy named FRFC (First Row matching and First Column matching) to distinguish attributes from values and thereby identify the table pattern. Moreover, the data in HTML tables are extracted according to their corresponding table patterns and mapped to a predefined schema, which facilitates populating a materials ontology. The proposed approach handles cases where an attribute spans multiple cells and where sibling tables share many matched attributes. Using FRFC for table pattern identification, we achieve the desired accuracy (>90%). Extraction time does not increase significantly with the number of documents or the number of cells per table, so the approach is effective for processing a large number of documents. A prototype named MTES has been developed and demonstrates the effectiveness of the proposed approach.
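A minimal sketch of the first-row/first-column matching idea is given below. The function names, the threshold value and the sample tables are illustrative assumptions, not the paper's implementation; the abstract states that the actual thresholds are tuned with an F-measure sweep.

```python
# Hypothetical sketch of the FRFC (First Row / First Column matching) idea:
# compare a table with a sibling table and decide whether attributes
# sit in the first row or in the first column.

def cell_match_ratio(cells_a, cells_b):
    """Fraction of aligned positions whose normalized text is identical."""
    pairs = list(zip(cells_a, cells_b))
    if not pairs:
        return 0.0
    same = sum(1 for a, b in pairs if a.strip().lower() == b.strip().lower())
    return same / len(pairs)

def identify_table_pattern(table, sibling, threshold=0.6):
    """Return 'row-attributes', 'column-attributes', or 'unknown'.

    `table` and `sibling` are lists of rows, each row a list of cell strings.
    The threshold here is a placeholder for the F-measure-tuned value.
    """
    first_row = cell_match_ratio(table[0], sibling[0])
    first_col = cell_match_ratio([r[0] for r in table], [r[0] for r in sibling])
    if first_row >= threshold and first_row >= first_col:
        return "row-attributes"      # attributes span the first row
    if first_col >= threshold:
        return "column-attributes"   # attributes span the first column
    return "unknown"

# Two sibling tables about different materials share the same header row.
t1 = [["Material", "Density", "Melting point"], ["Cu", "8.96", "1085"]]
t2 = [["Material", "Density", "Melting point"], ["Al", "2.70", "660"]]
print(identify_table_pattern(t1, t2))   # -> row-attributes
```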
In this paper we describe an approach to defining flexible Knowledge Extraction Systems able to deal with the inherent vagueness and uncertainty of the extraction process. We also present a short survey of fuzzy and semantic approaches to knowledge extraction, with the aim of assessing whether and how these approaches meet their stated goals.
It is widely known that search engines are the dominant tools for finding information on the web. In most cases, these engines return web page references in a global ranking, taking into account either the importance of the web site or the relevance of the web pages to the identified topic. In this paper, we focus on the problem of determining distinct thematic groups within the results that existing web search engines provide. We additionally address the problem of dynamically adapting their ranking according to user selections, incorporating user judgments as implicitly registered in their selection of relevant documents. Our system exploits a state-of-the-art semantic web data mining technique that identifies Wikipedia semantic entities in order to group the result set into different topic groups according to the various meanings of the provided query. Moreover, we propose a novel probabilistic network scheme that employs the aforementioned topic identification method in order to modify the ranking of results as users select documents. We evaluated our implemented prototype in extensive experiments on the ClueWeb09 dataset using the TREC 2009, 2010, 2011 and 2012 Web Tracks, where we observed improved retrieval performance compared to current state-of-the-art re-ranking methods.
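The sketch below illustrates, under simplifying assumptions, how click feedback on topic groups can modify a ranking: each result carries the Wikipedia-entity topics assigned to it, clicks boost the weights of those topics, and results are re-scored by mixing the base score with the accumulated topic weight. This is not the paper's probabilistic network scheme, only a hypothetical illustration of the feedback loop; the names, weights and the mixing parameter alpha are invented.

```python
from collections import defaultdict

def rerank(results, topic_weights, alpha=0.7):
    """results: list of (doc_id, base_score, topics). Higher score ranks first."""
    def score(item):
        doc_id, base, topics = item
        boost = sum(topic_weights[t] for t in topics)
        return alpha * base + (1 - alpha) * boost
    return sorted(results, key=score, reverse=True)

def register_click(doc, topic_weights, increment=1.0):
    """Record a user selection by boosting the weights of the document's topics."""
    for t in doc[2]:
        topic_weights[t] += increment

topic_weights = defaultdict(float)
results = [("d1", 0.9, {"Jaguar (animal)"}),
           ("d2", 0.8, {"Jaguar Cars"}),
           ("d3", 0.7, {"Jaguar Cars"})]
register_click(results[1], topic_weights)          # the user picks the car-related page
print([d for d, _, _ in rerank(results, topic_weights)])   # car-related results move up
```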
This paper describes a new system for semi-automatically building, extending and managing a terminological thesaurus, that is, a multilingual terminology dictionary enriched with relationships between the terms themselves so as to form a thesaurus. The system makes it possible to radically enhance the workflow of current terminology expert groups, where most editing decisions still come from introspection. The presented system supplements the lexicographic process with natural language processing techniques, which are seamlessly integrated into the thesaurus editing environment. The system's methodology and the resulting thesaurus are closely connected to new domain corpora in the six languages involved. These corpora are used for term usage examples as well as for the automatic extraction of new candidate terms. The terminological thesaurus is now accessible via a web-based application, which (a) presents rich detailed information on each term, (b) visualizes term relations, and (c) displays real-life usage examples of the term in domain-related documents and for contextually similar terms. Furthermore, the specialized corpora are used to detect candidate translations of terms from the central language (Czech) into the other languages (English, French, German, Russian and Slovak), as well as to detect broader Czech terms, which help to place new terms in the thesaurus hierarchy.
This project has been realized as a terminological thesaurus of land surveying, but the presented tools and methodology are reusable for other terminology domains.
A considerable part of the social network analysis literature is dedicated to determining which individuals should be considered influential in particular social settings. Social influence can be described as the power or ability of a person to influence the thoughts and actions of other users. User influence is thus a value that depends on the engagement of a given user's followers (via retweets, replies, mentions, favorites, etc.). This paper focuses on identifying such phenomena in the Twitter graph and on presenting a novel methodology for characterizing influential Twitter users. The novelty of our approach lies in the fact that we incorporate a set of features for characterizing social media authors, including both nodal and topical metrics, along with new features concerning temporal aspects of user participation in the topic. We have also implemented cluster-based fusion techniques in order to combine result lists into a ranking of the top influential users. The results show that the proposed implementations and methodology can assist in identifying influential users, who play a dominant role in information diffusion.
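To make the fusion step concrete, the sketch below combines several per-feature rankings of users into a single influence ranking. The paper uses cluster-based fusion; a plain Borda count is used here only as a stand-in to illustrate the idea of merging ranked lists, and the feature lists and user handles are invented.

```python
from collections import defaultdict

def borda_fuse(rankings):
    """rankings: lists of user ids, each ordered from most to least influential."""
    scores = defaultdict(float)
    for ranking in rankings:
        n = len(ranking)
        for position, user in enumerate(ranking):
            scores[user] += n - position          # higher rank -> more points
    return sorted(scores, key=scores.get, reverse=True)

by_retweets = ["@alice", "@bob", "@carol"]        # nodal metric
by_mentions = ["@bob", "@alice", "@carol"]        # topical metric
by_recency  = ["@carol", "@alice", "@bob"]        # temporal participation feature
print(borda_fuse([by_retweets, by_mentions, by_recency]))   # fused influence ranking
```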
Does a post with specific emotional content, made on Twitter by an influential user, have the capability to affect or even alter the opinions of those who read it? Can the "influential" users affected by this post in turn affect their followers, so that eventually a large number of users change their opinions about the subject of the original post? Social influence can be described as the power or ability of a person to influence the thoughts and actions of other users. User influence is thus a value that depends on the engagement of the followers (via replies, mentions, retweets, favorites). Our study focuses on identifying such phenomena in the Twitter graph of posts and on determining which users' posts can trigger them. Furthermore, we analyze the influence metrics of all users taking part in specific discussions and examine the differences among them. Finally, we measure the percentage of the graph covered when the diffusion starts from the "influential" users and report the corresponding results. The results show that the proposed implementations and methodology can assist in identifying "influential" users, who play a dominant role in information diffusion.
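A minimal sketch of the graph-cover measurement follows: a breadth-first traversal over a follower graph from a set of "influential" seed users, returning the fraction of users reached. It assumes, for illustration only, that every exposed follower passes the post on; the graph and seed set are toy placeholders.

```python
from collections import deque

def graph_cover(graph, seeds):
    """graph: dict mapping user -> iterable of followers. Returns the covered fraction."""
    visited, queue = set(seeds), deque(seeds)
    while queue:
        user = queue.popleft()
        for follower in graph.get(user, ()):
            if follower not in visited:
                visited.add(follower)
                queue.append(follower)
    return len(visited) / len(graph)

followers = {"A": ["B", "C"], "B": ["D"], "C": [], "D": ["E"], "E": []}
print(graph_cover(followers, seeds={"A"}))   # 1.0: the whole graph is reached from A
```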
Many websites are poorly designed, and their users are not able to find the information they need. This is why many papers propose techniques for finding the right information for a user; most of these techniques focus on finding the required information across the whole Internet. The owner of a website often provides incomplete or imprecise information of low usefulness to the user. Restructuring the information is often enough to detect missing information, inconsistencies and imprecision; however, this is normally very difficult to do without degrading the website's performance. The authors have developed a novel application that exploits the existing information in a website in a more profitable way, restructuring the information without the intervention of the content provider. This paper describes the authors' experience during their participation in the European Commission ESPRIT 29158 FLEX Project.
Translation selection is the process of selecting, from the set of target language words corresponding to a source language word, the most appropriate one, i.e., the word that conveys the correct sense of the source word and makes the target language sentence more natural. In this paper, we propose a hybrid method for translation selection that exploits a bilingual dictionary and a target language corpus. Based on the 'word-to-sense and sense-to-word' relationship between a source word and its translations, our method selects a translation at two levels: sense disambiguation of the source word and selection of a target word. For translation selection, we introduce three measures: sense preference and sense probability for sense disambiguation, and word probability for word selection. The first is based on knowledge from a bilingual dictionary, and the other two are calculated using statistics from a target language corpus. We evaluated our method, and the results showed that it selects more appropriate target words using knowledge extracted from easily obtainable resources.
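The schematic sketch below shows only how the two levels combine; the precise definitions of sense preference, sense probability and word probability are given in the paper, so here they are passed in as plain dictionaries with invented values.

```python
def select_translation(senses, sense_pref, sense_prob, word_prob):
    """senses: dict mapping each sense to candidate target words for one source word."""
    # Level 1: sense disambiguation (dictionary knowledge combined with corpus statistics).
    best_sense = max(senses, key=lambda s: sense_pref[s] * sense_prob[s])
    # Level 2: word selection within the chosen sense (corpus statistics).
    return max(senses[best_sense], key=lambda w: word_prob[w])

senses = {"financial": ["bank", "banking house"], "river": ["riverbank", "shore"]}
sense_pref = {"financial": 0.8, "river": 0.2}      # from the bilingual dictionary
sense_prob = {"financial": 0.6, "river": 0.4}      # from the target-language corpus
word_prob = {"bank": 0.7, "banking house": 0.1, "riverbank": 0.15, "shore": 0.05}
print(select_translation(senses, sense_pref, sense_prob, word_prob))   # -> bank
```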
A novel domain-independent approach to technology trend monitoring is presented in this paper. It is based on an ontology of a technology trend, the hype cycle methodology, and semantic indicators that provide evidence of the maturity level of a technology. This approach forms the basis for the implementation of text-mining software tools. The algorithms behind these tools allow users to avoid overly general or garbage results, which would make it impossible to identify promising technologies at early stages (early detection, weak signals). In addition, these algorithms provide high-quality results in the extraction of complex multiword terms that correspond to the technological concepts forming a trend. The methodology and software developed in this study are applicable to various industries with minor adjustments and require no deep expert knowledge from the user.
Extracting knowledge from huge data in a reasonable time is still a challenging problem. Most real data, structured or not, can be mapped to an equivalent binary context, with or without a scaling method, as when extracting associations between words in a text or in machine learning systems. In this paper, our objective is to find a minimal coverage of a relation with formal concepts; the problem is known to be NP-complete [1]. We exploit a particular difunctional relation embedded in any binary relation, its fringe, to find an approximate conceptual coverage of that relation. We use the formal properties of fringes to derive better algorithms for computing a minimal rectangular coverage of a binary relation. Here, a formal context is considered as a binary relation. By exploiting some background on relational algebra, we merge results of Belohlavek and Vychodil [2], obtained using formal concept analysis, with earlier results obtained by Kcherif et al. [3] using relational algebra. We finally propose decomposition algorithms based on the relational formalization and fringe relations.
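Assuming the usual relational-algebra definition of the fringe, fringe(R) = R ∩ ¬(R ∘ ¬R⌣ ∘ R), the sketch below computes it with Boolean matrices: a pair (x, y) stays in the fringe iff the rectangle it generates, {u : R(u, y)} × {v : R(x, v)}, lies entirely inside R. Such pairs pin down the maximal rectangles used for a conceptual coverage. The example relation is illustrative.

```python
import numpy as np

def fringe(R):
    """Fringe of a binary relation given as a Boolean matrix R of shape (rows, cols)."""
    Ri = R.astype(int)
    blocked = Ri @ (~R).T.astype(int) @ Ri        # composition R ; not(R)-converse ; R
    return R & (blocked == 0)                     # keep pairs not blocked by any counterexample

R = np.array([[1, 1, 0],
              [1, 1, 1],
              [0, 1, 1]], dtype=bool)
print(fringe(R).astype(int))   # only the pairs whose generated rectangle is inside R remain
```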
Extracting semantic information from multiple natural language sources and combining that information into a single unified resource is an important and fundamental goal for natural language processing. Large scale resources of this kind can be useful for a wide variety of tasks including question answering, word sense disambiguation and knowledge discovery. A single resource representing the information in multiple documents can provide significantly more semantic information than is available from the documents considered independently.
The ASKNet system utilises existing NLP tools and resources, together with spreading activation based techniques, to automatically extract semantic information from a large number of English texts, and combines that information into a large scale semantic network. The initial emphasis of the ASKNet system is on wide-coverage, robustness and speed of construction. In this paper we show how a network consisting of over 1.5 million nodes and 3.5 million edges, more than twice as large as any network currently available, can be created in less than 3 days. Evaluation of large-scale semantic networks is a difficult problem. In order to evaluate ASKNet we have developed a novel evaluation metric based on the notion of a network "core" and employed human evaluators to determine the precision of various components of that core. We have applied this evaluation to networks created from randomly chosen articles used by DUC (Document Understanding Conference). The results are highly promising: almost 80% precision in the semantic core of the networks.
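The toy sketch below illustrates spreading activation over a weighted semantic network, the core mechanism ASKNet-style systems use when deciding which nodes from different documents refer to the same entity. The node names, weights, decay factor and update rule are illustrative assumptions rather than ASKNet's actual parameters.

```python
def spread_activation(graph, sources, decay=0.5, steps=3):
    """graph: dict node -> list of (neighbour, weight). Returns activation per node."""
    activation = {node: 0.0 for node in graph}
    for s in sources:
        activation[s] = 1.0                        # fire the source nodes
    for _ in range(steps):
        incoming = {node: 0.0 for node in graph}
        for node, links in graph.items():
            for neighbour, weight in links:
                incoming[neighbour] += activation[node] * weight * decay
        for node in graph:
            activation[node] += incoming[node]     # accumulate spread activation
    return activation

net = {"Paris": [("France", 0.9), ("Eiffel Tower", 0.8)],
       "France": [("Paris", 0.9)],
       "Eiffel Tower": [("Paris", 0.8)]}
print(spread_activation(net, sources=["Paris"]))   # strongly related nodes light up
```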
The introduction of electronic medical records (EMRs) has enabled access to unprecedented volumes of clinical data, in both structured and unstructured formats. A significant amount of this clinical data is expressed in the narrative portion of the EMRs, requiring natural language processing techniques to unlock the medical knowledge referred to by physicians. This knowledge, derived from the practice of medical care, complements the medical knowledge already encoded in various structured biomedical ontologies. Moreover, the clinical knowledge derived from EMRs also exhibits relational information between medical concepts, derived from the cohesion property of clinical text, an attractive attribute that is currently missing from the vast biomedical knowledge bases. In this paper, we describe an automatic method for generating a graph of clinically related medical concepts that takes into account the belief values associated with those concepts. The belief value is an expression of the clinician's assertion that the concept is qualified as present, absent, suggested, hypothetical, ongoing, etc. Because the method takes into account the hedging used by physicians when authoring EMRs, the resulting graph encodes qualified medical knowledge: each medical concept has an associated assertion (or belief value), and such qualified medical concepts are connected by relations of different strengths, derived from the clinical contexts in which the concepts are used. We discuss the construction of this qualified medical knowledge graph (QMKG) and treat it as a Big Data problem, using MapReduce to derive the weighted edges of the graph. To assess the value of the QMKG, we demonstrate its use for retrieving patient cohorts by enabling query expansion that produces greatly enhanced results compared with state-of-the-art methods.
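As a minimal, in-memory stand-in for the MapReduce step, the sketch below has each "map" emit qualified concept pairs (a concept identifier plus its assertion value) co-occurring in the same clinical note, while the "reduce" sums the counts into weighted edges. The record structure, field names and concept identifiers are illustrative, not the paper's schema.

```python
from collections import Counter
from itertools import combinations

def map_note(note):
    """Emit ((concept_a, concept_b), 1) for every pair of qualified concepts in one note."""
    concepts = sorted(set((c["cui"], c["assertion"]) for c in note["concepts"]))
    for a, b in combinations(concepts, 2):
        yield (a, b), 1

def reduce_edges(notes):
    """Sum pair counts across notes into weighted edges of the knowledge graph."""
    edges = Counter()
    for note in notes:
        for key, count in map_note(note):
            edges[key] += count
    return edges

notes = [{"concepts": [{"cui": "concept:diabetes", "assertion": "present"},
                       {"cui": "concept:hypertension", "assertion": "present"}]},
         {"concepts": [{"cui": "concept:diabetes", "assertion": "present"},
                       {"cui": "concept:hypertension", "assertion": "absent"}]}]
print(reduce_edges(notes).most_common(2))   # edge weights between qualified concepts
```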
Deep learning uses a hierarchical network structure to represent multiple features of the input data. The adaptive structural learning method for Deep Belief Networks (DBN) can reach high classification capability while searching for the optimal network structure during training. The method finds the optimal number of hidden neurons for the given input data in a Restricted Boltzmann Machine (RBM) by a neuron generation-annihilation algorithm, and generates new hidden layers in the DBN by an extension of that algorithm. In this paper, the proposed adaptive structural learning of DBN (Adaptive DBN) was applied to comprehensive medical examination data for cancer prediction. The developed prediction system showed higher classification accuracy on test data (99.5% for lung cancer and 94.3% for stomach cancer) than several other learning methods, such as a traditional RBM, DBN, non-linear Support Vector Machine (SVM), and Convolutional Neural Network (CNN). Moreover, explicit knowledge that explains the inference process of the trained DBN is required in deep learning. The binary patterns of neurons activated for a given input in the RBMs and the hierarchical structure of the DBN can represent the relation between input and output signals. These binary patterns were classified by C4.5 for knowledge extraction. Although the extracted knowledge showed slightly lower classification accuracy than the trained DBN, it reduced inference time to about 1/40. We report that the IF-THEN rules extracted from the trained DBN for the medical examination data show some interesting features related to the initial conditions of cancer.
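A hedged sketch of the knowledge-extraction step: binarized hidden activations of a trained network are fed to a decision-tree learner to obtain IF-THEN rules. The paper uses C4.5; scikit-learn's entropy-based tree is used here only as a stand-in, and the activation patterns and labels below are random placeholders for real RBM activations and DBN predictions.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
hidden_patterns = rng.integers(0, 2, size=(200, 8))        # binary hidden-unit states (placeholder)
labels = hidden_patterns[:, 0] & hidden_patterns[:, 3]     # placeholder for DBN predictions

# Entropy-based tree as a C4.5-like stand-in; the printed tree reads as IF-THEN rules.
tree = DecisionTreeClassifier(criterion="entropy", max_depth=3).fit(hidden_patterns, labels)
print(export_text(tree, feature_names=[f"h{i}" for i in range(8)]))
```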
A novel approach to trend monitoring and the identification of promising high-tech solutions is presented in this chapter. It is based on an ontology of a technology/market trend, the Hype Cycle methodology, and semantic indicators that provide evidence of the maturity level of a technology as well as of emerging user needs (customer pains) in high-tech industries. This approach forms the basis for the text mining software tools implemented in the Semantic Hub platform. The algorithms behind these tools allow users to avoid overly general or garbage results, which would make it impossible to identify promising technologies at early stages (early detection, weak signals). In addition, these algorithms provide high-quality results in the extraction of complex multiword terms that correspond to the technological concepts and user pains forming a trend. The methodology and software developed in this study are applicable to various industries with minor adjustments.
In this paper, a method for fuzzy modeling is presented. The framework of the method is the concept of linguistic integrity, which offers several advantages. The most important is transparency, which can be exploited in two directions. The first is data mining, where the method can provide linguistic relations (IF-THEN rules) among the variables. The second is improving the completeness of a model by giving the user an easy interface through which expert knowledge can be included. The algorithm starts from numerical input-output data and generates a rule base with a limited number of membership functions on each input domain. The rules are created in the setting of fuzzy systems. The algorithm used for rule extraction is named AFRELI.
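The sketch below illustrates only the generic idea of extracting linguistic IF-THEN rules from input-output data with a small, fixed number of triangular membership functions per domain; it is not the AFRELI algorithm itself, and the data, labels and membership layout are invented.

```python
import numpy as np

def triangular_mf(x, centers):
    """Membership degree of x in each triangular set centred on `centers`."""
    width = centers[1] - centers[0]
    return np.clip(1 - np.abs(x - centers) / width, 0, 1)

def extract_rules(X, y, labels=("low", "medium", "high")):
    """Generate one linguistic rule per data point and keep the distinct ones."""
    x_centers = np.linspace(X.min(), X.max(), len(labels))
    y_centers = np.linspace(y.min(), y.max(), len(labels))
    rules = set()
    for xi, yi in zip(X, y):
        antecedent = labels[int(np.argmax(triangular_mf(xi, x_centers)))]
        consequent = labels[int(np.argmax(triangular_mf(yi, y_centers)))]
        rules.add(f"IF x is {antecedent} THEN y is {consequent}")
    return sorted(rules)

X = np.linspace(0, 10, 50)
y = 0.5 * X + 1                     # toy input-output data
print("\n".join(extract_rules(X, y)))
```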
This paper discusses the design of classification systems when two kinds of information are available: numerical data and linguistic knowledge. Numerical data are given as a set of labeled samples (i.e., training patterns), which are usually used for designing classification systems in various pattern classification techniques. Linguistic knowledge is a set of fuzzy if-then rules, which is not usually utilized in non-fuzzy pattern classification techniques. In this paper, it is implicitly assumed that neither kind of information alone is enough for designing classification systems with high classification performance; our task is therefore to design a classification system by simultaneously utilizing both kinds of information. We illustrate two approaches to the design of classification systems from numerical data and linguistic knowledge. One is a fuzzy-rule-based approach, where numerical data are used for generating fuzzy if-then rules. The other is a neural-network-based approach, where linguistic knowledge as well as numerical data are used for training neural networks. First we discuss the extraction of fuzzy if-then rules directly from numerical data. We also describe fuzzy rule extraction from neural networks that have already been trained on numerical data. Next we discuss the learning of neural networks from numerical data and linguistic knowledge; in this learning, fuzzy if-then rules and training patterns are handled in a common framework. Finally we examine the performance of these approaches to the design of classification systems from numerical data and linguistic knowledge through computer simulations.
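The sketch below gestures at the "common framework" idea: each fuzzy if-then rule is turned into one extra training pattern by replacing every linguistic term with a representative value of its membership function (here, simply its centre), so rules and crisp samples can be fed to the same learner. The paper handles fuzzy inputs more faithfully; the term centres, variable names and data here are invented for illustration.

```python
import numpy as np

term_centre = {"small": 0.0, "medium": 0.5, "large": 1.0}   # representative values (assumed)

def rule_to_pattern(rule):
    """rule: ({"x1": "small", "x2": "large"}, class_label) -> (input vector, label)."""
    antecedent, label = rule
    return np.array([term_centre[antecedent[v]] for v in ("x1", "x2")]), label

numeric_X = [np.array([0.1, 0.9]), np.array([0.8, 0.2])]     # labeled training patterns
numeric_y = [0, 1]
rules = [({"x1": "small", "x2": "large"}, 0), ({"x1": "large", "x2": "small"}, 1)]

X = numeric_X + [rule_to_pattern(r)[0] for r in rules]       # combined training inputs
y = numeric_y + [rule_to_pattern(r)[1] for r in rules]       # combined training labels
print(np.vstack(X), y)    # ready for any classifier, e.g. a neural network
```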
Biological pathways are central to understanding complex diseases such as cancer. The majority of this knowledge is scattered in the vast and rapidly growing research literature. To automate knowledge extraction, machine learning approaches typically require annotated examples, which are expensive and time-consuming to acquire. Recently, there has been increasing interest in leveraging databases for distant supervision in knowledge extraction, but existing applications focus almost exclusively on newswire domains. In this paper, we present the first attempt to formulate the distant supervision problem for pathway extraction and apply a state-of-the-art method to extracting pathway interactions from PubMed abstracts. Experiments show that distant supervision can effectively compensate for the lack of annotation, attaining an accuracy approaching supervised results. From 22 million PubMed abstracts, we extracted 1.5 million pathway interactions at a precision of 25%. More than 10% of interactions are mentioned in the context of one or more cancer types, analysis of which yields interesting insights.
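A minimal sketch of the distant supervision setup described above: any sentence mentioning both members of a known interacting pair from a pathway database is labeled as a positive training example, while sentences containing unrelated pairs become negatives. The gene names, interaction pairs and sentences below are illustrative only, and real systems would use entity recognition rather than substring matching.

```python
known_interactions = {("EGFR", "KRAS"), ("TP53", "MDM2")}   # stand-in for a pathway database
genes = {"EGFR", "KRAS", "TP53", "MDM2", "BRCA1"}

def label_sentences(sentences):
    """Return (sentence, gene pair, label) triples with distant-supervision labels."""
    examples = []
    for sent in sentences:
        mentioned = sorted(g for g in genes if g in sent)
        for i in range(len(mentioned)):
            for j in range(i + 1, len(mentioned)):
                pair = (mentioned[i], mentioned[j])
                label = pair in known_interactions or pair[::-1] in known_interactions
                examples.append((sent, pair, int(label)))
    return examples

sentences = ["EGFR signalling activates KRAS in lung tumours.",
             "BRCA1 expression was measured alongside TP53."]
for example in label_sentences(sentences):
    print(example)   # positive for the known pair, negative for the unrelated pair
```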