The rdfs:seeAlso predicate plays an important role in linking web resources in the Semantic Web. According to the W3C definition, it indicates that the object resource provides additional information about the subject resource. Since providing additional information can take various forms, the definition is generic. In other words, an rdfs:seeAlso link can convey different meanings to users and can represent different kinds of patterns and relationships between web resources. These patterns are unknown and have to be specified to help organizations and individuals interlink and publish their datasets on the Web of Data using the rdfs:seeAlso link. In this paper, we investigate the traditional usages of rdfs:seeAlso and then present a methodology to specify the patterns of rdfs:seeAlso usage in the Semantic Web. The results of our investigation show that the discovered patterns constitute a significant portion of rdfs:seeAlso usages in the Web of Data.
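For illustration, a minimal sketch of such a link with rdflib; the two resource URIs are placeholders, not taken from the paper, and the triple asserts nothing beyond "the object provides additional information about the subject":

    from rdflib import Graph, URIRef
    from rdflib.namespace import RDFS

    g = Graph()
    # Hypothetical resources: the object is claimed to provide
    # additional information about the subject, nothing more specific.
    subject = URIRef("http://example.org/people/alice")
    related = URIRef("http://example.org/docs/alice-homepage")
    g.add((subject, RDFS.seeAlso, related))

    print(g.serialize(format="turtle"))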
Thanks to Open Data initiatives, the amount of data available on the Web is rapidly increasing. Unfortunately, most of these initiatives only publish raw tabular data, which makes its analysis and reuse very difficult. Linked Data principles allow for a more sophisticated approach by making explicit both the structure and semantics of the data. However, from the user experience viewpoint, published datasets continue to be monolithic files which are completely opaque or can only be explored through complex semantic queries. Our objective is to help users grasp what kinds of entities are in the dataset, how they are interrelated, which are their main properties and values, etc. Rhizomer is a data publishing tool whose interface provides a set of components borrowed from Information Architecture (IA) that facilitate gaining insight into the dataset at hand. Rhizomer automatically generates navigation menus and facets based on the kinds of things in the dataset and how they are described through metadata properties and values. This tool is currently being evaluated with end users who discover a whole new perspective on the Web of Data.
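A hedged sketch of the kind of query such a tool could run to discover "the kinds of things in the dataset"; the endpoint URL is a placeholder and Rhizomer's actual implementation is not shown in the abstract:

    from SPARQLWrapper import SPARQLWrapper, JSON

    # Placeholder endpoint; substitute the dataset's actual SPARQL endpoint.
    sparql = SPARQLWrapper("http://example.org/sparql")
    sparql.setQuery("""
        SELECT ?class (COUNT(?s) AS ?instances)
        WHERE { ?s a ?class }
        GROUP BY ?class
        ORDER BY DESC(?instances)
    """)
    sparql.setReturnFormat(JSON)
    results = sparql.query().convert()
    for row in results["results"]["bindings"]:
        print(row["class"]["value"], row["instances"]["value"])

Class frequencies such as these are the natural raw material for navigation menus, and the per-class properties and values for facets.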
Named Entity Extraction (NEE) is the process of identifying entities in texts and, very commonly, linking them to related (Web) resources. This task is useful in several applications, e.g. question answering, annotating documents, post-processing of search results, etc. However, existing NEE tools lack an open or easy configuration, although this is very important for building domain-specific applications. For example, supporting a new category of entities, or specifying how to link the detected entities with online resources, is either impossible or very laborious. In this paper, we show how we can exploit semantic information (Linked Data) in real time for handily configuring a NEE system, and we propose a generic model for configuring such services. To explicitly define the semantics of the proposed model, we introduce an RDF/S vocabulary, called “Open NEE Configuration Model”, which allows a NEE service to describe (and publish as Linked Data) its entity mining capabilities, but also to be dynamically configured. To allow relating the output of a NEE process to an applied configuration, we propose an extension of the Open Annotation Data Model which also enables an application to run advanced queries over the annotated data. As a proof of concept, we present X-Link, a fully-configurable NEE framework that realizes this approach. Contrary to existing tools, X-Link allows the user to easily define the categories of entities that are interesting for the application at hand by exploiting one or more semantic Knowledge Bases. The user is also able to update a category and to specify how to semantically link and enrich the identified entities. This enhanced configurability allows X-Link to be easily configured for different contexts for building domain-specific applications. To test the approach, we conducted a task-based evaluation with users that demonstrates its usability, and a case study that demonstrates its feasibility.
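The abstract does not give the concrete syntax of the Open NEE Configuration Model, but the underlying idea of defining an entity category by exploiting a semantic Knowledge Base can be sketched roughly as follows; the category ("Fish"), the endpoint, and the query are illustrative assumptions, not the paper's actual configuration:

    from SPARQLWrapper import SPARQLWrapper, JSON

    # Illustrative only: define a category "Fish" as the set of entity
    # labels obtained from a semantic KB (here, DBpedia).
    sparql = SPARQLWrapper("https://dbpedia.org/sparql")
    sparql.setQuery("""
        PREFIX dbo: <http://dbpedia.org/ontology/>
        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        SELECT DISTINCT ?label WHERE {
            ?fish a dbo:Fish ; rdfs:label ?label .
            FILTER (lang(?label) = "en")
        } LIMIT 100
    """)
    sparql.setReturnFormat(JSON)
    entity_names = [b["label"]["value"]
                    for b in sparql.query().convert()["results"]["bindings"]]
    # entity_names can now seed a gazetteer for detecting this category in text.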
The Web of Data (WoD) is an Internet-based network of data resources and their relations. It has recently taken flight and combines over a hundred interlinked data sources with more than 15 billion edges. A consequence of this recent success is that a paradigm shift has taken place: up to now, the Web of Data could be studied, searched and maintained like a classical database; nowadays it has turned into a Complex System and needs to be studied as such. In this paper, we introduce the Web of Data as a challenging object of study and provide initial results on two network scales: the pure data layer, and the global connections between groups of data items. In this analysis, we show that the "official" abstract representation of the WoD does not fit the real distribution we derive from the lower scale. As interesting as these results are, bigger challenges for analysis await in the form of the highly dynamic character of the WoD, and the typed and implicit character of the edges, which is, to the best of our knowledge, hitherto unstudied.
The vision of creating a Linked Data Web brings with it the challenge of allowing queries across highly heterogeneous and distributed datasets. In order to query Linked Data on the Web today, end users need to be aware of which datasets potentially contain the data and also which data model describes these datasets. The process of allowing users to expressively query relationships in RDF while abstracting them from the underlying data model represents a fundamental problem for Web-scale Linked Data consumption. This article introduces a distributional structured semantic space which enables data-model-independent natural language queries over RDF data. The approach centers on the use of a distributional semantic model to provide the level of semantic interpretation demanded by the data-model-independent approach. The article analyzes the geometric aspects of the proposed space, describing it as a distributional structured vector space built upon the Generalized Vector Space Model (GVSM). The final semantic space proved to be flexible and precise under real-world query conditions, achieving a mean reciprocal rank of 0.516, an average precision of 0.482 and an average recall of 0.491.
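For reference, the mean reciprocal rank reported above is the standard measure computed over a set of queries Q, where rank_i is the position of the first correct answer returned for the i-th query:

    MRR = \frac{1}{|Q|} \sum_{i=1}^{|Q|} \frac{1}{\mathrm{rank}_i}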
Current multimedia applications in Web 2.0 have generated large repositories of multimedia resources and annotations, so there is an urgent need to interlink annotations of these resources across different repositories to achieve better indexing and searching. To solve this problem, many researchers have been trying to apply Semantic Web technologies to media fragments and annotations. Linked data has brought forward a promising way to expose, index and search media fragments and annotations which used to be isolated in different applications. This paper discusses in depth three key research problems in applying linked data principles to multimedia annotations: choosing URIs for media fragments, dereferencing media fragments, and ontology alignment. An architecture is designed based on possible solutions to these research problems. The key idea of the architecture is that it should act as an extra layer built on top of existing applications when publishing linked data. A demo is built as an implementation of the architecture to show that media fragments can be published and linked to various datasets in the linked data cloud. In the future, algorithms should be designed to make full use of the interlinked media fragments and annotations for indexing and searching.
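For the first problem, the W3C Media Fragments URI 1.0 recommendation encodes temporal (and spatial) fragments directly in the URI, so a fragment becomes an addressable, annotatable resource. A minimal sketch with rdflib; the video URL and the annotation property and body are placeholders:

    from rdflib import Graph, Literal, URIRef
    from rdflib.namespace import RDFS

    g = Graph()
    # W3C Media Fragments syntax: #t=10,20 selects seconds 10 to 20;
    # #xywh=... would select a spatial region instead.
    fragment = URIRef("http://example.org/video.mp4#t=10,20")
    g.add((fragment, RDFS.comment, Literal("A goal is scored in this clip.")))
    print(g.serialize(format="turtle"))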
Today's search interfaces typically offer keyword searches and facets for the retrieval of cultural heritage multimedia objects. Facets, however, are usually based on a static set of metadata fields, often called an indexing profile. Graph-based repositories built on predicates about resources allow for more precise semantics. They offer stronger support for retrieval, and they can be adapted to almost any metadata format. Technically, those predicates may be serialized as RDF triples, but handling a huge number of objects with numerous predicates puts an unpredictable load on the query engine. In this paper, we present an approach to analysing transition paths in the RDF triples at ingest time and using the results to create facets in the search index.
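The abstract does not detail the path analysis, but a rough sketch of counting two-step predicate transition paths at ingest time, from which facet candidates could then be derived, might look like this with rdflib (the input file is a placeholder):

    from collections import Counter
    from rdflib import Graph

    g = Graph()
    g.parse("records.ttl")  # placeholder ingest file

    # Count predicate "transition paths": predicate p1 leading to a node
    # that is itself the subject of predicate p2.
    paths = Counter()
    for s, p1, o in g:
        for _, p2, _ in g.triples((o, None, None)):
            paths[(p1, p2)] += 1

    for (p1, p2), n in paths.most_common(10):
        print(n, p1, "->", p2)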
Statistical data is one of the most important sources of information, relevant for large numbers of stakeholders in the governmental, scientific and business domains alike. In this article, we give an overview of how statistical data can be managed on the Web. With OLAP2DataCube and CSV2DataCube, we present two complementary approaches for extracting and publishing statistical data. We also discuss the linking, repair and visualization of statistical data. As a comprehensive use case, we report on the extraction and publication on the Web of statistical data describing 10 years of life in Brazil.
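Statistical data on the Web is typically published with the W3C RDF Data Cube vocabulary (the "DataCube" in the tool names above). A minimal hedged sketch of a single observation; the dataset, dimension, and measure URIs are invented for illustration:

    from rdflib import Graph, Literal, Namespace, URIRef
    from rdflib.namespace import RDF

    QB = Namespace("http://purl.org/linked-data/cube#")
    EX = Namespace("http://example.org/stats/")

    g = Graph()
    obs = URIRef("http://example.org/stats/obs1")
    g.add((obs, RDF.type, QB.Observation))
    g.add((obs, QB.dataSet, EX.lifeExpectancy))          # illustrative dataset
    g.add((obs, EX.refYear, Literal(2010)))              # illustrative dimension
    g.add((obs, EX.lifeExpectancyValue, Literal(73.1)))  # illustrative measure
    print(g.serialize(format="turtle"))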
Nowadays, there is a huge number of digital television platforms and channels, so it is not easy for viewers to decide what they want to watch. Some television providers offer information about the programs they broadcast, but this information is usually scarce, and there is no possibility to perform advanced operations such as recommendations. For this reason, viewers could benefit from a system that integrates all the available information about contents and applies semantic methodologies in order to provide a better television watching experience.
The main objective of this research is the design of a television content management system, called OntoTV, which retrieves television content information from various existing sources and represents all these data in the best way possible by using knowledge engineering and ontologies. These semantic computing techniques make it possible to offer viewers more useful operations on the stored data than traditional systems do, and with a high degree of personalization. Additionally, OntoTV accomplishes all of this regardless of the TV platform installed and the client device used. Viewers' satisfaction when using this system has also been studied to prove its functionality.
The amount of data within the Linking Open Data (LOD) Cloud is steadily increasing and constitutes a rich source of information. Since Context-aware Services (CAS) are based on the correlation of heterogeneous data sources for deriving the contextual situation of a target, it makes sense to leverage the enormous amount of data already present in the LOD Cloud to enhance the quality of these services. In this work, the applicability of the LOD Cloud as a context provider for enriching CAS is investigated. For this purpose, a deep analysis of the discoverability and availability of datasets is performed. Furthermore, in order to ease the process of finding a dataset that matches the information needs of a CAS developer, techniques for retrieving the contents of LOD datasets are discussed and different approaches to condense a dataset to its most important concepts are shown. Finally, a Context Data Lookup Service is introduced that enables context data discovery within the LOD Cloud, and its applicability is highlighted with an example.
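A rough sketch of the kind of availability probe such an analysis implies; the trivial ASK query and the choice of endpoint are illustrative assumptions, and the paper's actual methodology is more involved:

    import urllib.error
    import urllib.parse
    import urllib.request

    def endpoint_available(endpoint, timeout=10):
        """Probe a SPARQL endpoint with a trivial ASK query."""
        query = urllib.parse.urlencode({"query": "ASK { ?s ?p ?o }"})
        req = urllib.request.Request(
            endpoint + "?" + query,
            headers={"Accept": "application/sparql-results+json"})
        try:
            with urllib.request.urlopen(req, timeout=timeout) as resp:
                return resp.status == 200
        except (urllib.error.URLError, OSError):
            return False

    print(endpoint_available("https://dbpedia.org/sparql"))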
The integration of the classical Web (of documents) with the emerging Web of Data is a challenging vision. In this paper we focus on an integration approach during searching which aims at enriching the responses of non-semantic search systems with semantic information, i.e. Linked Open Data (LOD), and exploiting the outcome to offer advanced exploratory search services which provide an overview of the search space and allow users to explore the related LOD. We use named entities identified in the search results to automatically connect search hits with LOD, and we consider a scenario where this entity-based integration is performed at query time, with no human effort and no a-priori indexing, which is beneficial in terms of configurability and freshness. However, the number of identified entities can be high, and the same is true for the semantic information about these entities that can be fetched from the available LOD. To this end, in this paper we propose a Link Analysis-based method for ranking (and thus selecting to show) the most important semantic information related to the search results. We report promising results from a survey in the marine domain, as well as comparative results that illustrate the effectiveness of the proposed (PageRank-based) ranking scheme. Finally, we report experimental results regarding efficiency, showing that the proposed functionality can be offered even at query time.
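As a rough sketch of the ranking idea, PageRank can be run over a graph whose nodes are the entities and LOD resources connected to the search results; the toy graph below (loosely marine-themed) stands in for the paper's actual graph construction, which is not reproduced here:

    import networkx as nx

    # Toy link graph: nodes are entities/LOD resources found in the
    # search results, edges are semantic links fetched from LOD.
    g = nx.DiGraph()
    g.add_edges_from([
        ("Thunnus", "Scombridae"),
        ("Thunnus albacares", "Thunnus"),
        ("Thunnus obesus", "Thunnus"),
        ("Scombridae", "Actinopterygii"),
    ])

    scores = nx.pagerank(g, alpha=0.85)
    for entity, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(f"{score:.3f}  {entity}")

Highly ranked nodes are then the ones worth showing to the user first.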
The development of modern health care and clinical practice increases the need for nutritional and medical data extraction and integration across heterogeneous data sources. It can be useful for researchers and patients if there is a way to extract relevant information and organize it as easily shared and machine-processable linked data. In this paper, we describe an automatic approach that extracts and publishes nutritional linked data, including nutritional concepts and relationships extracted from nutritional data sources. Moreover, we link the nutritional data with Linked Open Data. In particular, a CRF-based approach is used to mine food, ingredient, and disease entities and their relationships from nutritional text. Then, an extended nutritional ontology is used to organize the extracted data. Finally, we assign semantic links between food, ingredient, and disease entities and other equivalent entities in DBpedia, Diseasome and LinkedCT.
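A minimal sketch of the final linking step, stating equivalence with owl:sameAs; the local entity URI is invented, and the method for deciding which DBpedia resource is equivalent is the paper's, not shown here:

    from rdflib import Graph, URIRef
    from rdflib.namespace import OWL

    g = Graph()
    # Hypothetical local entity produced by the extraction step,
    # linked to its equivalent DBpedia resource.
    local = URIRef("http://example.org/nutrition/entity/spinach")
    dbpedia = URIRef("http://dbpedia.org/resource/Spinach")
    g.add((local, OWL.sameAs, dbpedia))
    print(g.serialize(format="turtle"))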
For research institutes, data libraries, and data archives, validating RDF data according to predefined constraints is a much sought-after feature, particularly as this is taken for granted in the XML world. Based on our work in two international working groups on RDF validation and on jointly identified requirements for formulating constraints and validating RDF data, we have published 81 types of constraints that are required by various stakeholders for data applications.
In this paper, we evaluate the usability of the identified constraint types for assessing RDF data quality by (1) collecting and classifying 115 constraints on vocabularies commonly used in the social, behavioral, and economic sciences, either from the vocabularies themselves or from domain experts, and (2) validating 15,694 data sets (4.26 billion triples) of research data against these constraints. We classify each constraint according to (1) the severity of occurring violations and (2) which types of constraint languages are able to express its constraint type. Based on this large-scale evaluation, we formulate several findings to direct the further development of constraint languages.
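The abstract does not commit to a particular constraint language, but as an illustration, one common constraint type (a minimum-cardinality constraint) expressed in SHACL and checked with the pyshacl library might look like this; the vocabulary, shape, and data are invented for illustration:

    from pyshacl import validate
    from rdflib import Graph

    shapes = Graph().parse(data="""
    @prefix sh: <http://www.w3.org/ns/shacl#> .
    @prefix ex: <http://example.org/> .
    @prefix dct: <http://purl.org/dc/terms/> .

    ex:DatasetShape a sh:NodeShape ;
        sh:targetClass ex:Dataset ;
        sh:property [ sh:path dct:title ; sh:minCount 1 ] .
    """, format="turtle")

    data = Graph().parse(data="""
    @prefix ex: <http://example.org/> .
    ex:d1 a ex:Dataset .   # violates the constraint: no dct:title
    """, format="turtle")

    conforms, _, report_text = validate(data, shacl_graph=shapes)
    print(conforms)      # False
    print(report_text)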
Biosimulation models are used to understand the multiple or different causative factors that cause impairment in human organs. The Finite Element Method (FEM) provides a mathematical framework to simulate dynamic biological systems, with applications ranging from human ear and cardiovascular to neurovascular research. Finite Element (FE) biosimulation experiments produce huge amounts of numerical data, and visualising and analysing this data is a strenuous task. In this paper, we present a Linked Data visualiser, called the SIFEM Visualiser, to help domain experts (experts in the field of ear mechanics) and clinical practitioners (otorhinolaryngologists) visualise, analyse and compare biosimulation results from heterogeneous, complex, and high-volume numerical data. The SIFEM Visualiser builds on conceptualising different aspects of biosimulations. In addition to the visualiser, we also propose how biosimulation numerical data can be conceptualised such that it sustains the visualisation of large numerical data. The SIFEM Visualiser aims to help domain scientists and clinical practitioners explore and analyse Finite Element (FE) numerical data and simulation results obtained from different aspects of the inner ear (cochlear) model, such as biological, geometrical, mathematical, and physical models. We validate the SIFEM Visualiser through both qualitative and quantitative evaluation.
This paper proposes a semantic computing application for a news delivery service that collects and recommends news articles a user is interested in, based on semantic relations between terms in the articles. We define interesting articles as those that evoke users' curiosity and serendipity. The semantic relations between terms are represented by graphs of Linked Data. We first create News Articles Linked Data, i.e. the news articles to be recommended to users, and User's Preferences Linked Data, created from the users' preferred articles. Then, common subgraphs between the two graphs are searched for to recommend news articles. In our experiment, our approach achieved a curiosity score of 3.30 (min: 0, max: 4) and a serendipity score of 2.93, whereas a baseline method achieved a curiosity score of 3.03 and a serendipity score of 2.79. Thus, we confirmed that our approach is more effective than the baseline method. In the future, we will deploy our semantic technology for practical use in automatically delivering information selected from a vast amount of news sources.
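A rough sketch of the common-subgraph idea, reduced to its simplest form as a triple-level intersection of two rdflib graphs; the input files are placeholders, and the paper's actual subgraph search may well be more elaborate:

    from rdflib import Graph

    articles = Graph().parse("news_articles.ttl")         # placeholder file
    preferences = Graph().parse("user_preferences.ttl")   # placeholder file

    # Simplest possible "common subgraph": the triples shared by both graphs.
    common = set(articles) & set(preferences)
    print(len(common), "shared triples")
    for s, p, o in common:
        print(s, p, o)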
The growth of the Web also represents one of its major challenges, as users face the problem of selecting the pages that are most relevant to their task from a vast amount of information. Search engines and microdata are examples of means towards better targeted search, but there is no universal, perfect solution for all information needs. In addition, the last years have witnessed the emergence of the Web of Linked Data, fostered by the increasing adoption of openness as a paradigm for sharing information for the benefit of the commons. Microdata, linked data and other technologies are nothing other than different ways of using metadata to enhance information seeking, targeting and integration. Metadata is thus nowadays the fabric of the Web. Understanding the different forms and arrangements of metadata is consequently a required skill for researchers and practitioners who aim at understanding and getting value from the Web.
Identity management in the library community focuses on the creation, conversion, sharing and maintenance of authority data: the massive databases of standardized machine-readable records that describe persons, institutions, places, events, and works, and the relations between them. This chapter presents some results of a content analysis study that explores the state of implementation of the Resource Description and Access (RDA) standard in authority records for personal, corporate, and geographic names and uniform (preferred) titles. The study, which focuses on a snapshot of identity data from 2016, reveals that a large proportion of authority records are created according to, or revised to be brought into conformance with, RDA requirements, in terms of including one or more new data elements that increase the functionality of these records and enhance access to resources. Results of the quantitative analysis of a large sample of RDA-based name authority records and title authority records available through the United States Library of Congress Database identify those RDA-specific data elements, both fields and subfields, that are most widely applied in the authority data, as well as the ones that need more attention from record creators, in particular the Linked Data enabling data elements. Directions for future research are suggested.
This paper focuses on the problems of exposing digital libraries supporting OAI-PMH to Semantic Web services. The most desired capability for OAI-PMH metadata is to be queryable via SPARQL, which means the OAI-PMH formats must be converted to and served as RDF. Another important requirement is supporting the fourth rule of Linked Data, which concerns interlinking relevant resources. The interlinking can be done by mapping resources across different SPARQL endpoints, but a problem arises when the fields to be mapped are not a perfect match. The solution may be a similarity metric with an established threshold.
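As a minimal sketch of such a similarity match, using the standard library's difflib ratio as one common choice of metric; the field values and the 0.85 threshold are illustrative assumptions, not the paper's:

    from difflib import SequenceMatcher

    def similar(a, b, threshold=0.85):
        """Treat two metadata field values as a match above a fixed threshold."""
        return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

    # Illustrative field values from two endpoints that are not a perfect match.
    print(similar("Kafka, Franz", "Franz Kafka"))  # False: word order differs
    print(similar("Der Prozess", "Der Process"))   # True: minor spelling variant

As the first example shows, a plain character-level ratio is sensitive to word order, so in practice the metric and threshold would need to be tuned to the fields being mapped.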