The relationship between the absence of redundancy in relational databases and fourth normal form (4NF) is investigated. A relation scheme is defined to be redundant if there exists a legal relation defined over it which has at least two tuples that are identical on the attributes in a functional dependency (FD) or multivalued dependency (MVD) constraint. Depending on whether the dependencies in the given set of constraints or the dependencies in its closure are used, two different types of redundancy are defined. It is shown that the two types of redundancy are equivalent and that their absence in a relation scheme is equivalent to the 4NF condition.
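As a hedged illustration (the notation below is assumed for exposition and is not taken verbatim from the paper), the two notions can be sketched as follows:

```latex
% Redundancy: a scheme R with dependency set \Sigma is redundant if some legal
% relation r over R contains two distinct tuples that agree on all attributes
% of an FD X \to Y (or MVD X \twoheadrightarrow Y) in \Sigma:
\exists\, r \models \Sigma,\ \exists\, t_1, t_2 \in r,\ t_1 \neq t_2 :\ t_1[XY] = t_2[XY]

% 4NF: every nontrivial MVD X \twoheadrightarrow Y in the closure \Sigma^+
% has a superkey on its left-hand side:
X \twoheadrightarrow Y \in \Sigma^+ \text{ nontrivial} \;\Longrightarrow\; X \to R
```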
For a company with many databases in different data models, it is necessary to consolidate them into one interchangeable data model and to present data in one data model concurrently to users. The benefit is to let users stick to their own data model when accessing a database in another data model. This paper presents a semantic metadata approach that preserves database constraints for data materialization, supporting a user's view of the database on an ad hoc basis. The semantic metadata can store the captured semantics of a relational or an object-oriented database in classes and in stored procedures triggered by events. The stored constraints and data can be materialized into a target database upon user request, and the user may perform the data materialization repeatedly and interchangeably. The process can provide a relational as well as an object-oriented view to users simultaneously. This concurrent data materialization function can be applied to a data warehouse to consolidate heterogeneous databases into a fact table in a data model of the user's choice. Furthermore, a user can obtain either a relational view or an object-oriented view of the same dataset of an object-relational database interchangeably.
In this paper we present a multi-key index model that enables a record to be searched on more than one attribute value in distributed database systems. Indices provide fast and efficient access to data and have therefore become a major component of centralized database systems. Most centralized database systems use B+ trees or other index structures such as bit vectors, graph structures, and grid files. For distributed database systems, however, no comparable index model is found in the literature, so efficient access remains a major problem in distributed databases. Our proposed index model avoids the query-flooding problem of existing systems and thus optimizes network bandwidth.
For a company with many databases in different data models, it is necessary to consolidate them into one interchangeable data model and to present data in more than one data model concurrently, whether to different users or to individual users who need to access the data in more than one data model. The benefit is to let users stick to their own data model when accessing a database in another data model. This paper presents a semantic metadata approach that preserves database constraints for data materialization, supporting a user's view of the database on an ad hoc basis. The semantic metadata can store the captured semantics of a relational or an XML-enabled database in classes. The stored constraints and data can be materialized into a target database upon user request, and the user may perform the data materialization repeatedly and interchangeably. The process can provide a relational as well as an XML view to users simultaneously. This concurrent data materialization function can be applied to a data warehouse to consolidate heterogeneous databases into a fact table in a data model of the user's choice. Furthermore, a user can obtain either a relational view or an XML view of the same dataset of an XML-enabled database interchangeably.
RDF is widely used as an ontology language for representing metadata in the Semantic Web, knowledge management systems, and e-commerce. Since ontologies model the knowledge in a particular domain, they may change over time. Furthermore, ontologies are usually developed and controlled in a distributed and collaborative way. Thus, it is very important to be able to manage multiple versions of RDF data. Earlier studies on RDF versioning have focused on providing access to different versions (i.e., snapshots) and on computing the differences between two versions. However, the existing approaches suffer from a space overhead for large-scale data, since all snapshots must be kept redundantly in a repository. Moreover, it is very time consuming to compute the delta between two specific versions, which is a very common operation in RDF applications. In this paper, we propose a framework for RDF version management in relational databases. It stores the original version and the deltas between consecutive versions, thereby reducing the space requirement considerably. Another benefit is that our approach is well suited to change queries. On the flip side, in order to answer a query on a specific logical version, that version must be constructed on the fly by applying the deltas between the original version and the logical version, which can slow down query performance. To overcome this, we propose a compression technique for deltas, called Aggregated Delta, which creates a logical version directly rather than executing the sequence of deltas. An experimental study with real-life RDF data sets shows that our framework maintains multiple versions efficiently.
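A minimal sketch of the underlying idea, with data structures assumed for illustration rather than taken from the paper: each version is a set of triples, each delta is a pair of added/deleted triple sets, and an aggregated delta collapses a delta sequence so that a logical version can be built in one step.

```python
# Minimal sketch: an RDF version is a set of triples; a delta is a pair
# (added, deleted) of triple sets. These structures are assumptions for
# illustration, not the paper's actual implementation.

def apply_delta(triples, delta):
    """Apply one delta (added, deleted) to a set of triples."""
    added, deleted = delta
    return (triples - deleted) | added

def reconstruct(original, deltas):
    """Rebuild a logical version by replaying consecutive deltas."""
    version = set(original)
    for delta in deltas:
        version = apply_delta(version, delta)
    return version

def aggregate(deltas):
    """Collapse a sequence of deltas into one 'aggregated delta'."""
    added, deleted = set(), set()
    for a, d in deltas:
        added = (added - d) | a
        deleted = (deleted - a) | d
    return added, deleted

# With an aggregated delta, the logical version is built in a single step.
v0 = {("s", "p", "o1")}
deltas = [({("s", "p", "o2")}, set()), (set(), {("s", "p", "o1")})]
print(reconstruct(v0, deltas) == reconstruct(v0, [aggregate(deltas)]))  # True
```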
In this paper, we define the concept of a generalized rule for making classical deductions with imprecise data, storing both data and rules in a fuzzy relational database represented in the GEFRED model. We propose a way of measuring the imprecision related to the calculation of a fact, based on the matching degrees of the facts in the database and of the facts calculated while expanding the rules. Classical deduction algorithms are not appropriate for this purpose, so we propose the modifications that have to be applied to a classical tuple-oriented algorithm in order to design a new algorithm for deducing from imprecise data with generalized rules.
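As a small, hedged illustration (the min-based combination is one common t-norm choice assumed here, not necessarily the operator used in the GEFRED-based algorithm), the matching degree of a deduced fact can be computed from the degrees of the facts it depends on:

```python
# Illustrative sketch only: degrees lie in [0, 1]; combining antecedent degrees
# and the rule's own confidence with min is an assumed choice for exposition.

def antecedent_degree(fact_degrees):
    """Degree to which a rule's antecedents match facts in the database."""
    return min(fact_degrees) if fact_degrees else 0.0

def derived_fact_degree(rule_confidence, fact_degrees):
    """Imprecision attached to a fact deduced by a generalized rule."""
    return min(rule_confidence, antecedent_degree(fact_degrees))

# Example: a rule with confidence 0.9 fired on facts matching with degrees 0.8 and 0.7
print(derived_fact_degree(0.9, [0.8, 0.7]))  # 0.7
```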
In this paper we discuss an important integrity constraint called multivalued dependency (mvd), which arises as a result of the first normal form, in the framework of a newly proposed model called the fuzzy multivalued relational data model. The fuzzy multivalued relational data model proposed in this paper accommodates a wider class of ambiguities by representing the domain of attributes as a “set of fuzzy subsets”. We show that our model is able to represent multiple types of impreciseness occurring in the real world. To compute the equality of two fuzzy sets/values (which occur as tuple values), we use the concept of fuzzy functions. The main objective of this paper is thus to extend mvds to the context of the fuzzy multivalued relational model so that a wider class of impreciseness can be captured. Since mvds may not exist in isolation, a complete axiomatization for a set of fuzzy functional dependencies (ffds) and fuzzy multivalued dependencies (fmvds) in a fuzzy multivalued relational schema is provided, and the role of fmvds in obtaining a lossless join decomposition is discussed. We also provide a set of sound inference rules for the fmvds and derive the conditions for these inference rules to be complete. We further derive the conditions for obtaining a lossless join decomposition of a fuzzy multivalued relational schema in the presence of fmvds. Finally, we extend ABU's algorithm to find a lossless join decomposition in the context of fuzzy multivalued relational databases. We apply all of these fmvd concepts to a real-world “Technical Institute” application and demonstrate how they capture multiple types of impreciseness.
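For reference, the classical (crisp) lossless-join condition that the fmvd results generalize can be stated as below; the fuzzy counterpart studied in the paper attaches degrees to these dependencies.

```latex
% Crisp form of the lossless-join criterion for a binary decomposition (R_1, R_2) of R:
R = R_1 \bowtie R_2 \ \text{is lossless} \iff (R_1 \cap R_2) \twoheadrightarrow (R_1 \setminus R_2)
\quad\text{(equivalently } (R_1 \cap R_2) \twoheadrightarrow (R_2 \setminus R_1)\text{)}.
```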
In this paper, we describe the architecture of a protein interaction database and tools for manipulating Drosophila protein interaction data. The proposed system not only maintains interaction data collected by experiments, but also associates the interaction data with valuable data from various genomic databases. The system adopts a layered, modular architecture and introduces a wrapper-mediator approach in order to resolve the syntactic and semantic heterogeneity among multiple data sources. The component modules for wrapping and integrating the relevant data, querying the database, and visualizing the interaction data among proteins are discussed. The system wrapped the relevant data for 14,000 Drosophila proteins from five publicly accessible sources. A web-based query interface is developed for browsing the database, and a query result can be viewed as a protein interaction map depicting functional pathways, complexes, or networks. Protein interaction maps aid in understanding or predicting potential functions for uncharacterized proteins and in describing their functional networks in a biological context. We show that the proposed approach supports data association and data interoperability in a protein interaction database.
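A minimal sketch of the wrapper-mediator pattern referred to above; the class names, method signatures, and toy data are assumptions for illustration, not the system's actual components.

```python
# Minimal wrapper-mediator sketch; all names and the toy sources are
# illustrative assumptions, not the system's actual API.

class Wrapper:
    """Translates a mediator query into a source-specific lookup and
    normalizes the answer into a common (attribute, value) form."""
    def __init__(self, name, source):
        self.name = name
        self.source = source  # toy stand-in for a remote genomic database

    def query(self, protein_id):
        record = self.source.get(protein_id, {})
        return {f"{self.name}.{key}": value for key, value in record.items()}

class Mediator:
    """Dispatches a query to every registered wrapper and merges the answers."""
    def __init__(self, wrappers):
        self.wrappers = wrappers

    def query(self, protein_id):
        merged = {}
        for wrapper in self.wrappers:
            merged.update(wrapper.query(protein_id))
        return merged

# Usage: two toy sources describing the same protein under different schemas.
annotation_db = Wrapper("annotation", {"P001": {"symbol": "dlg1"}})
interaction_db = Wrapper("ppi", {"P001": {"partners": ["P042", "P107"]}})
print(Mediator([annotation_db, interaction_db]).query("P001"))
```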
In relational databases and their applications, there are opportunities for evaluating a stream of KNN queries submitted one by one at different times. To address this issue, we propose a new method combining learning-based techniques, region clustering methods, and caching mechanisms. The method uses a knowledge base to store information about past KNN queries, groups the search regions of the past queries into larger regions, and retrieves the tuples from the larger regions. To answer a newly submitted query, our strategy tries to obtain most or all of the results from the previously retrieved tuples cached in main memory. The method thus seeks to minimize the response time by reducing the search region or avoiding accesses to the underlying database altogether. Meanwhile, our method remains effective for high-dimensional data. Extensive experiments are carried out to measure the performance of this new strategy, and the results indicate that it is significantly better than naïve methods of evaluating a stream of KNN queries, for both low-dimensional (2, 3 and 4) and high-dimensional (25, 50 and 104) data.
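A simplified sketch of answering a new KNN query from cached search regions, falling back to the underlying database only when no cached region can guarantee k results; the distance metric, cache layout, and containment test are assumptions for illustration, not the paper's exact method.

```python
import math

# Simplified sketch: each cached region is a (center, radius, tuples) triple
# saved from a past KNN query; structure and policy are assumed for exposition.

def dist(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def kth_distance(query_point, k, tuples):
    """Distance from the query to its k-th nearest cached tuple (inf if too few)."""
    if len(tuples) < k:
        return math.inf
    return sorted(dist(query_point, t) for t in tuples)[k - 1]

def knn(query_point, k, cached_regions, fetch_from_db):
    """Answer a KNN query from cached regions when possible, else hit the database."""
    candidates = None
    for center, radius, tuples in cached_regions:
        # The cache can answer the query only if the ball around the query that
        # contains its k nearest cached tuples lies entirely inside the region.
        if dist(query_point, center) + kth_distance(query_point, k, tuples) <= radius:
            candidates = tuples
            break
    if candidates is None:
        candidates = fetch_from_db(query_point, k)  # fallback to the database
    return sorted(candidates, key=lambda t: dist(query_point, t))[:k]
```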
Data provenance is a kind of metadata that describes the origin and derivation history of data. It provides information about the various direct and indirect sources of data and the different transformations applied to it. Provenance information is beneficial in determining the quality, truthfulness, and authenticity of data; it also explains how, when, why, and by whom the data were created. In a relational database, fine-grained provenance captured at different stages (i.e., multi-layer provenance) is more significant and explanatory, as it provides remarkable information such as the immediate and intermediate sources and the origin of data. In this paper, we propose a novel multi-layer data provenance framework for a Zero-Information Loss Relational Database (ZILRDB). The proposed framework is implemented on top of the relational database using object-relational database concepts to maintain all insert, delete, and update operations efficiently. It is able to capture multi-layer provenance for different query sets, including historical queries. We also propose Provenance Relational Algebra (PRA) as an extension of traditional relational algebra to capture the provenance of ASPJU (Aggregate, Select, Project, Join, Union) queries in a relational database. The framework provides detailed provenance analysis through multi-depth provenance querying. We store the provenance data in both a relational and a graph database, and evaluate the performance of the framework in terms of provenance storage overhead and average execution time for provenance querying. We observe that the graph database offers significant performance gains over the relational database for executing multi-depth queries on provenance. We present two case studies to explain the usefulness of the proposed framework in various data-driven systems for increasing the understandability of a system's behavior and functionality.
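A toy sketch of the general idea of propagating tuple-level provenance through select and project; the representation below is an assumption for illustration and not the paper's PRA operators.

```python
# Toy sketch of tuple-level provenance propagation for select/project; each
# tuple carries the set of source-tuple ids it derives from. Names and the
# representation are assumptions, not the paper's PRA definitions.

def select(relation, predicate):
    """Selection keeps a tuple's provenance unchanged."""
    return [(row, prov) for row, prov in relation if predicate(row)]

def project(relation, attrs):
    """Projection merges the provenance of tuples that collapse to one result."""
    out = {}
    for row, prov in relation:
        key = tuple(row[a] for a in attrs)
        out.setdefault(key, set()).update(prov)
    return [(dict(zip(attrs, key)), prov) for key, prov in out.items()]

orders = [({"id": 1, "cust": "a", "total": 30}, {"orders:1"}),
          ({"id": 2, "cust": "a", "total": 10}, {"orders:2"})]
print(project(select(orders, lambda r: r["total"] > 5), ["cust"]))
# roughly: [({'cust': 'a'}, {'orders:1', 'orders:2'})]
```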
The development of relational databases has significantly improved the performance of storage, search, and retrieval functions and has made it possible for applications that perform real-time data acquisition and analysis to interact with these types of databases. The purpose of this research was to develop a user interface for interaction between a data acquisition and analysis application and a relational database using the Oracle9i system. The overall system was designed to have an indexing capability that threads into the data acquisition and analysis programs. Tables were designed, and relations within the database were established for indexing the files and the information contained within them. The system provides retrieval capabilities over a broad range of media, including analog, event, and video data types. The system's ability to interact with a data-capturing program at the time of the experiment, creating both the multimedia files and the metadata entries in the relational database, avoids manual entries in the database and ensures data integrity and completeness for further interaction with the data by analysis applications.
Extensible Markup Language (XML) has become a common language for data interchange and data representation on the Web. The evolution of the big data environment and the large volume of data represented in XML on the Web increase the challenges of effectively managing such data in terms of storing and querying. Numerous solutions have been introduced to store and query XML data, including file systems, Object-Oriented Databases (OODB), Native XML Databases (NXD), and Relational Databases (RDB). Previous research indicates that the RDB is, to date, the most powerful technology for managing XML data. Because of the structural differences between XML and the RDB, the need to map XML data to an RDB schema has grown. This growth has prompted numerous researchers and database vendors to propose different approaches for mapping XML documents to an RDB, translating different types of XPath queries to SQL queries, and returning the results in XML format. This paper aims to comprehensively review, in a narrative manner, the most-cited and the latest mapping approaches and the database vendors that use the RDB solution to store and query XML documents. The advantages and drawbacks of each approach are discussed, particularly in terms of storing and querying. The paper also provides some insight into managing XML documents using the RDB solution in terms of storing and querying, and it contributes to the XML community.
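As an assumed illustration of the general mapping idea (not a specific approach from the review), the sketch below uses a generic "edge table" shredding (node_id, parent_id, tag, value) and naively translates a simple child-axis XPath into SQL; all table and column names are hypothetical.

```python
# Illustrative only: naive translation of a child-axis XPath such as
# /catalog/book/title into a self-join over an assumed edge table.

def xpath_to_sql(xpath):
    steps = [s for s in xpath.split("/") if s]
    joins, conds = [], []
    for i, tag in enumerate(steps):
        joins.append(f"edge e{i}")
        conds.append(f"e{i}.tag = '{tag}'")
        if i == 0:
            conds.append(f"e{i}.parent_id IS NULL")      # document root
        else:
            conds.append(f"e{i}.parent_id = e{i-1}.node_id")
    last = f"e{len(steps) - 1}"
    return (f"SELECT {last}.value FROM " + ", ".join(joins) +
            " WHERE " + " AND ".join(conds))

print(xpath_to_sql("/catalog/book/title"))
# (output wrapped for readability)
# SELECT e2.value FROM edge e0, edge e1, edge e2
#   WHERE e0.tag = 'catalog' AND e0.parent_id IS NULL
#   AND e1.tag = 'book' AND e1.parent_id = e0.node_id
#   AND e2.tag = 'title' AND e2.parent_id = e1.node_id
```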
Short-insert shotgun sequencing approaches have been applied in recent years to environmental genomic libraries. In the case of complex multispecies microbial communities, there can be many sequence reads that are not incorporated into assemblies, and thus need to be annotated and accessible as single reads. Most existing annotation systems and genome databases accommodate assembled genomes containing contiguous gene-encoding sequences. Thus, a solution is required that can work effectively with environmental genomic annotation information to facilitate data analysis. The Environmental Genome Informational Utility System (EnGenIUS) is a comprehensive environmental genome (metagenome) research toolset that was specifically designed to accommodate the needs of large (> 250 K sequence reads) environmental genome sequencing efforts. The core EnGenIUS modules consist of a set of UNIX scripts and PHP programs used for data preprocessing, an annotation pipeline with accompanying analysis tools, two entity relational databases, and a graphical user interface. The annotation pipeline has a modular structure and can be customized to best fit input data set properties. The integrated entity relational databases store raw data and annotation analysis results. Access to the underlying databases and services is facilitated through a web-based graphical user interface. Users have the ability to browse, upload, download, and analyze preprocessed data, based on diverse search criteria. The EnGenIUS toolset was successfully tested using the Alvinella pompejana epibiont environmental genome data set, which comprises more than 300 K sequence reads. A fully browsable EnGenIUS portal is available at (access code: "guest"). The scope of this paper covers the implementation details and technical aspects of the EnGenIUS toolset.
Integrating existing relational databases with ontology-based systems is among the important research problems for the Semantic Web. We have designed a comprehensive framework called OntoGrate, which combines a highly automatic mapping system, a logic inference engine, and several syntax wrappers that interoperate with consistent semantics to answer ontology-based queries using the data from heterogeneous databases. Our OntoGrate research makes several major contributions: (i) we designed an ontology-based framework that provides unified semantics for mapping discovery and query translation by transforming database schemas into Semantic Web ontologies; (ii) we developed a highly automatic ontology mapping system which leverages object reconciliation and multi-relational data mining techniques; (iii) we developed an inference-based query translation algorithm and several syntax wrappers which can translate queries and answers between relational databases and the Semantic Web. The testing results of our implemented OntoGrate system in different domains show that the large amount of data in relational databases can be directly utilized for answering Semantic Web queries rather than first converting all relational data into RDF or OWL.
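A rough sketch of the general schema-to-ontology idea (tables to classes, columns to datatype properties, foreign keys to object properties); the input format and output vocabulary are assumptions for illustration, not OntoGrate's actual mapping rules.

```python
# Rough sketch: emit pseudo-RDF/OWL triples from a schema description.
# The schema dict format and the emitted vocabulary are assumptions.

def schema_to_ontology(tables):
    triples = []
    for table, info in tables.items():
        triples.append((table, "rdf:type", "owl:Class"))
        for col in info.get("columns", []):
            triples.append((f"{table}.{col}", "rdf:type", "owl:DatatypeProperty"))
            triples.append((f"{table}.{col}", "rdfs:domain", table))
        for col, target in info.get("foreign_keys", {}).items():
            triples.append((f"{table}.{col}", "rdf:type", "owl:ObjectProperty"))
            triples.append((f"{table}.{col}", "rdfs:domain", table))
            triples.append((f"{table}.{col}", "rdfs:range", target))
    return triples

schema = {
    "Employee": {"columns": ["name"], "foreign_keys": {"dept_id": "Department"}},
    "Department": {"columns": ["name"]},
}
for triple in schema_to_ontology(schema):
    print(triple)
```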
The analysis of relational and NoSQL databases leads to the conclusion that these data processing systems are to some extent complementary. In current Big Data applications, especially where extensive analyses (so-called Big Analytics) are needed, it turns out that it is nontrivial to design an infrastructure involving data and software of both types. Unfortunately, this complementarity negatively influences the integration possibilities of these data stores at both the data model and data processing levels. In terms of performance, it may be beneficial to use polyglot persistence, a multi-model approach or multilevel modeling, or even to transform the SQL database schema into NoSQL and to perform data migration between the relational and NoSQL databases. Another possibility is to integrate a NoSQL database and a relational database with the help of a third data model. The aim of the paper is to show these possibilities and to present some new methods of designing such integrated database architectures.
Along with the rapidly growing scale of relational databases (RDB), how to construct domain-related ontologies from various databases effectively and efficiently has become a bottleneck of ontology-based integration. The traditional methods for constructing an ontology from an RDB are mainly based on manual mapping and transformation, which not only requires a lot of human experience but also easily leads to semantic loss during the transformation. Ontology learning from RDB is a new paradigm for (semi-)automatically constructing ontologies from RDB by borrowing techniques from machine learning; it provides potential opportunities for integrating heterogeneous data from various data sources efficiently. This paper surveys recent methods and tools for ontology learning from RDB, and highlights the potential opportunities and challenges of using ontology learning in semantic information integration. Initially, previous surveys on the topics of ontology-based integration and ontology learning are summarized, and their limitations are identified and analyzed. Furthermore, the methods and techniques of ontology learning from RDB are investigated and classified into three categories: reverse engineering, mapping, and machine learning. Accordingly, the opportunities and possibilities of using ontology learning from RDB in semantic information integration are discussed, based on the mapping between the bottlenecks of ontology-based integration and the features of ontology learning.
This paper is an extension of our prior work [C. Ma and B. Molnár, Use of ontology learning in information system integration: A literature survey, in Intelligent Information and Database Systems, eds. P. Sitek, M. Pietranik, M. Krótkiewicz and C. Srinilta, Communications in Computer and Information Science, Vol. 1178 (Springer, Singapore, 2020), pp. 342–353].
Changing a database schema is a fact of life in information systems, whether in response to changes inside the enterprise (e.g., new user requirements, correction of errors in the current database schema) or outside it (e.g., new regulations, new partners' requirements). In the database research field, a well-known technique called schema evolution has been proposed for managing schema changes. It allows the database to survive schema changes by adapting existing data to conform to the new schema. Many research efforts have addressed the topic of schema evolution, in both conventional (i.e., relational) and advanced (e.g., XML, stream, NoSQL) databases, providing a plethora of heterogeneous approaches and solutions that make up quite a large literature. Since no research work extensively covers and compares these different proposals, the purpose of this paper is to fill this gap by reviewing the available schema evolution literature. To that end, we first collected and summarized the contributions of research papers dealing with database schema evolution. We then organized their presentation in chronological order, also giving a historical perspective on the topic's development. Finally, we defined a list of six comparison criteria (database model, implementation, schema change semantics, schema change propagation, integrity constraints, and software evolution) that helped us categorize and compare the different database schema evolution proposals. In sum, our paper (i) provides an overview of the state-of-the-art research approaches on database schema evolution, with tables that compare such approaches based on the proposed criteria, (ii) studies the support of schema evolution in commercial DataBase Management Systems (DBMSs), and (iii) points out some possible future research directions in this area.
Changes involving the database schema are unavoidable events in data-centric systems, due to many reasons such as the evolution of user requirements, corrective maintenance, improvement of the current schema with the addition of new integrity constraints, implementation of new business rules, or compliance with new regulations. In order to manage schema changes, researchers have proposed a technique named “schema versioning”, in which all versions of the database schema are kept along with their underlying data, preventing loss of information and obsolescence of legacy applications. This technique has been widely studied in both conventional and emerging databases. Despite the many interesting research efforts that have dealt with database schema versioning and the importance of this technique for database maintenance, current Database Management Systems (DBMSs) do not provide sufficient support for its implementation. This paper (i) gives an overview of state-of-the-art research approaches to schema versioning in databases, based on the study of more than 270 papers and providing a historical perspective on the development of the topic, (ii) illustrates the support of schema versioning in commercial DBMSs, and (iii) points out some possible future research directions in this area.
It is no secret that the e-commerce industry is changing every year. Before the invention of the internet, the “brick-and-mortar” business model was used, where customers needed to visit a physical outlet to purchase goods; this is no longer the only scenario. People now have the luxury of buying items online from the convenience of their own homes, and they are taking advantage of it. In the 21st century, with the ever-increasing number of people participating in electronic commerce, advanced information technology is necessary to handle this e-commerce ecosystem. In 2018 alone, mobile conversions increased by 55% and were expected to reach $175.4 billion in sales. Any business, store, or person who actively sells products online is considered part of this e-commerce system. To serve customers with a better experience and streamline their various processes, e-commerce retailers often implement a database to strategically capture vital information. A database is an organized collection of information.
The retail business depends on databases 24 × 7 for order processing, transactions, inventory, order shipping, and more. With a database management system in place, an organization can become far more competitive, accelerating its decision-making process and increasing organizational performance in achieving targeted goals.
In an e-commerce application, the main purpose of a database is to store information for retrieving product details and customer information, tracking transactions, and maintaining inventory. One of the biggest benefits of using a database for e-commerce is structuring vast amounts of shop data. When the data is organized in a proper format, it can be accessed more efficiently by e-commerce applications.
Databases play a critical role in the e-commerce industry; today, the success of an e-commerce firm largely depends on how well it has optimized its database. Because today's consumers rely heavily on technology, e-commerce firms must use it to their advantage.
Handling incomplete data in database and information systems is a nontrivial problem, and there is no fully satisfactory design theory incorporating null values in relational database systems. In this paper we present a multivalued logical proposal for dealing with null values in relational databases. This method differs from approaches that classify nulls into multiple categories, such as Codd's four-valued logic, which distinguishes applicable and inapplicable nulls. We believe that four-valued logic cannot cover all possible nulls, and neither can (n + 1)-valued logic. Our multivalued logical approach is based on traditional three-valued logic and gives each null value a possible truth degree. The result of a query then depends on the truth degrees of the nulls involved, giving a ‘probabilistic’ response to the query. To deal with null values of various truth degrees, we introduce an augmented set of relational and logical operations. An arithmetic approach to operations on nulls is also introduced, which we believe can be extended to traditional relational algebra. To simplify the handling of nulls by an RDBMS, a measurement of the null truth level is proposed: according to the truth level of a null, the RDBMS can decide whether to use it in a calculation or to ignore it.
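A compact sketch of the idea; the degree-labelled null, the min/max/complement connectives, and the equality rule below are assumptions in the spirit of the proposal, not its exact definitions.

```python
# Compact sketch: a null carries a truth degree in [0, 1]; ordinary values are
# crisp (degree 1.0 when the comparison holds, 0.0 otherwise). The connective
# and equality choices are assumptions for illustration only.

class Null:
    def __init__(self, truth_degree):
        self.truth_degree = truth_degree

def eq(a, b):
    """Truth degree of a = b when either operand may be a degree-labelled null."""
    if isinstance(a, Null) or isinstance(b, Null):
        return min(x.truth_degree for x in (a, b) if isinstance(x, Null))
    return 1.0 if a == b else 0.0

def AND(x, y):
    return min(x, y)

def OR(x, y):
    return max(x, y)

def NOT(x):
    return 1.0 - x

# A query answer carries a 'probabilistic' degree instead of a plain yes/no:
salary = Null(0.6)  # unknown salary, applicable with truth degree 0.6
print(AND(eq("IT", "IT"), eq(salary, 50000)))  # 0.6
```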