In today’s data-driven era, accurate and forward-looking prediction of university financial data is vital for the rational allocation of educational resources and for strategic planning. However, university financial data come from many sources, have complex structures, and are scattered across different database systems, which makes data integration and prediction challenging. To predict university financial data more accurately, this paper proposes a study of deep-learning-based integrated forecasting of university financial data. First, an ontology method is used to integrate the financial data: after preprocessing, a shared vocabulary is constructed, the semantic information of university finance is expressed as a formal ontology, and the ontology attributes of key concepts are extracted. The MC algorithm is then applied to secure the integrated data in the distributed database and to remove redundancy, producing unified integrated data in XML format. Next, a deep belief network is used for feature extraction and dimensionality reduction on the integrated financial data, and a prediction model is built on the resulting features, realizing integrated, deep-learning-based forecasting of university financial data. Experimental results show that the proposed method achieves notable results in data integration and performs well in prediction. The work provides a new solution for university financial data forecasting and is significant in both theory and practice: theoretically, it enriches the framework of deep learning and data integration; practically, by improving forecast accuracy it helps universities allocate resources, plan strategically, and promote the sustainable development of education.
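As a rough illustration of the forecasting stage described above (not the authors' implementation), the following Python sketch stacks scikit-learn BernoulliRBMs as a stand-in for a deep belief network and adds a regression head; the feature matrix, target, and all hyperparameters are placeholder assumptions.

```python
# Sketch: DBN-style feature extraction followed by a regression head.
# Assumes the integrated XML financial records were already flattened into a
# numeric matrix X (rows = budget periods, columns = financial indicators)
# and a target vector y (e.g. next-period expenditure). Stacked BernoulliRBMs
# stand in for the deep belief network described in the abstract.
import numpy as np
from sklearn.neural_network import BernoulliRBM
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

rng = np.random.RandomState(0)
X = rng.rand(200, 40)                              # placeholder integrated features
y = X[:, :5].sum(axis=1) + 0.1 * rng.randn(200)    # placeholder target

model = Pipeline([
    ("scale", MinMaxScaler()),                     # RBMs expect inputs in [0, 1]
    ("rbm1", BernoulliRBM(n_components=32, learning_rate=0.05, n_iter=20, random_state=0)),
    ("rbm2", BernoulliRBM(n_components=16, learning_rate=0.05, n_iter=20, random_state=0)),
    ("head", Ridge(alpha=1.0)),                    # prediction layer on reduced features
])
model.fit(X, y)
print("in-sample R^2:", model.score(X, y))
```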
Performance monitoring counters (PMCs) are of great value for monitoring the status of processors and for further analysis and modeling. In this paper, we explore a novel problem called PMC integration, i.e., how to combine groups of PMCs that are collected asynchronously. Due to hardware constraints, the number of PMCs that can be measured concurrently is strictly limited, so we cannot directly acquire all the features related to system performance. This shortage of raw source data is a serious obstacle to PMC-based analysis and modeling tasks such as PMC-based power estimation. To address the problem, we introduce a PMC data integration approach based on power values from neighboring intervals. Exploiting the activity similarity of easily collected power dissipation values, the proposed approach automatically combines distinct categories of PMC data and thus recovers intact raw PMC data. The significance and effectiveness of the approach are verified experimentally on a common task, PMC-based power consumption modeling.
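A minimal sketch of the integration idea, under the assumption that each PMC sample carries the average power of its interval; the matching rule (nearest power value within a small window) is a simplification for illustration, not the paper's exact procedure.

```python
# Sketch of power-guided PMC integration. Two PMC groups are sampled in
# separate runs; each sample carries the average power of its interval.
# Counters from group B are attached to the group-A sample whose nearby
# interval power is most similar (an assumed simplification).
import numpy as np

def integrate_pmc(power_a, counters_a, power_b, counters_b, window=3):
    """Return a combined matrix [counters_a | matched counters_b]."""
    merged = np.empty((len(power_a), counters_a.shape[1] + counters_b.shape[1]))
    for i, p in enumerate(power_a):
        lo, hi = max(0, i - window), min(len(power_b), i + window + 1)
        j = lo + int(np.argmin(np.abs(power_b[lo:hi] - p)))  # most similar power nearby
        merged[i] = np.concatenate([counters_a[i], counters_b[j]])
    return merged

# toy example: 2 counters per group, 100 intervals each
rng = np.random.RandomState(1)
power_a, power_b = rng.rand(100), rng.rand(100)
merged = integrate_pmc(power_a, rng.rand(100, 2), power_b, rng.rand(100, 2))
print(merged.shape)  # (100, 4)
```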
Data-centered approaches to multimedia presentation design and implementation can be developed by extending methodologies and technologies common in text-based applications. A multimedia report is a multimedia presentation built on a set of data returned by one or more queries to multimedia repositories, integrated according to a template with appropriate spatial layout and temporal synchronization, and coherently delivered to a user for browsing. We discuss the problem of defining templates for such multimedia reports with a focus on media coordination and synchronization. Multimedia presentations can be automatically generated according to the template by instantiating it on actual data instances. An XML language describes the spatial layout and the temporal constraints of the media objects. An authoring system and a player have been implemented.
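As an illustration of template instantiation (the element and attribute names below are hypothetical stand-ins, not the paper's XML language), a report skeleton with one spatial region and parallel media items could be generated from query results along these lines:

```python
# Sketch of instantiating a report template on query results. The element
# names (report, region, par, media, begin, dur) are hypothetical, loosely
# in the spirit of SMIL-style layout and synchronization.
import xml.etree.ElementTree as ET

def build_report(items):
    report = ET.Element("report")
    layout = ET.SubElement(report, "layout")
    ET.SubElement(layout, "region", id="main", left="0", top="0", width="640", height="480")
    par = ET.SubElement(report, "par")          # media items played in parallel
    for item in items:
        ET.SubElement(par, "media", src=item["src"], region="main",
                      begin=str(item["begin"]), dur=str(item["dur"]))
    return ET.tostring(report, encoding="unicode")

# toy query results returned by a multimedia repository
print(build_report([{"src": "clip1.mpg", "begin": 0, "dur": 30},
                    {"src": "caption1.txt", "begin": 0, "dur": 30}]))
```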
The XML Schema matching problem can be formulated as follows: given two XML Schemas, find the best mapping between the elements and attributes of the schemas, and the overall similarity between them. XML Schema matching is an important problem in data integration, schema evolution, and software reuse. This paper describes a matching system that can find accurate matches and scales to large XML Schemas with hundreds of nodes. In our system, XML Schemas are modeled as labeled, unordered trees, and the schema matching problem is turned into a tree matching problem. We propose Approximate Common Structures in trees and develop a tree matching algorithm based on this concept. Compared with the traditional tree edit-distance algorithm and other schema matching systems, our algorithm is faster and more suitable for large XML Schema matching.
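For illustration only, the sketch below computes a naive top-down similarity between labeled, unordered schema trees with greedy child matching; it shows the tree-matching formulation but is not the Approximate Common Structures algorithm.

```python
# Illustrative sketch: naive top-down similarity for labeled, unordered
# schema trees using greedy child matching (not the paper's algorithm).
from difflib import SequenceMatcher

def node_sim(a, b):
    return SequenceMatcher(None, a["label"].lower(), b["label"].lower()).ratio()

def tree_sim(a, b):
    score = node_sim(a, b)
    kids_a, kids_b = a.get("children", []), b.get("children", [])
    if kids_a and kids_b:
        best = [max(tree_sim(ca, cb) for cb in kids_b) for ca in kids_a]
        score = 0.5 * score + 0.5 * sum(best) / max(len(kids_a), len(kids_b))
    return score

s1 = {"label": "PurchaseOrder", "children": [{"label": "ShipTo"}, {"label": "Items"}]}
s2 = {"label": "Order", "children": [{"label": "ShippingAddress"}, {"label": "ItemList"}]}
print(round(tree_sim(s1, s2), 3))
```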
A personal dataspace management system (PDSMS) is a platform for managing personal data of various types. Faced with huge volumes of heterogeneous personal data and complex relationships between them, users are better served by starting with a simplified, easy-to-read schema and exploring only the relevant schema elements in depth when formulating queries. Existing approaches to database schema summarization neglect user interests, which are very important in a personal dataspace. We propose a framework for automatically building a concise resource summary based on user interests in a PDSMS. Our method builds an initial summary by partitioning the schema graph according to its linkage information and selecting representative elements based on a novel measure of schema element typicality. A user-interest degree for each schema node is then introduced to measure user interests, and the initial summary is refined accordingly. Finally, we evaluate the quality of the resource summaries through a comprehensive set of experiments; the results indicate that summaries generated by our system are more effective at reducing the effort users need to formulate queries.
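A minimal sketch of the summary-building idea, assuming degree centrality as a stand-in for the typicality measure and a hypothetical user-interest dictionary:

```python
# Sketch: partition the schema graph, pick the most "typical" element per
# partition, then re-weight by user interest. Degree stands in for the
# paper's typicality measure; the interest scores are a hypothetical input.
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

G = nx.Graph([("Person", "Email"), ("Person", "Paper"), ("Paper", "Conference"),
              ("Email", "Attachment"), ("Paper", "Author")])
interest = {"Email": 0.9, "Paper": 0.4}          # hypothetical user-interest degrees

summary = []
for part in greedy_modularity_communities(G):
    rep = max(part, key=lambda n: G.degree(n) + interest.get(n, 0.0))
    summary.append(rep)
print(summary)
```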
The number and size of information services available on the internet have been growing exponentially over the past few years. This growth has created an urgent need for information agents that act as brokers in the sense that they can autonomously search, gather, and integrate information on behalf of a user. To remain useful, such brokers must evolve throughout their lifetime to keep up with ever-changing information services. This paper proposes a framework named XIB (eXtensible Information Brokers) for building and evolving information brokers.
The XIB takes as input a description of the required information services and supports the interactive generation of an integrated query interface. It also generates wrappers for each information service dynamically. Once the query interface and wrappers are in place, the user can specify a query and get back a result that integrates data from all wrapped information sources. The XIB depends heavily on XML-related techniques: we use DTDs to model the input and output of each service, and XML to represent both input and output values. Based on these representations, the paper investigates service integration in the form of DTD integration and studies query decomposition in the form of XML element decomposition. Within the proposed framework, it is easy to add information services to a broker or remove them, which facilitates the maintenance, evolution, and customization of information brokers.
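The following sketch illustrates query decomposition in this spirit: an integrated query, represented as XML, is split into per-service sub-queries according to which (simplified) service DTD declares each element. The service names and element sets are hypothetical.

```python
# Sketch of query decomposition: route each element of an integrated query
# to the service whose (simplified) DTD declares it.
import xml.etree.ElementTree as ET

service_elements = {"flights": {"from", "to", "date"},        # stand-ins for service DTDs
                    "hotels": {"city", "checkin", "nights"}}

def decompose(query_xml):
    query = ET.fromstring(query_xml)
    subqueries = {}
    for child in query:
        for service, elems in service_elements.items():
            if child.tag in elems:
                subqueries.setdefault(service, ET.Element("query")).append(child)
    return {s: ET.tostring(q, encoding="unicode") for s, q in subqueries.items()}

print(decompose("<query><from>ZRH</from><to>NRT</to><date>2024-05-01</date>"
                "<city>Tokyo</city><nights>3</nights></query>"))
```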
Information integration is one of the most important aspects of a Data Warehouse. When data passes from the application-oriented operational sources to the Data Warehouse, possible inconsistencies and redundancies should be resolved, so that the warehouse can provide an integrated and reconciled view of the organization's data. We describe a novel approach to data integration in Data Warehousing. Our approach is based on a conceptual representation of the Data Warehouse application domain and follows the so-called local-as-view paradigm: both source and Data Warehouse relations are defined as views over the conceptual model. We propose a technique for declaratively specifying suitable reconciliation correspondences to be used to resolve conflicts among data in different sources. The main goal of the method is to support the design of mediators that materialize the data in the Data Warehouse relations. Starting from the specification of one such relation as a query over the conceptual model, a rewriting algorithm reformulates the query in terms of both the source relations and the reconciliation correspondences, thus obtaining a correct specification of how to load the data into the materialized view.
Peer Data Management Systems (PDMSs) allow the efficient sharing of data between peers with overlapping sources of information. These sources share data through mappings between peers. In current systems, queries are posed over each peer's local schema and then translated using the mappings between peers. While this allows the data to be accessed uniformly, users lack access to information that is not in their own schemas. In this paper, we propose a lightweight, automatic method for creating a mediated schema in a PDMS. Our work benefits PDMSs by allowing access to more data without unduly stressing a peer's resources or requiring additional resources such as ontologies. We present our system, MePSys, which creates a mediated schema in PDMSs automatically using the existing mappings provided to translate queries. We further discuss how to update the mediated schema in a stable state, i.e., after the system setup period.
On-the-fly data integration, i.e., integration at query time, mostly happens in tightly coupled, homogeneous environments where the partitioning of the data can be controlled or is known in advance. During data fusion, the information is homogenized and data inconsistencies are hidden from the application. Going beyond this, we propose the Nexus metadata model and a processing approach that support on-the-fly data integration in a loosely coupled federation of autonomous data providers, advancing the status quo in flexibility and expressive power. The model can represent data and schema inconsistencies such as multi-valued attributes and multi-typed objects. In an open environment, this best suits applications where the data processing infrastructure cannot decide which attribute value is correct. The Nexus metadata model provides the foundation for integration schemata that are specific to a given application domain, and the corresponding processing model provides four complementary query semantics to account for the subtleties of multi-valued and missing attributes. In this paper we show that these query semantics are sound, easy to implement, and build upon existing query processing techniques. The Nexus metadata model thus provides a unique level of flexibility for on-the-fly data integration.
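To make the need for multiple query semantics concrete, the sketch below evaluates a predicate over a multi-valued attribute under two assumed semantics ("any value matches" vs. "all values match"); these are illustrative simplifications, not the four semantics defined by the Nexus processing model.

```python
# Sketch: evaluating a selection over a possibly missing, possibly
# multi-valued attribute under two illustrative semantics.
def matches(obj, attr, pred, semantics="any"):
    values = obj.get(attr)                 # attribute may be missing or multi-valued
    if values is None:
        return False                       # a "missing as unknown" variant could differ here
    if not isinstance(values, (list, tuple, set)):
        values = [values]
    hits = [pred(v) for v in values]
    return any(hits) if semantics == "any" else all(hits)

poi = {"name": ["Main Station", "Hauptbahnhof"], "type": "station"}
print(matches(poi, "name", lambda v: "Station" in v, "any"))   # True
print(matches(poi, "name", lambda v: "Station" in v, "all"))   # False
```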
Constraint-based Genome-scale In Silico Models for Systems Biology.
Korean Systems Biology Project.
Systems Biology's Promises and Challenges.
Ontology represents a data source at a higher level of abstraction. Extracting metadata from an autonomous data source and transforming it into a source ontology is a tedious and error-prone task because the metadata are either incomplete or unavailable. The essential metadata of a source can, however, be extracted from its data. Our proposed methodology extracts the essential metadata from the data through reverse engineering and comprises a set of transformation rules that transform the extracted metadata into an ontology. The transformation system, R2O, has been implemented. The evaluation of the proposed transformation is based on two factors: (a) correct identification and transformation of metadata and (b) preservation of information capacity. The research has been evaluated through experimental results and mathematical proof. The evaluation shows that the transformation is total and injective, and that it preserves information capacity.
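A minimal sketch of the transformation direction (table to class, plain column to datatype property, foreign key to object property); this is an assumed core of such a rule set, not the full R2O system.

```python
# Sketch: relational metadata recovered from the data is turned into
# ontology axioms via simple mapping rules (assumed, illustrative).
tables = {
    "student": {"columns": {"id": "int", "name": "varchar"}, "fks": {}},
    "enrolment": {"columns": {"sid": "int", "course": "varchar"}, "fks": {"sid": "student"}},
}

def to_ontology(tables):
    axioms = []
    for t, meta in tables.items():
        axioms.append(f":{t} rdf:type owl:Class .")
        for col in meta["columns"]:
            if col in meta["fks"]:
                axioms.append(f":{t}_{col} rdf:type owl:ObjectProperty ; "
                              f"rdfs:domain :{t} ; rdfs:range :{meta['fks'][col]} .")
            else:
                axioms.append(f":{t}_{col} rdf:type owl:DatatypeProperty ; rdfs:domain :{t} .")
    return "\n".join(axioms)

print(to_ontology(tables))
```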
Data integration becomes ever more necessary given the increasing availability of data from distributed and heterogeneous sources. To address such heterogeneity, crisp ontologies have been employed to represent the semantics of integrated data. However, fuzzy-logic concepts can make these ontologies more expressive for representing the vague information relevant to some domains. In this context, this paper presents the DISFOQuE system for data integration based on a fuzzy ontology, which provides a homogeneous view of data sources and performs query expansion in order to retrieve more comprehensive answers for the user. We carried out a real experiment in the domain of watershed analysis, which provided a homogeneous view of the watershed data sources and more effective answers for researchers.
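A toy sketch of fuzzy query expansion: a query term is expanded with concepts whose fuzzy membership in a relation exceeds a threshold. The concept names and membership degrees are hypothetical.

```python
# Sketch: expand a query term using fuzzy relation memberships (toy values).
fuzzy_related = {
    "high_rainfall": {"flood_risk": 0.8, "soil_erosion": 0.6, "drought": 0.1},
}

def expand(term, threshold=0.5):
    related = fuzzy_related.get(term, {})
    return [term] + [c for c, mu in related.items() if mu >= threshold]

print(expand("high_rainfall"))   # ['high_rainfall', 'flood_risk', 'soil_erosion']
```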
Comprehensive and structured annotations for all genes on a microarray chip are essential for the interpretation of its expression data. Currently, most chip gene annotations are one-line free-text descriptions that are often partial, outdated, and unsuitable for large-scale data analysis. The interpretation of microarray gene expression clusters is therefore often limited. Although researchers can manually navigate a collection of databases for better annotations, this is only practical for a limited number of genes, and existing meta-databases fail to provide comprehensive categorized annotations for hundreds of genes simultaneously. We have developed an automatic system to address this issue.
The GeneView system monitors various data sources, extracts gene information from a source whenever it is updated, comprehensively matches genes, and integrates them into a central database by category, such as pathway, genetic mapping, phenotype, expression profile, domain structure, protein interaction, disease association, and references. The system consists of four major components: (1) a relational database; (2) data processing; (3) user curation; and (4) data query. We evaluated it by analyzing genes on cDNA and Affymetrix Oligo chips. In both cases, the system provided more accurate and comprehensive information than that provided by the vendors or the chip users, and helped identify new common functions among genes in the same expression clusters.
There have been various attempts to improve the reconstruction of gene regulatory networks from microarray data by the systematic integration of biological prior knowledge. Our approach is based on pioneering work by Imoto et al. [11], where the prior knowledge is expressed in terms of energy functions, from which a prior distribution over network structures is obtained in the form of a Gibbs distribution. The hyperparameters of this distribution represent the weights associated with the prior knowledge relative to the data. We have derived and tested a Markov chain Monte Carlo (MCMC) scheme for sampling networks and hyperparameters simultaneously from the posterior distribution, thereby automatically learning how to trade off information from the prior knowledge and the data. We have extended this approach to a Bayesian coupling scheme for learning gene regulatory networks from a combination of related data sets, which were obtained under different experimental conditions and are therefore potentially associated with different active subpathways. The proposed coupling scheme is a compromise between (1) learning networks from the different subsets separately, whereby no information between the different experiments is shared; and (2) learning networks from a monolithic fusion of the individual data sets, which does not provide any mechanism for uncovering differences between the network structures associated with the different experimental conditions. We have assessed the viability of all proposed methods on data related to the Raf signaling pathway, generated both synthetically and in cytometry experiments.
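The prior over network structures takes the standard Gibbs form used in this line of work, with an energy E(G) measuring disagreement with the prior knowledge and a hyperparameter beta weighting the prior against the data:

```latex
% Gibbs prior over network structures G; E(G) is the prior-knowledge energy,
% \beta is the hyperparameter traded off against the data, Z(\beta) normalizes.
P(G \mid \beta) \;=\; \frac{1}{Z(\beta)} \exp\bigl(-\beta\, E(G)\bigr),
\qquad
Z(\beta) \;=\; \sum_{G' \in \mathcal{G}} \exp\bigl(-\beta\, E(G')\bigr)
```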
Gene regulatory network inference is one of the central problems in computational biology. We need models that integrate the variety of available data in order to use their complementary information to overcome the issues of noisy and limited data. BENIN: Biologically Enhanced Network INference is our proposal for integrating data and inferring more accurate networks. BENIN is a general framework that jointly considers different types of prior knowledge together with expression datasets to improve network inference. The method formulates network inference as a feature selection problem and solves it with a popular penalized regression method, the elastic net, combined with bootstrap resampling. BENIN significantly outperforms state-of-the-art methods on the simulated data from the DREAM 4 challenge when combining genome-wide location data, knockout gene expression data, and time-series expression data.
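A sketch of the elastic-net-plus-bootstrap core (not the full BENIN pipeline, which also injects prior knowledge): each gene is regressed on all others, and edges are ranked by how often their coefficients are selected across bootstrap resamples.

```python
# Sketch: per-target elastic-net regressions with bootstrap selection
# frequencies as edge scores. Prior knowledge (not shown) would enter as
# per-feature penalty weights or extra features.
import numpy as np
from sklearn.linear_model import ElasticNet

def edge_frequencies(X, n_boot=50, alpha=0.05, l1_ratio=0.7, seed=0):
    rng = np.random.RandomState(seed)
    n, p = X.shape
    freq = np.zeros((p, p))                       # freq[j, i]: regulator j -> target i
    for i in range(p):
        others = [j for j in range(p) if j != i]
        for _ in range(n_boot):
            idx = rng.randint(0, n, n)            # bootstrap resample
            model = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, max_iter=5000)
            model.fit(X[idx][:, others], X[idx, i])
            freq[others, i] += (model.coef_ != 0)
    return freq / n_boot

X = np.random.RandomState(1).randn(100, 10)       # placeholder expression matrix
print(edge_frequencies(X).round(2))
```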
Although the potential of data analysis appears promising, companies often fail to develop specific ideas and to successfully implement complex data analyses. In this paper, we present BMC Data, a novel, stand-alone tool for the comprehensive assessment of the maturity of a company’s business model in terms of data integration. To obtain a holistic view of all facets of a business model, the analysis reflects on all relevant dimensions of data integration. We tested the functionality of BMC Data in a survey aimed at managing directors and persons at management level (sample size n = 158) across a wide range of company sizes and sectors. The evaluation shows that BMC Data is suitable for supporting companies in analysing and reflecting on their business model.
XML is ubiquitously used as an information exchange platform for web-based applications in healthcare, the life sciences, and many other domains. Proliferating XML data are now managed with the latest native XML database technologies. XML data sources conforming to common XML schemas can be shared and integrated with syntactic interoperability. Semantic interoperability can be achieved through semantic annotations of data models using common data elements linked to concepts from ontologies. In this paper, we present a framework and software system to support the development of semantically interoperable XML-based data sources that can be shared through a Grid infrastructure. We also present our work on supporting semantically validated XML data through semantic annotations for XML Schema, semantic validation, and semantic authoring of XML data. We demonstrate the use of the system for a biomedical database of medical image annotations and markups.
The development of modern health care and clinical practice increases the need to extract and integrate nutritional and medical data across heterogeneous data sources. It would be useful for researchers and patients if relevant information could be extracted and organized as easily shared, machine-processable linked data. In this paper, we describe an automatic approach that extracts and publishes nutritional linked data, including nutritional concepts and relationships extracted from nutritional data sources, and links the nutritional data with Linked Open Data. In particular, a CRF-based approach is used to mine food, ingredient, and disease entities and their relationships from nutritional text. An extended nutritional ontology is then used to organize the extracted data. Finally, we assign semantic links between the food, ingredient, and disease entities and equivalent entities in DBPedia, Diseasome, and LinkedCT.
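For illustration, the sketch below covers only the final linking step, assuming the entities have already been extracted by the CRF; candidate DBpedia resources are derived by naive label normalisation and emitted as owl:sameAs Turtle triples (the base namespace is hypothetical).

```python
# Sketch of the linking step: map extracted entities to candidate DBpedia
# resources by naive label normalisation and emit owl:sameAs triples.
# Real linking would query a lookup service and disambiguate candidates.
extracted = [("Spinach", "food"), ("Iron", "ingredient"), ("Anemia", "disease")]

def to_dbpedia_uri(label):
    return "http://dbpedia.org/resource/" + label.strip().replace(" ", "_")

def sameas_triples(entities, base="http://example.org/nutrition/"):
    lines = ["@prefix owl: <http://www.w3.org/2002/07/owl#> ."]
    for label, etype in entities:
        local = f"<{base}{etype}/{label.replace(' ', '_')}>"
        lines.append(f"{local} owl:sameAs <{to_dbpedia_uri(label)}> .")
    return "\n".join(lines)

print(sameas_triples(extracted))
```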
Flux Balance Analysis (FBA) has been successfully applied to facilitate the understanding of cellular metabolism in model organisms. Standard formulations of FBA can be applied to large systems, but the accuracy of predictions may vary significantly depending on environmental conditions, genetic perturbations, or complex unknown regulatory constraints. Here we present an FBA-based approach to infer the biomass compositions that best describe multiple physiological states of a cell. Specifically, we use experimental data (such as flux measurements or mRNA expression levels) to infer the best-matching stoichiometrically balanced fluxes and metabolite sinks. Our algorithm is designed to provide predictions based on the comparative analysis of two metabolic states (e.g. wild type and knockout, or two different time points) so as to be independent of possible arbitrary scaling factors. We test the algorithm using experimental data for metabolic fluxes in wild-type and gene-deletion strains of E. coli. In addition to demonstrating the capacity of our approach to correctly identify known exchange fluxes and biomass compositions, we analyze E. coli central carbon metabolism to show the changes in metabolic objectives and the potential compensation for reducing power caused by single-enzyme gene deletions in the pentose phosphate pathway.
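For context, the sketch below shows the plain FBA building block (maximise a biomass flux subject to steady-state mass balance and flux bounds) on a hypothetical three-reaction toy network; the paper's contribution, inferring biomass compositions from pairs of states, sits on top of this formulation.

```python
# Sketch of a standard FBA step: maximise biomass flux subject to S v = 0
# and flux bounds, solved as a linear program. The toy network is hypothetical.
import numpy as np
from scipy.optimize import linprog

# reactions: R1: -> A, R2: A -> B, R3 (biomass): B ->
S = np.array([[ 1, -1,  0],     # metabolite A balance
              [ 0,  1, -1]])    # metabolite B balance
bounds = [(0, 10), (0, 10), (0, 10)]
c = np.zeros(3); c[2] = -1.0    # linprog minimises, so negate the biomass flux

res = linprog(c, A_eq=S, b_eq=np.zeros(2), bounds=bounds, method="highs")
print("optimal fluxes:", res.x)  # expect [10, 10, 10]
```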
We demonstrate an approach for automatically generating kinetic models of metabolic networks. First, the metabolic network is characterised by its stoichiometric structure. A kinetic equation describing the metabolic flux is then associated with each reaction; for the kinetics we use a formula that is universally applicable to reactions with arbitrary numbers of substrates and products. Finally, parameters are assigned to the reaction kinetics. The resulting model, in SBML format, can be fed into standard simulation tools. The approach is applied to the sulphur-glutathione pathway in Saccharomyces cerevisiae.
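A minimal sketch of the generation idea: walk the stoichiometry and attach a generic rate law to each reaction. A reversible mass-action expression stands in for the universal rate formula used in the paper, and the output is plain text rather than SBML.

```python
# Sketch: attach a generic reversible rate law to each reaction of a
# stoichiometric model (stand-in for the paper's universal kinetic formula).
reactions = {
    "v1": {"substrates": {"ATP": 1, "SO4": 1}, "products": {"APS": 1, "PPi": 1}},
}

def rate_law(rid, rxn):
    fwd = " * ".join(f"{m}^{n}" for m, n in rxn["substrates"].items())
    rev = " * ".join(f"{m}^{n}" for m, n in rxn["products"].items())
    return f"{rid}: kf_{rid} * {fwd} - kr_{rid} * {rev}"

for rid, rxn in reactions.items():
    print(rate_law(rid, rxn))
```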