We develop a new approach to the design of neural networks that relies on a collaborative framework of knowledge-driven experience. In contrast to the "standard" way of developing neural networks, which exploits experimental data alone, this approach also incorporates a mechanism of knowledge-driven experience. The essence of the proposed learning scheme is to take advantage of the parameters (connections) of neural networks built in the past for the same phenomenon (which might also exhibit some variability over time or space) for which we now wish to construct a network on the basis of currently available data. We establish a conceptual and algorithmic framework for reconciling these two essential sources of information (data and knowledge) during the development of the network. To make the presentation more focused and to arrive at a detailed quantification of the resulting architecture, we concentrate on the experience-based design of radial basis function neural networks (RBFNNs). We introduce several performance indexes to quantify the effect of utilizing the knowledge residing within the connections of past networks and to establish an optimal level of its use. Experimental results are presented for low-dimensional synthetic data and for selected datasets available at the Machine Learning Repository.
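As a rough illustration of the idea (not the authors' exact formulation), the sketch below fits the output weights of an RBFNN to current data while pulling them toward weights carried over from a network built in the past; the Gaussian basis, the blending coefficient lam, and all names are assumptions introduced here for illustration only.

```python
import numpy as np

def rbf_design_matrix(X, centers, width):
    """Gaussian RBF activations for inputs X given prototype centers."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * width ** 2))

def fit_experience_based_rbfnn(X, y, centers, width, w_prev, lam):
    """Ridge-like fit that pulls the output weights of the new RBFNN
    toward weights w_prev learned on past data (lam = 0 ignores the
    past network, larger lam trusts it more)."""
    Phi = rbf_design_matrix(X, centers, width)
    A = Phi.T @ Phi + lam * np.eye(Phi.shape[1])
    b = Phi.T @ y + lam * w_prev
    return np.linalg.solve(A, b)

# toy usage on synthetic 1-D data
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(50, 1))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.standard_normal(50)
centers = np.linspace(-1, 1, 8)[:, None]
w_prev = np.zeros(8)      # stands in for weights of a network built in the past
w = fit_experience_based_rbfnn(X, y, centers, width=0.3, w_prev=w_prev, lam=0.5)
```

Sweeping lam and scoring on held-out data would correspond, loosely, to searching for an optimal level of knowledge reuse.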
Recent studies have shown that it is feasible to record intracerebral EEG (icEEG) and functional magnetic resonance imaging (fMRI) simultaneously in patients with epilepsy. While this approach has mainly been used to explore the hemodynamic changes associated with epileptic spikes, it could also provide new insight into human cognition. However, the first step is to ensure that cognitive EEG components, which have lower amplitudes than epileptic spikes, can be appropriately detected during fMRI. We compared the high-frequency activities (HFA, 50–150 Hz) elicited by a reading task in icEEG-only and subsequent icEEG-fMRI sessions in the same patients (n=3), implanted with depth electrodes. Comparable responses were obtained, with 71% of the recording sites that responded during the icEEG-only session also responding during the icEEG-fMRI session. For all the remaining sites, nearby clusters (within 7 mm) also demonstrated significant HFA increases during the icEEG-fMRI session. Significant HFA increases were also observable at the single-trial level in icEEG-fMRI recordings. Our results show that low-amplitude icEEG signal components such as cognition-induced HFAs can be reliably recorded with simultaneous fMRI. This paves the way for the use of icEEG-fMRI to address various fundamental and clinical questions, notably the identification of the neural correlates of the BOLD signal.
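For readers unfamiliar with HFA estimation, one common way to obtain a 50–150 Hz activity envelope from an icEEG trace is band-pass filtering followed by the Hilbert transform; the sketch below is an illustrative assumption about such a pipeline, not the processing actually used in the study, and the sampling rate and data are hypothetical.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def hfa_envelope(signal, fs, band=(50.0, 150.0), order=4):
    """Band-pass the icEEG trace and return the analytic-amplitude
    envelope, a common proxy for high-frequency activity (HFA)."""
    nyq = fs / 2.0
    b, a = butter(order, [band[0] / nyq, band[1] / nyq], btype="bandpass")
    filtered = filtfilt(b, a, signal)
    return np.abs(hilbert(filtered))

# hypothetical single-channel trace sampled at 512 Hz
fs = 512
t = np.arange(0, 2.0, 1.0 / fs)
trace = np.random.default_rng(1).standard_normal(t.size)
envelope = hfa_envelope(trace, fs)
```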
Today, businesses have to respond with flexibility and speed to ever-changing customer demand and market opportunities. Service-Oriented Architecture (SOA) is well suited to developing new services and integrating them with adaptability, the ability to respond to changing and new requirements. In this paper, we propose a framework for ensuring data quality between composite services, which solves semantic data transformation problems during service composition and, at the same time, detects data errors during service execution. We also minimize human intervention by learning data constraints as the basis for data transformation and error detection. We developed a data quality assurance service based on SOA, which makes it possible to improve the quality of services and to manage data effectively for a variety of SOA-based applications. As an empirical study, we applied the service to detect data errors between CRM and ERP services and showed that the data error rate could be reduced by more than 30%. We also showed that the automation rate for setting detection rules exceeds 41% when data constraints are learned from multiple registered services in a real business setting.
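A minimal sketch of the general idea of learning data constraints from previously exchanged records and using them to flag errors is given below; the column names, thresholds, and rule types are hypothetical and far simpler than the service described in the paper.

```python
import pandas as pd

def learn_constraints(df, categorical_max=20):
    """Infer simple per-column constraints from historical records:
    numeric columns get an observed [min, max] range, low-cardinality
    text columns get an allowed-value set."""
    constraints = {}
    for col in df.columns:
        if pd.api.types.is_numeric_dtype(df[col]):
            constraints[col] = ("range", df[col].min(), df[col].max())
        elif df[col].nunique() <= categorical_max:
            constraints[col] = ("enum", set(df[col].dropna().unique()))
    return constraints

def detect_errors(record, constraints):
    """Return the fields of a new record that violate learned constraints."""
    errors = []
    for col, rule in constraints.items():
        if col not in record:
            continue
        if rule[0] == "range" and not (rule[1] <= record[col] <= rule[2]):
            errors.append(col)
        elif rule[0] == "enum" and record[col] not in rule[1]:
            errors.append(col)
    return errors

# hypothetical records previously exchanged between CRM-like services
history = pd.DataFrame({"age": [34, 41, 29], "country": ["KR", "US", "KR"]})
rules = learn_constraints(history)
print(detect_errors({"age": 250, "country": "??"}, rules))  # ['age', 'country']
```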
Learning from imbalanced training data can be a difficult endeavour, and the task is made even more challenging if the data is of low quality or the training dataset is small. Data sampling is a commonly used method for improving learner performance when data is imbalanced. However, little effort has been put forth to investigate the performance of data sampling techniques when data is both noisy and imbalanced. In this work, we present a comprehensive empirical investigation of the impact of changes in four training dataset characteristics (dataset size, class distribution, noise level, and noise distribution) on data sampling techniques. We present the performance of four common data sampling techniques using 11 learning algorithms. The results, which are based on an extensive suite of experiments in which over 15 million models were trained and evaluated, show that: (1) even for relatively clean datasets, class imbalance can still hurt learner performance; (2) data sampling, however, may not improve performance for relatively clean but imbalanced datasets; (3) data sampling can be very effective at dealing with the combined problems of noise and imbalance; (4) both the level and the distribution of class noise among the classes are important, since neither factor alone causes a significant impact; (5) when sampling does improve the learners (i.e., for noisy and imbalanced datasets), RUS and SMOTE are the most effective at improving the AUC, while SMOTE also performed well relative to the F-measure; (6) there are significant differences in the empirical results depending on the performance measure used, so it is important to consider multiple metrics in this type of analysis; and (7) data sampling rarely hurt the AUC, but only significantly improved it when the data was at least moderately skewed or noisy, whereas for the F-measure, data sampling often resulted in significantly worse performance when applied to slightly skewed or noisy datasets, yet did improve performance when the data was either severely noisy or skewed, or contained moderate levels of both noise and imbalance.
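The sketch below illustrates, on synthetic data, how RUS and SMOTE can be applied and scored with the AUC and F-measure using scikit-learn and imbalanced-learn; the single learner and the dataset are assumptions for illustration and do not reproduce the paper's experimental suite of 11 learners.

```python
from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import RandomUnderSampler
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import train_test_split

# synthetic imbalanced dataset (roughly 5% positives)
X, y = make_classification(n_samples=4000, weights=[0.95], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

for name, sampler in [("none", None),
                      ("RUS", RandomUnderSampler(random_state=0)),
                      ("SMOTE", SMOTE(random_state=0))]:
    Xs, ys = (X_tr, y_tr) if sampler is None else sampler.fit_resample(X_tr, y_tr)
    clf = LogisticRegression(max_iter=1000).fit(Xs, ys)
    auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
    f1 = f1_score(y_te, clf.predict(X_te))
    print(f"{name:6s}  AUC={auc:.3f}  F-measure={f1:.3f}")
```

Comparing the two metrics across samplers mirrors, in miniature, the paper's point that conclusions depend on the performance measure used.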
This paper summarizes the content of a workshop focused on data quality. The first speaker (VH) described the data quality infrastructure and data quality evaluation methods currently in place within the Observational Health Data Sciences and Informatics (OHDSI) consortium. The speaker described in detail a data quality tool called Achilles Heel and the latest developments for extending it. Interim results of an ongoing data quality study within the OHDSI consortium were also presented. The second speaker (MK) described lessons learned and new data quality checks developed by the PEDSnet pediatric research network. The last two speakers (JB, RG) described tools developed by the Sentinel Initiative and the University of Utah's service-oriented framework. Throughout the session and in a closing discussion, the workshop considered how data quality assessment can be advanced by combining the best features of each network.
XML, the eXtensible Markup Language, is fast evolving into the new standard for data representation and exchange on the WWW. This has resulted in a growing number of data cleaning techniques to locate "dirty" data (artifacts). In this paper, we present DEEPDETECT – an extensible system that detects attribute outliers and duplicates in XML documents. Attribute outlier detection finds objects that contain deviating values with respect to a relevant group of objects. This entails utilizing the correlation among element values in a given XML document. Duplicate detection in XML requires the identification of subtrees that correspond to real world objects. Our system architecture enables sharing of common operations that prepare XML data for the various artifact detection techniques. DEEPDETECT also provides an intuitive visual interface for the user to specify various parameters for preprocessing and detection, as well as to view results.
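The following sketch shows one simplistic way to flag attribute outliers in an XML document by grouping objects on a correlated element and flagging values that are rare within each group; the tag names and the min_support threshold are hypothetical, and this is not DEEPDETECT's actual algorithm.

```python
from collections import Counter, defaultdict
import xml.etree.ElementTree as ET

def attribute_outliers(xml_text, group_tag, key_tag, value_tag, min_support=0.1):
    """Flag objects whose value_tag is rare among objects sharing the same
    key_tag value (a rough proxy for correlation-based outlier detection)."""
    root = ET.fromstring(xml_text)
    groups = defaultdict(list)
    for obj in root.iter(group_tag):
        groups[obj.findtext(key_tag)].append((obj, obj.findtext(value_tag)))
    outliers = []
    for key, members in groups.items():
        counts = Counter(v for _, v in members)
        for obj, val in members:
            if counts[val] / len(members) < min_support:
                outliers.append((key, val))
    return outliers

xml_text = """<catalog>
  <book><publisher>A</publisher><country>US</country></book>
  <book><publisher>A</publisher><country>US</country></book>
  <book><publisher>A</publisher><country>??</country></book>
</catalog>"""
print(attribute_outliers(xml_text, "book", "publisher", "country",
                         min_support=0.4))  # [('A', '??')]
```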
In this paper, we propose a comprehensive data integration and quality assurance model, called CHARIOT, for agro-meteorological data. The model comprises two modules: an intermediary module and a data quality control module. The intermediary provides users with reliable and continuous access to heterogeneous weather databases from various sources; it also resolves various compatibility issues in meteorological time series data. The data quality control module is a multi-layer system spanning internal, temporal and spatial data checks. Together, the two modules provide users with clean and error-free inputs for weather-driven agricultural management decisions. When applied to weather data from a real-life agricultural application, CHARIOT is shown to be effective in controlling and improving data quality, which in turn leads to better and more accurate agricultural management decisions.
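As a hedged illustration of what internal and temporal checks on a weather series might look like (the spatial layer is omitted), consider the sketch below; the thresholds and the hourly temperature values are assumptions, not CHARIOT's actual rules.

```python
import pandas as pd

def internal_check(series, lo=-50.0, hi=60.0):
    """Internal (range) check: flag physically implausible temperatures."""
    return (series < lo) | (series > hi)

def temporal_check(series, max_step=15.0):
    """Temporal check: flag jumps between consecutive hourly readings
    that exceed a plausible rate of change."""
    return series.diff().abs() > max_step

# hypothetical hourly air-temperature record (deg C)
temps = pd.Series([21.0, 21.5, 80.0, 22.0, 21.8, 21.6])
flags = internal_check(temps) | temporal_check(temps)
print(temps[flags])  # flags the 80.0 spike and the reading right after the jump
```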
To face the challenges posed by new techno-savvy market players, the Public Sector Banks (PSB) and the old private banks in India have introduced Core Banking Solutions (CBS) to replace disparate branch automation systems. CBS provides a centralized online banking operational database that can be exploited for building Decision Support Systems (DSS) in key areas. While timeliness of data is ensured, other data quality needs have to be appraised before implementing any such DSS. Hence an assessment of data quality in two key areas, Customer Relationship Management and Borrower Behaviour, was carried out for a sample bank, covering data profiling, inter-field consistency, attribute-value-dependent constraints, and domain constraints. The study has identified critical areas for data quality improvement, both for legacy data that has been migrated and for new data being captured by the CBS. Measures for data cleaning and for implementing additional constraints at the database or application level are proposed to improve data quality for these DSS.
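The sketch below illustrates, on a hypothetical customer table, the kind of domain-constraint and inter-field consistency checks the study describes; all field names, reference codes, and values are invented for illustration.

```python
import pandas as pd

# hypothetical slice of a migrated CRM customer table
customers = pd.DataFrame({
    "date_of_birth": pd.to_datetime(["1980-05-01", "2010-03-12", "1975-11-30"]),
    "account_open":  pd.to_datetime(["2005-07-01", "2004-01-15", "1970-02-02"]),
    "occupation":    ["SALARIED", "STUDENT", "UNKNOWN_CODE"],
})

# domain constraint: occupation must come from the bank's reference codes
valid_occupations = {"SALARIED", "STUDENT", "SELF_EMPLOYED", "RETIRED"}
domain_violations = ~customers["occupation"].isin(valid_occupations)

# inter-field consistency: the account cannot predate the customer's birth
consistency_violations = customers["account_open"] < customers["date_of_birth"]

print(customers[domain_violations | consistency_violations])
```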
This work introduces a data-driven credibility assessment to quantify simulation quality in industrial part re-manufacturing. The framework evaluates the dependability of sources, data, and methodologies, focusing on robustness and uncertainty for data quality and simulation confidence. A conceptual data model, designed using competency questions, maps data requirements, digitalising credibility evaluation and promoting data traceability and accessibility.