
Assessing Data Quality Within Available Context

The work was partially supported by the National Natural Science Foundation of China under Grant No. 60573164, and by SRF for ROCS, SEM.

Jingyu Han

Nanjing University of Posts and Telecommunications, Nanjing, 210003, China


Dawei Jiang

Southeast University, Nanjing, 210096, China



Zhiming Ding

Institute of Software, Chinese Academy of Sciences, Beijing, 100080, China


https://doi.org/10.1142/9789814273497_0004

Abstract:

Data quality rating is an important issue in many scenarios, such as data integration and cooperative information systems (CIS). It is now widely accepted that data quality can be measured along multiple dimensions, such as accuracy and completeness. Most existing work focuses on analyzing these dimensions qualitatively, and the analysis depends heavily on expert knowledge. Little work has addressed how to quantify data quality dimensions automatically. To solve this challenging problem, we propose a novel approach to automatically Quantify Dimensions within Context (QDC). Data quality can be gauged by the discrepancy between a data view and the perfect representation of its entity. Since the perfect representation of an entity is difficult to obtain, we propose to approximate it within the entity's available context, so that quality dimensions can be quantified within this context scope. By borrowing entropy concepts from information theory, the measurement is readily defined for different types of data. In this way, the two most important quality dimensions, accuracy and completeness, are properly quantified. Our QDC approach not only gives an objective score and ranking in a cooperative multi-source environment but also avoids laborious human interaction. As an automatic quality rating solution, our approach is particularly well suited to large-scale datasets. Theory and experiments show that our approach performs well for quality rating.
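The abstract does not give the QDC formulas, so the following is only a minimal sketch of the kind of quantities it alludes to, not the authors' method. It assumes categorical values, uses the standard Shannon entropy, and introduces three hypothetical helpers (`shannon_entropy`, `completeness`, `agreement_with_context`) purely for illustration; the "context" here is simply the set of values that other sources report for the same entity.

```python
from collections import Counter
from math import log2


def shannon_entropy(values):
    """Shannon entropy, in bits, of the empirical distribution of `values`."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())


def completeness(record, attributes):
    """Fraction of the expected attributes that carry a non-missing value.

    Illustrative assumption: a missing value is represented as None.
    """
    present = sum(1 for a in attributes if record.get(a) is not None)
    return present / len(attributes)


def agreement_with_context(value, context_values):
    """Share of context sources that report the same value.

    A crude stand-in for accuracy relative to the available context;
    it is not the measure defined in the chapter.
    """
    if not context_values:
        return 0.0
    matches = sum(1 for v in context_values if v == value)
    return matches / len(context_values)
```

Under these assumptions, a context whose sources all agree has zero entropy, while evenly split sources give the maximum entropy for that number of distinct values, which is one way disagreement within the context could be scored.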

Data Quality and High-Dimensional Data Analysis
