World Scientific

Assessing Data Quality Within Available Context

    This work was partially supported by the National Natural Science Foundation of China under Grant No. 60573164, and by SRF for ROCS, SEM.

    https://doi.org/10.1142/9789814273497_0004
    Cited by: 1 (Source: Crossref)
    Abstract:

    Data quality rating is an important issue in many scenarios, such as data integration and cooperative information systems (CIS). It is now widely accepted that data quality can be measured along multiple dimensions, such as accuracy and completeness. Most existing work focuses on analyzing these dimensions qualitatively, and that analysis depends heavily on expert knowledge. Little work addresses how to quantify data quality dimensions automatically. To solve this challenging problem, we propose a novel approach to automatically Quantify Dimensions within Context (QDC). Data quality can be gauged by the discrepancy between a data view and the perfect representation of its entity. Since the perfect representation of an entity is difficult to obtain, we propose to approximate it within the entity's available context, so that quality dimensions can be quantified within this context scope. By borrowing entropy concepts from information theory, the measurement is readily defined for different types of data. In this way, the two most important quality dimensions, namely accuracy and completeness, are properly quantified. Our QDC approach not only gives an objective score and ranking in a cooperative multi-source environment but also avoids laborious human interaction. As an automatic quality rating solution, our approach is particularly well suited to large-scale datasets. Theory and experiments show that our approach performs well for quality rating.
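
    The abstract does not give QDC's actual formulas, so the following is only a rough sketch of the general idea, not the authors' method. It assumes the "available context" is the set of views of one entity supplied by several sources, uses Shannon entropy to measure how much those views disagree on an attribute, and scores completeness as the share of attributes known anywhere in the context that a given view fills. The function names and the sample records are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def completeness(record, context):
    """Share of attributes filled anywhere in `context` that `record` also fills.

    This is an illustrative definition, not the one used by QDC.
    """
    known = {k for r in context for k, v in r.items() if v is not None}
    if not known:
        return 1.0
    filled = {k for k, v in record.items() if v is not None}
    return len(filled & known) / len(known)


# Hypothetical views of the same entity from three sources.
context = [
    {"name": "Alice", "city": "Paris", "age": 30},
    {"name": "Alice", "city": "Paris", "age": None},
    {"name": "Alice", "city": "Lyon", "age": 30},
]

# Disagreement among sources on each attribute (0 bits means full agreement).
name_entropy = shannon_entropy([r["name"] for r in context])
city_entropy = shannon_entropy([r["city"] for r in context])

# Completeness of each view relative to the context.
scores = [completeness(r, context) for r in context]
```

    In this toy setting, an attribute on which all sources agree has zero entropy, while a view missing an attribute that other sources supply receives a completeness score below 1.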