STATISTICAL DEPENDENCE AS THE BASIS FOR A PRIVACY MEASURE FOR MICRODATA RELEASE
Abstract
Government agencies and other organizations commonly release or share microdata for analysis. In many cases, microdata release needs to preserve the privacy of individuals and/or sensitive attributes. Current measures of the privacy of released microdata are often based on empirical assessments of identity and value disclosure. The disadvantage of such empirical assessments is that their results cannot be generalized with confidence across datasets or protection methods. While theoretical definitions of privacy are available for other methods of data release, such as query-response output perturbation systems, they are unsuitable for the microdata release context. This study proposes a theoretical basis, grounded in statistical dependence, for measuring privacy in the microdata release context. Using this theoretical basis, we develop practical privacy measures that possess several desirable properties, including generalizability. We illustrate the conceptual benefits of this approach and also show that a privacy measure based on statistical dependence can be used effectively for assessing privacy in microdata.