Special Issue on Soft Computing Methods in Artificial Intelligence

ON THE COMPARISON OF GENERIC INFORMATION LOSS MEASURES AND CLUSTER-SPECIFIC ONES

Database Laboratory, University of A Coruña, Campus de Elviña, 15071 A Coruña, Galicia, Spain

and

Artificial Intelligence Research Institute (IIIA-CSIC), Institut d'Investigació en Intel·ligència Artificial, Campus UAB s/n, 08193 Bellaterra, Catalonia, Spain


https://doi.org/10.1142/S0218488508005273

Abstract

Masking methods are used to protect databases prior to their public release. They mask an original data file so that the new file ensures the privacy of data respondents. Information loss measures have been developed to evaluate to what extent the masked file diverges from the corresponding original file, and to what extent the same analyses on both files lead to the same results.
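The two questions above (how far the masked file diverges from the original, and whether the same analysis gives the same result on both) can be illustrated with a minimal sketch. The function names `record_level_loss` and `column_mean_loss`, the toy data, and the value-swapping mask are all assumptions made for illustration; they are not the measures defined in the paper.

```python
from statistics import mean


def record_level_loss(original, masked):
    """Mean absolute difference between corresponding cells of the two files."""
    diffs = [
        abs(o - m)
        for row_o, row_m in zip(original, masked)
        for o, m in zip(row_o, row_m)
    ]
    return mean(diffs)


def column_mean_loss(original, masked):
    """Mean absolute difference between the column means of the two files."""
    cols_o = list(zip(*original))
    cols_m = list(zip(*masked))
    return mean([abs(mean(co) - mean(cm)) for co, cm in zip(cols_o, cols_m)])


# Hypothetical original file: four records, two numerical attributes.
original = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
# Hypothetical masked file: values swapped between neighbouring records.
masked = [[2.0, 20.0], [1.0, 10.0], [4.0, 40.0], [3.0, 30.0]]

print(record_level_loss(original, masked))  # 5.5: individual records diverge
print(column_mean_loss(original, masked))   # 0.0: column means are preserved
```

In this toy case the records change while an analysis based on column means is unaffected, which is why the two kinds of divergence are worth measuring separately.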

Generic information loss measures ignore the intended use of the data. They are the standard measures when data has to be released (e.g., published on the web) and there is no control over the kinds of analyses users will perform. In this paper we study generic information loss measures and compare them with cluster-specific ones, that is, measures specifically defined for the case in which the user will perform clustering with the original data. To do so, we define such measures and then carry out an extensive comparison of the two kinds of measures.
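One way to picture a cluster-specific measure is to cluster both files with the same algorithm and score how much the two partitions disagree. The sketch below is only an illustration under stated assumptions: plain k-means with deterministic initialisation and the Rand index are stand-ins chosen for brevity, and `kmeans`, `rand_index`, `cluster_loss`, and the toy data are hypothetical names and values, not the measures the paper defines.

```python
from itertools import combinations


def kmeans(points, k, iters=20):
    """Plain k-means; the first k points serve as initial centers (an assumption)."""
    centers = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centers[c])))
            for p in points
        ]
        # Move each center to the mean of its members; empty clusters stay put.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def rand_index(a, b):
    """Fraction of record pairs on which two partitions agree (same/different cluster)."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def cluster_loss(original, masked, k):
    """Hypothetical cluster-specific loss: 1 minus the Rand index of the two clusterings."""
    return 1.0 - rand_index(kmeans(original, k), kmeans(masked, k))


original = [[1.0, 1.0], [5.0, 5.0], [1.2, 0.9], [5.1, 4.8]]
light = [[1.1, 0.9], [4.9, 5.2], [1.0, 1.1], [5.3, 4.7]]  # small perturbation
heavy = [[1.1, 0.9], [4.9, 5.2], [4.8, 5.0], [5.3, 4.7]]  # third record moved across clusters

print(cluster_loss(original, light, 2))  # 0.0
print(cluster_loss(original, heavy, 2))  # 0.5
```

A loss of 0.0 means the masked file yields the same partition as the original; larger values mean more record pairs are grouped differently after masking.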

The paper shows that the generic measures can cope with the information loss related to clustering.
