Microaggregation is a technique for protecting the confidentiality of respondents in microdata releases. It is used for economic data, where respondent identifiability is high. Microaggregation releases the averages of small groups in which no single respondent is dominant. The technique was originally developed for univariate data: the data were sorted and the averages of adjacent fixed-size groups were reported. The groups can instead be allowed to vary in size so that no group spans a large gap in the sorted data; when group boundaries are sensitive to the distribution of the data, the groups become more homogeneous. This resembles clustering, but with the number of clusters chosen to be as large as possible subject to cluster homogeneity and a minimum cluster size. Approximate methods based on comparisons are developed, as are exact methods based on linear optimization. For bivariate or higher-dimensional data, sorting is no longer well defined, but a notion of adjacency can still be defined; the minimum-cluster-size constraints also become more elaborate and harder to solve. A triangulation may be used to limit the number of adjacencies considered by the algorithms. Hybrids of the approximate and exact methods combine the strengths of each strategy.
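To make the univariate step concrete, the following is a minimal Python sketch of fixed-size microaggregation as described above: values are sorted, adjacent groups of k records are formed, and each value is replaced by its group mean. The group size k, the merging of a short trailing group, and the example data are assumptions chosen for illustration; the variable-size, exact-optimization, and multivariate variants discussed in the abstract are not reproduced here.

```python
from statistics import mean

def univariate_microaggregate(values, k=3):
    """Fixed-size microaggregation: sort, group adjacent records, release group means.

    k is an illustrative choice; the trailing remainder is merged into the
    previous group so that every released group contains at least k records.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    released = [0.0] * len(values)
    for start in range(0, len(order), k):
        group = order[start:start + k]
        # Merge a too-small trailing group into the previous group.
        if len(group) < k and start >= k:
            group = order[start - k:]
        group_mean = mean(values[i] for i in group)
        for i in group:
            released[i] = group_mean
    return released

# Example: 8 values, k = 3 -> two groups of sizes 3 and 5.
print(univariate_microaggregate([12, 3, 7, 41, 38, 5, 9, 40], k=3))
```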
Microaggregation is a statistical disclosure control technique in which raw microdata (i.e. individual records) are grouped into small aggregates prior to publication. With fixed-size groups, each aggregate contains k records in order to prevent disclosure of individual information. Individual ranking is a common criterion for reducing multivariate microaggregation to the univariate case: the idea is to perform microaggregation independently for each variable in the record. Using distributional assumptions, we show in this paper how to find interval estimates for the original data based on the microaggregated data. Such intervals can be considerably narrower than the intervals obtained by subtraction of means, and can be useful for detecting a lack of security in a microaggregated data set. Analytical arguments given in this paper confirm recent empirical results showing that individual ranking microaggregation is unsafe.
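The disclosure reasoning can be illustrated even without distributional assumptions: because individual-ranking groups are formed on the sorted values of each attribute, every original value in a group is bounded by the released means of the neighbouring groups. The sketch below computes only this naive bound; the distribution-based intervals described in the abstract, which can be narrower, are not reproduced. The function name and the example means are illustrative assumptions.

```python
import math

def naive_intervals(released_means):
    """Bounds an intruder can derive for one microaggregated attribute.

    released_means: the released group means in ascending (sorted) order.
    Returns, for each group, a lower and upper bound that hold for every
    original value averaged into that group, since the groups partition
    the sorted values.
    """
    bounds = []
    for g in range(len(released_means)):
        lo = released_means[g - 1] if g > 0 else -math.inf
        hi = released_means[g + 1] if g + 1 < len(released_means) else math.inf
        bounds.append((lo, hi))
    return bounds

# Example: four group means released for one attribute.
print(naive_intervals([4.1, 9.6, 17.3, 30.2]))
# -> [(-inf, 9.6), (4.1, 17.3), (9.6, 30.2), (17.3, inf)]
```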
Microaggregation is one of the most widely used microdata protection methods. It builds clusters of at least k original records and then replaces those records with the centroid of their cluster. When the number of attributes in the dataset is large, one usually splits the dataset into smaller blocks of attributes and then applies microaggregation to each block, successively and independently. In this way, the effect of the noise introduced by microaggregation is reduced, at the cost of losing the k-anonymity property.
In this work we show that, besides the specific microaggregation method, the value of the parameter k and the number of blocks into which the dataset is split, there is another factor that influences the quality of the microaggregation: the way in which the attributes are grouped to form the blocks. When correlated attributes are grouped in the same block, the statistical utility of the protected dataset is higher. In contrast, when correlated attributes are dispersed into different blocks, the achieved anonymity is higher and, consequently, the disclosure risk is lower. We present quantitative evaluations of these statements based on experiments with real datasets.
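As an illustration of block-wise protection, the sketch below splits the attributes into caller-chosen blocks and microaggregates each block independently. The grouping heuristic inside each block (ordering records by distance to the block centroid and averaging consecutive groups of k) is an assumption standing in for whatever microaggregation method is actually used; the two example blockings only mimic the "correlated attributes together" versus "correlated attributes dispersed" choices discussed above and are applied here to synthetic, uncorrelated data purely to show the interface.

```python
import numpy as np

def microaggregate_block(block, k):
    """Replace each record of `block` (n x d) with its group centroid.

    Simple stand-in heuristic: order records by distance to the block
    centroid and average consecutive groups of k; the remainder joins
    the last group so every group has at least k records.
    """
    n = len(block)
    order = np.argsort(np.linalg.norm(block - block.mean(axis=0), axis=1))
    out = np.empty_like(block, dtype=float)
    starts = list(range(0, n - n % k, k)) or [0]
    for idx, start in enumerate(starts):
        end = n if idx == len(starts) - 1 else start + k
        group = order[start:end]
        out[group] = block[group].mean(axis=0)
    return out

def blockwise_microaggregate(data, attribute_blocks, k=3):
    """Apply microaggregation independently to each block of attributes."""
    data = np.asarray(data, dtype=float)
    released = np.empty_like(data)
    for attrs in attribute_blocks:
        released[:, attrs] = microaggregate_block(data[:, attrs], k)
    return released

# Two candidate blockings of four attributes: attributes 0-1 and 2-3 kept
# together versus attributes dispersed across blocks.
rng = np.random.default_rng(0)
x = rng.normal(size=(20, 4))
print(blockwise_microaggregate(x, [[0, 1], [2, 3]], k=4).shape)
print(blockwise_microaggregate(x, [[0, 2], [1, 3]], k=4).shape)
```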