Microaggregation is one of the most widely used microdata protection methods. It builds clusters of at least k original records and then replaces each record with the centroid of its cluster. When the dataset has a large number of attributes, it is usually split into smaller blocks of attributes, and microaggregation is applied to each block successively and independently. This reduces the effect of the noise introduced by microaggregation, at the cost of losing the k-anonymity property.
In this work we show that, besides the specific microaggregation method, the value of the parameter k, and the number of blocks into which the dataset is split, another factor influences the quality of the microaggregation: the way in which the attributes are grouped to form the blocks. When correlated attributes are grouped in the same block, the statistical utility of the protected dataset is higher. In contrast, when correlated attributes are dispersed across different blocks, the achieved anonymity is higher, and so the disclosure risk is lower. We present quantitative evaluations of these statements based on experiments on real datasets.
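The block-wise procedure described above can be illustrated with a minimal sketch. It is not the method evaluated in the paper: the clustering heuristic (sorting records by the sum of their values and cutting the sorted order into groups of k) and the names `microaggregate_block`, `microaggregate_by_blocks`, and `smallest_group` are assumptions made for illustration only.

```python
from collections import Counter
from typing import List, Sequence


def microaggregate_block(rows: Sequence[Sequence[float]], k: int) -> List[List[float]]:
    """Replace each row by the centroid of its cluster of at least k rows.

    Assumed heuristic: sort rows by the sum of their values and cut the
    sorted order into consecutive groups of k. The last group absorbs any
    remainder, so no group has fewer than k rows.
    """
    n = len(rows)
    if k < 1 or n < k:
        raise ValueError("need 1 <= k <= number of rows")
    dim = len(rows[0])
    order = sorted(range(n), key=lambda i: sum(rows[i]))
    n_groups = n // k
    result: List[List[float]] = [[] for _ in range(n)]
    for g in range(n_groups):
        start = g * k
        end = n if g == n_groups - 1 else start + k
        members = order[start:end]
        centroid = [sum(rows[i][j] for i in members) / len(members) for j in range(dim)]
        for i in members:
            result[i] = list(centroid)
    return result


def microaggregate_by_blocks(data, blocks, k):
    """Microaggregate each block of attribute indices independently."""
    protected = [list(row) for row in data]
    for block in blocks:
        sub = [[row[j] for j in block] for row in data]
        aggregated = microaggregate_block(sub, k)
        for i, agg_row in enumerate(aggregated):
            for pos, j in enumerate(block):
                protected[i][j] = agg_row[pos]
    return protected


def smallest_group(protected) -> int:
    """Size of the smallest set of identical protected records."""
    return min(Counter(tuple(row) for row in protected).values())


data = [[1.0, 10.0], [2.0, 40.0], [3.0, 20.0], [4.0, 30.0]]
one_block = microaggregate_by_blocks(data, blocks=[[0, 1]], k=2)
two_blocks = microaggregate_by_blocks(data, blocks=[[0], [1]], k=2)
print(smallest_group(one_block))   # 2: every record shares its values with another
print(smallest_group(two_blocks))  # 1: all records are distinct, k-anonymity is lost
```

On this toy input, aggregating both attributes as a single block keeps every protected record identical to at least one other, whereas aggregating each attribute separately yields four distinct records, which is the loss of k-anonymity the abstract mentions.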
Finding techniques to reduce software development effort and produce highly reliable software is a vital goal for software developers. One method that has proven quite useful is the application of software metrics-based classification models. Classification models can be constructed to identify faulty components in a software system with high accuracy. Significant research has been dedicated to developing methods for improving the quality of software metrics-based classification models. Several studies have shown that the accuracy of these models improves when irrelevant attributes are identified and eliminated from the training data set.
This study presents a rough set theory approach, based on classical set theory, for identifying and eliminating irrelevant attributes from a training data set. Rough set theory is used to find small groups of attributes, determined by the relationships that exist between the objects in a data set, whose discernibility is comparable to that of larger sets of attributes. This allows for the development of simpler classification models that are easy for analysts to understand and explain to others.
We built case-based reasoning models in order to evaluate their classification performance on the smaller subsets of attributes selected using rough set theory. The empirical studies demonstrated that by applying a rough set approach to find small subsets of attributes we can build case-based reasoning models with an accuracy comparable to, and in some cases better than, a case-based reasoning model built with a complete set of attributes.
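The idea of a small attribute subset that discerns objects as well as the full set can be sketched as follows. This is an illustrative brute-force search over a toy decision table, not the procedure used in the study; the function names `dependency` and `smallest_reducts` and the example data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def dependency(rows, decisions, attrs):
    """Fraction of objects whose values on attrs determine their decision.

    Objects with equal values on attrs form an indiscernibility class; a
    class counts only if all of its objects share one decision (it lies in
    the positive region).
    """
    classes = defaultdict(list)
    for i, row in enumerate(rows):
        classes[tuple(row[a] for a in attrs)].append(i)
    consistent = sum(
        len(members)
        for members in classes.values()
        if len({decisions[i] for i in members}) == 1
    )
    return consistent / len(rows)


def smallest_reducts(rows, decisions):
    """All smallest attribute subsets with the same dependency as the full set."""
    n_attrs = len(rows[0])
    full = dependency(rows, decisions, range(n_attrs))
    for size in range(1, n_attrs + 1):
        found = [
            subset
            for subset in combinations(range(n_attrs), size)
            if dependency(rows, decisions, subset) == full
        ]
        if found:
            return found
    return []


# Hypothetical discretized software metrics for five modules.
rows = [
    ("high", "low", "yes"),
    ("high", "high", "yes"),
    ("low", "low", "no"),
    ("low", "high", "no"),
    ("high", "low", "no"),
]
decisions = ["faulty", "faulty", "ok", "ok", "faulty"]
print(smallest_reducts(rows, decisions))  # [(0,)]
```

In this toy table the first attribute alone separates faulty from non-faulty modules as well as all three attributes together, so a classifier could be built on it alone. The exhaustive search is exponential in the number of attributes and is only meant to make the definition concrete.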
In this paper, we consider a system in which knowledge is available in a dispersed form. In the system, local classifiers are combined into coalitions. Two methods of combining classifiers into coalitions are discussed: one using a hierarchical agglomeration algorithm and one using Pawlak’s conflict model. The purpose of this paper is to apply dimensionality reduction methods in these two approaches. Two methods of attribute reduction are considered: one based on rough set theory and one based on the correlation of attributes with the decision class. The most important conclusions of the paper are as follows. The use of an attribute selection method improves the classification quality of the dispersed system. Better results are obtained by the system with a hierarchical agglomeration algorithm.
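As a rough illustration of attribute reduction based on correlation with the decision class, the sketch below keeps the attributes whose absolute Pearson correlation with a numerically encoded decision reaches a threshold. The abstract does not specify the correlation measure, the encoding, or the threshold, so all three are assumptions, as are the function names.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def select_by_correlation(rows, decisions, threshold):
    """Indices of attributes with |correlation to the decision| >= threshold."""
    kept = []
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        if abs(pearson(column, decisions)) >= threshold:
            kept.append(j)
    return kept


# Toy local decision table: three numeric attributes, binary decision (assumed encoding).
rows = [[1, 5, 0], [2, 5, 1], [3, 5, 0], [4, 5, 1]]
decisions = [0, 0, 1, 1]
print(select_by_correlation(rows, decisions, threshold=0.5))  # [0]
```

In a dispersed setting, such a filter would presumably be applied to each local decision table separately before the local classifiers are combined.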
Young people’s skills and attitudes must be observed, studied and evaluated in order to create appropriate models that serve an educational purpose. In this paper, we propose a decision-making model that aims to detect certain attitudinal and behavioral patterns of actively engaged young people. The data used in this research come from a questionnaire drawn up by a group of researchers from six European countries to investigate young people’s awareness of the Sustainable Development Goals and their engagement as active agents of development and change at the regional level. For the purpose of this study, we selected the regional results obtained from administering this questionnaire in Dolj County (Romania) and La Rioja (Spain). We developed a neutrosophic model that determines the Mindchanger profile of the respondents based on a minimal set of questions, which depends on whether nationality-specific traits are included or excluded. The resulting decisions were then compared with the respondents’ self-evaluations, yielding high precision (more than 0.83) in all the investigated evaluation scenarios. Our results were significantly better than those provided by several machine learning models applied to the same data set. A direct benefit of our model is that it can be applied to questionnaires that include linguistic responses expressing, among other things, unclear or vague thoughts. Additionally, it makes it possible to identify the minimal set of questions that influence the respondent’s answer to a target question.
Alzheimer’s disease (AD) predominantly affects the elderly population, with symptoms including, but not limited to, cognitive impairment and memory loss. Predicting AD and mild cognitive impairment (MCI) can lengthen the lifespan of patients and help them to access necessary medical resources. One potential approach to achieving an early diagnosis of AD is to use data mining techniques that explore various characteristic traits of MCI, cognitively normal (CN), and AD subjects to build classifiers that reveal important contributors to the disease. These classifiers are used by physicians during the AD diagnostic process in a clinical evaluation. In this research, we empirically compare different data mining algorithms for AD diagnosis. Experimental evaluations using attribute selection methods and classifiers from rule induction and other classification techniques were conducted on data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI-MERGE). The results illustrate the good classification performance of rule-based classifiers in predicting AD.