Supervised feature extraction is used in data classification and, unlike unsupervised feature extraction, uses class labels to evaluate the quality of the extracted features. Exhaustive searches for optimal feature subsets are computationally expensive. This article proposes a supervised linear feature extraction algorithm based on multivariate decision trees. The main motivation behind this new approach is to reduce the computation time needed to induce the new classifier required to evaluate each candidate feature subset. The proposed algorithm follows an approach similar to the wrapper model used in feature selection. To evaluate its performance, several experiments with real-world data were carried out. The chief value of this new feature extraction method lies in its ability to significantly reduce the computational time required to extract features from large databases.
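The wrapper model the abstract alludes to can be illustrated with a minimal sketch: a search over feature subsets where each candidate subset is scored by inducing and evaluating a classifier on it. This is not the article's algorithm (which uses multivariate decision trees); it is a generic greedy forward wrapper with a leave-one-out 1-nearest-neighbour evaluator, and all names (`evaluate`, `greedy_wrapper`) are illustrative.

```python
def evaluate(rows, labels, feats):
    # Score a candidate feature subset with leave-one-out 1-nearest-neighbour
    # accuracy, restricted to the chosen feature indices. A stand-in for
    # inducing a full classifier per subset, which is what makes wrappers costly.
    correct = 0
    for i, row in enumerate(rows):
        best, best_d = None, float("inf")
        for j, other in enumerate(rows):
            if i == j:
                continue
            d = sum((row[f] - other[f]) ** 2 for f in feats)
            if d < best_d:
                best, best_d = labels[j], d
        correct += best == labels[i]
    return correct / len(rows)

def greedy_wrapper(rows, labels, n_features):
    # Greedy forward search: repeatedly add the single feature that most
    # improves the evaluation score; stop when no addition helps. This
    # explores O(n^2) subsets instead of the 2^n of an exhaustive search.
    selected, best_score = [], 0.0
    while True:
        gains = [
            (evaluate(rows, labels, selected + [f]), f)
            for f in range(n_features)
            if f not in selected
        ]
        if not gains:
            break
        score, f = max(gains)
        if score <= best_score:
            break
        selected.append(f)
        best_score = score
    return selected, best_score
```

The cost driver is plain here: every candidate subset triggers a fresh classifier evaluation, which is exactly the expense the article's decision-tree-based method aims to cut.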
Feature Selection is an important task in Text Categorization, where irrelevant or noisy features are usually present and cause a loss in classifier performance. Feature Selection in Text Categorization has usually been performed with a filtering approach that selects the features with the highest scores according to certain measures. Measures of this kind come from the Information Retrieval, Information Theory, and Machine Learning fields. Wrapper approaches are known to perform better in Feature Selection than filtering approaches, but they are time-consuming and sometimes infeasible, especially in text domains. However, a wrapper that explores a reduced number of feature subsets and uses a fast method as its evaluation function can overcome these difficulties. The wrapper presented in this paper satisfies both properties. Since exploring a reduced number of subsets risks missing promising ones, a hybrid approach that combines the wrapper method with some scoring measures makes it possible to explore more promising feature subsets. A comparison among scoring measures, the wrapper method, and the hybrid approach is performed. The results reveal that the hybrid approach outperforms both the wrapper approach and the scoring measures, particularly for corpora whose features are less scattered over the categories.
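The hybrid idea can be sketched as a two-stage pipeline: a scoring measure first ranks terms (the filter stage), and the wrapper then evaluates only prefixes of that ranking rather than arbitrary subsets. This is an illustrative sketch, not the paper's method: `contrast_score` is a toy two-category measure standing in for the IR/IT measures (information gain, chi-square, ...), and `evaluate` is any fast subset-scoring function supplied by the caller.

```python
from collections import defaultdict

def contrast_score(docs, labels, term):
    # Toy scoring measure: absolute difference in the term's relative
    # document frequency between the two categories. Assumes exactly two
    # category labels; real filters would use information gain or similar.
    counts, totals = defaultdict(int), defaultdict(int)
    for doc, lab in zip(docs, labels):
        totals[lab] += 1
        if term in doc:
            counts[lab] += 1
    a, b = sorted(totals)
    return abs(counts[a] / totals[a] - counts[b] / totals[b])

def hybrid_select(docs, labels, vocab, top_k, evaluate):
    # Hybrid filter + wrapper: rank terms by the scoring measure, then let
    # the wrapper explore only the top-k prefixes of that ranking, so far
    # fewer subsets (and classifier evaluations) are needed than in a
    # full wrapper search.
    ranked = sorted(vocab, key=lambda t: contrast_score(docs, labels, t),
                    reverse=True)
    best_subset, best_score = [], -1.0
    for k in range(1, top_k + 1):
        subset = ranked[:k]
        score = evaluate(docs, labels, subset)
        if score > best_score:
            best_subset, best_score = subset, score
    return best_subset, best_score
```

The design point matches the abstract: the filter narrows the search to promising subsets, and the wrapper's classifier-based evaluation picks among them, trading exhaustiveness for feasibility on text corpora with large vocabularies.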