A HYBRID FEATURE SELECTION METHOD FOR TEXT CATEGORIZATION

https://doi.org/10.1142/S0218488507004492
Cited by: 4 (Source: Crossref)

Feature Selection is an important task within Text Categorization, where irrelevant or noisy features are usually present, causing a loss in classifier performance. Feature Selection in Text Categorization has usually been performed with a filtering approach, selecting the features with the highest scores according to certain measures. Measures of this kind come from the Information Retrieval, Information Theory and Machine Learning fields. Wrapper approaches are known to perform better in Feature Selection than filtering approaches, but they are time-consuming and sometimes infeasible, especially in text domains. Nevertheless, a wrapper that explores a reduced number of feature subsets and uses a fast method as its evaluation function can overcome these difficulties. The wrapper presented in this paper satisfies these properties. Since exploring a reduced number of subsets could yield less promising subsets, a hybrid approach that combines the wrapper method with some scoring measures makes it possible to explore more promising feature subsets. A comparison among the scoring measures, the wrapper method and the hybrid approach is performed. The results reveal that the hybrid approach outperforms both the wrapper approach and the scoring measures, particularly for corpora whose features are less scattered over the categories.
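
The following is a minimal sketch of the general filter-plus-wrapper idea the abstract describes, assuming scikit-learn; the scoring measure (chi-squared), the fast evaluation classifier (multinomial Naive Bayes), the corpus, and the candidate subset sizes are illustrative assumptions, not the exact configuration used in the paper.

```python
# Hypothetical hybrid feature selection sketch: a filter step ranks features
# with a scoring measure, then a wrapper step evaluates only a few subsets
# taken from the top of that ranking with a fast classifier.
import numpy as np
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import chi2
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import cross_val_score

# Build a term-document matrix from a small two-category text corpus.
data = fetch_20newsgroups(subset="train", categories=["sci.med", "sci.space"])
X = TfidfVectorizer().fit_transform(data.data)
y = data.target

# Filter step: rank all features by a scoring measure (chi-squared here).
scores, _ = chi2(X, y)
ranking = np.argsort(scores)[::-1]

# Wrapper step: explore a reduced number of nested candidate subsets,
# scoring each with a fast classifier via cross-validation.
best_k, best_acc = None, -np.inf
for k in (100, 500, 1000, 2000):          # only a few subsets are examined
    subset = ranking[:k]
    acc = cross_val_score(MultinomialNB(), X[:, subset], y, cv=3).mean()
    if acc > best_acc:
        best_k, best_acc = k, acc

print(f"Selected top {best_k} features (CV accuracy {best_acc:.3f})")
```

Restricting the wrapper search to subsets drawn from the filter ranking is what keeps the number of evaluated subsets small, which is the practical point the abstract makes about combining the two approaches.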