World Scientific
Skip main navigation

Cookies Notification

We use cookies on this site to enhance your user experience. By continuing to browse the site, you consent to the use of our cookies. Learn More
×

System Upgrade on Tue, May 28th, 2024 at 2am (EDT)

Existing users will be able to log into the site and access content. However, E-commerce and registration of new users may not be available for up to 12 hours.
For online purchase, please visit us again. Contact us at customercare@wspc.com for any enquiries.

Improving Domain Dictionary-Based Text Categorization Using Self-Partition Model

    https://doi.org/10.1142/S0219427905001304Cited by:1 (Source: Crossref)

    In this paper, we present a novel model for improving the performance of Domain Dictionary-based text categorization. The proposed model is named as Self-Partition Model (SPM). SPM can group the candidate words into the predefined clusters, which are generated according to the structure of Domain Dictionary. Using these learned clusters as features, we proposed a novel text representation. The experimental results show that the proposed text representation-based text categorization system performs better than the Domain Dictionary-based text categorization system. It also performs better than the system based on Bag-of-Words when the number of features is small and the training corpus size is small.

    This research was supported in part by the National Natural Science Foundation of China and Microsoft Asia Research (No. 60203019), the National Natural Science Foundation of China (No. 60473140) and the Key Project of the Chinese Ministry of Education (No. 104065).