World Scientific

A PARALLEL AND MODULAR PATTERN CLASSIFICATION FRAMEWORK FOR LARGE-SCALE PROBLEMS

    https://doi.org/10.1142/9789814273398_0032
    Cited by: 4 (Source: Crossref)
    Abstract:

    The number of samples available on the internet for training pattern classifiers is increasing rapidly, but traditional pattern classification techniques that run on a single computer system cannot process data sets of this scale. This chapter presents a parallel and modular pattern classification framework for coping with large-scale pattern classification problems. The proposed framework follows a divide-and-conquer strategy, which makes it easy to map a given large-scale problem onto an available parallel and distributed computing infrastructure. The framework consists of three independent parts: decomposing training data sets, training component classifiers in parallel, and combining the trained component classifiers. To evaluate the performance of the proposed framework, we perform experiments on a large-scale Japanese patent classification problem containing about 3,500,000 patent documents. The experimental results show that our framework has the following attractive features: (a) the framework is general, so any traditional pattern classification technique, such as support vector machines, can be easily embedded in it as a component classifier; (b) the framework can incorporate explicit domain or prior knowledge into learning through the process of dividing training data sets; and (c) the framework has good scalability and is easily implementable in hardware.
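
    The three parts named in the abstract (decompose, train in parallel, combine) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the chapter: the round-robin split, the nearest-centroid component classifier, the majority-vote combiner, and all function names are hypothetical stand-ins. The chapter's own decomposition and combination rules may differ.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def decompose(samples, n_parts):
    """Part 1 (assumed rule): split (features, label) pairs round-robin."""
    return [samples[i::n_parts] for i in range(n_parts)]


def train_centroid_classifier(subset):
    """Part 2 (assumed learner): one component classifier = per-class mean vectors."""
    sums, counts = {}, Counter()
    for features, label in subset:
        acc = sums.setdefault(label, [0.0] * len(features))
        for j, value in enumerate(features):
            acc[j] += value
        counts[label] += 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict_one(centroids, features):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def dist(label):
        return sum((a - b) ** 2 for a, b in zip(centroids[label], features))
    return min(centroids, key=dist)


def train_parallel(subsets, max_workers=4):
    """Train component classifiers independently, one task per subset.

    Threads keep the sketch simple; a process pool or a cluster scheduler
    would be the natural substitute for CPU-bound training.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(train_centroid_classifier, subsets))


def combine(classifiers, features):
    """Part 3 (assumed rule): combine component outputs by majority vote."""
    votes = Counter(predict_one(c, features) for c in classifiers)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy two-class data, made up for this example only.
    data = [([0.0, 0.1], "A"), ([0.2, 0.0], "A"), ([0.1, 0.2], "A"), ([0.0, 0.0], "A"),
            ([1.0, 0.9], "B"), ([0.9, 1.1], "B"), ([1.1, 1.0], "B"), ([1.0, 1.0], "B")]
    models = train_parallel(decompose(data, n_parts=2))
    print(combine(models, [0.05, 0.05]))  # A
    print(combine(models, [0.95, 1.0]))   # B
```

    Because each subset is trained without reference to the others, the training step has no shared state, which is what lets the work be spread across whatever parallel or distributed resources are available. Swapping `train_centroid_classifier` for another learner leaves the other two parts unchanged.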