  • Article (No Access)

    Clustering-Based Topical Web Crawling for Topic-Specific Information Retrieval Guided by Incremental Classifier

    The ever-growing volume of information on the Web makes domain-specific retrieval difficult: data sources are numerous, and query keywords carry few features. Anchor texts, which contain a few features of a specific topic, play an important role in domain-specific information retrieval, especially in Web page classification. However, the features contained in anchor texts alone are not informative enough. This paper presents a novel incremental method for Web page classification enhanced by link contexts and clustering. Directly feeding the anchor-text vector to a classifier may perform poorly because of the limited number of features. Link context is therefore used first to obtain the contextual information surrounding each anchor text. Then, a hierarchical clustering method clusters the feature vectors and content units, which lengthens the feature vector belonging to each class. Finally, an incremental SVM is proposed to obtain the final classifier and improve its accuracy and efficiency. Experimental results show that the proposed method outperforms conventional topical Web crawlers in harvest rate and target recall.
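The pipeline described above can be sketched with off-the-shelf components. The following is a minimal illustration, not the paper's implementation: link context is approximated by concatenating surrounding words onto each anchor text, and the incremental SVM is approximated by scikit-learn's `SGDClassifier` with hinge loss trained via `partial_fit`; all data, labels, and names below are made up for demonstration.

```python
# Illustrative sketch (not the paper's method): incremental linear-SVM
# classification of anchor texts enriched with link context.
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import SGDClassifier

# Anchor texts alone are short; appending surrounding link-context
# words lengthens each feature vector (the paper's first step).
anchors = ["machine learning", "football scores",
           "svm tutorial", "match results"]
contexts = ["classifier training data", "league goals team",
            "kernel margin hyperplane", "season fixtures league"]
docs = [a + " " + c for a, c in zip(anchors, contexts)]
labels = [1, 0, 1, 0]  # 1 = on-topic (ML), 0 = off-topic (made-up labels)

# HashingVectorizer supports streaming input; SGDClassifier with hinge
# loss is a linear SVM that can be updated incrementally.
vec = HashingVectorizer(n_features=2**12, alternate_sign=False)
clf = SGDClassifier(loss="hinge", random_state=0)

# First incremental batch (classes must be declared up front).
clf.partial_fit(vec.transform(docs[:2]), labels[:2], classes=[0, 1])
# A later batch updates the same model without retraining from scratch.
clf.partial_fit(vec.transform(docs[2:]), labels[2:])

# Classify a new, unseen anchor-plus-context string.
pred = clf.predict(vec.transform(["svm margin classifier training"]))
```

The incremental update is the point of the sketch: each `partial_fit` call folds a new batch into the existing model, which is what lets the crawler's classifier improve as pages are fetched.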

  • Article (No Access)

    Generalization bounds of incremental SVM

    Incremental learning is an effective way to learn from accumulated training samples and large-scale datasets. Its main advantages are making full use of historical information, greatly reducing the training scale, and saving space and time. Despite extensive research on incremental support vector machine (SVM) learning algorithms, most of it assumes independent and identically distributed (i.i.d.) samples; moreover, incremental SVM learning has so far lacked theoretical analysis. In this paper, we study the generalization bounds of an incremental SVM learning algorithm whose samples are drawn from uniformly geometric Markov chains and from exponentially strongly mixing sequences. As a special case, we also obtain generalization bounds for i.i.d. samples.
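As an illustration of what such results typically look like (the symbols, constants, and rate below are generic placeholders, not the bounds actually proved in the paper), a generalization bound for dependent samples usually takes the shape:

```latex
% Generic shape of a generalization bound (illustrative placeholder,
% not the paper's actual result): with probability at least 1 - \delta,
\[
  R(f_n) \;\le\; R_{\mathrm{emp}}(f_n)
  \;+\; C \sqrt{\frac{\log(1/\delta)}{n_{\mathrm{eff}}}},
\]
% where R is the expected risk, R_emp the empirical risk over the n
% training samples, C a constant, and n_eff <= n an effective sample
% size that shrinks as the dependence (mixing) between successive
% Markov samples grows. For i.i.d. samples n_eff = n, recovering the
% classical O(1/sqrt(n)) rate, which is why the i.i.d. bound falls
% out as a special case.
```

The dependence structure of the sample enters only through the effective sample size, which is why weaker mixing (stronger dependence) yields a looser bound.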