World Scientific
Skip main navigation

Cookies Notification

We use cookies on this site to enhance your user experience. By continuing to browse the site, you consent to the use of our cookies. Learn More
×

System Upgrade on Tue, May 28th, 2024 at 2am (EDT)

Existing users will be able to log into the site and access content. However, E-commerce and registration of new users may not be available for up to 12 hours.
For online purchase, please visit us again. Contact us at customercare@wspc.com for any enquiries.

A Query Engine for Retrieving Information from Chinese HTML Documents

    https://doi.org/10.1142/S0219427904001085Cited by:0 (Source: Crossref)

    The amount of online information in Chinese and the number of Chinese Internet users have been increasing tremendously during the past decade. Since Chinese language is significantly different from English, techniques that have been developed for retrieving information from English Web documents cannot be directly applied to retrieve information from Chinese Web documents. In order to provide high-performance access of Chinese information on the Web, we have developed a Chinese Web query engine that (i) extracts (hierarchical) data of interest from Chinese HTML tables using an information extraction tool called semantic hierarchy, (ii) allows the user to submit queries in Chinese using a menu-driven user interface, and (iii) processes the user's queries (as Boolean expressions) to generate the correct results. Our query engine supports various groups of information that are categorized into various subject areas, such as car ads, house rentals, job ads, stocks, university catalogs, etc. We have tested our information extraction tool on two application domains, car-ads and house-rental. The average F-measure on extracting Chinese data from these two application domains is above 90%. More importantly, our query engine can easily be configured and internationalized to become a worldwide, multilingual query engine with minor changes in system settings on PCs running Windows operating systems.