World Scientific

CHINESE WORD SEARCHING IN IMAGED DOCUMENTS

    https://doi.org/10.1142/S0218001404003137
    Cited by: 15 (Source: Crossref)

    This paper proposes an approach to searching for user-specified words in imaged Chinese documents without requiring layout analysis or OCR of the entire document. A small number of Chinese characters cannot be bounded reliably by connected component analysis because of the large gaps between their constituent elements; these characters are blacklisted. A suitable character that is not on the blacklist is chosen from the user-specified word as the initial character, and the document is searched for a matching candidate. Once a matching candidate is found, the adjacent characters in the horizontal and vertical directions are examined for matches with the other corresponding characters of the user-specified word, subject to constraints on alignment (in either the horizontal or the vertical direction) and size similarity. A weighted Hausdorff distance is proposed for character matching. Experimental results show that the method can effectively find user-specified Chinese words in document images whose text lines are horizontal, vertical, or both on the same image.
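
The abstract does not give the form of the weighted Hausdorff distance, so the following is only a minimal sketch of one common variant: each foreground pixel of one character image contributes its distance to the nearest foreground pixel of the other, scaled by a per-pixel weight, and the result is normalized by the total weight. The function names, the weighting scheme, and the symmetric `max` combination are assumptions made for illustration, not the authors' definition.

```python
import math


def foreground_points(bitmap):
    """Return (x, y) coordinates of non-zero pixels in a 2D list bitmap."""
    return [(x, y) for y, row in enumerate(bitmap) for x, v in enumerate(row) if v]


def directed_weighted_hausdorff(a_pts, b_pts, weights=None):
    """Weighted mean of nearest-neighbour distances from a_pts to b_pts.

    `weights` holds one non-negative weight per point in a_pts
    (hypothetical scheme; uniform weights are used when omitted).
    """
    if not a_pts or not b_pts:
        raise ValueError("both point sets must be non-empty")
    if weights is None:
        weights = [1.0] * len(a_pts)
    if len(weights) != len(a_pts):
        raise ValueError("need exactly one weight per point in a_pts")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    acc = 0.0
    for (ax, ay), w in zip(a_pts, weights):
        # Distance from this point to the closest point of the other set.
        nearest = min(math.hypot(ax - bx, ay - by) for bx, by in b_pts)
        acc += w * nearest
    return acc / total


def weighted_hausdorff(a_pts, b_pts, a_weights=None, b_weights=None):
    """Symmetric distance: the larger of the two directed distances."""
    return max(
        directed_weighted_hausdorff(a_pts, b_pts, a_weights),
        directed_weighted_hausdorff(b_pts, a_pts, b_weights),
    )


# Two tiny binary "character" images that differ by one stroke pixel.
template = [[1, 1],
            [0, 0]]
candidate = [[1, 1],
             [0, 1]]

t_pts = foreground_points(template)
c_pts = foreground_points(candidate)

uniform = weighted_hausdorff(t_pts, c_pts)
# Emphasize the extra pixel in the candidate (weights chosen arbitrarily).
emphasized = weighted_hausdorff(t_pts, c_pts, b_weights=[1.0, 1.0, 4.0])
```

In a search setting, a candidate component would be accepted as a match when this distance falls below a chosen threshold. The threshold and the choice of weights (for example, giving more weight to pixels that discriminate between similar characters) would have to be tuned on real data.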