Skip main navigation

Cookies Notification

We use cookies on this site to enhance your user experience. By continuing to browse the site, you consent to the use of our cookies. Learn More
×

System Upgrade on Tue, May 28th, 2024 at 2am (EDT)

Existing users will be able to log into the site and access content. However, E-commerce and registration of new users may not be available for up to 12 hours.
For online purchase, please visit us again. Contact us at customercare@wspc.com for any enquiries.

SEARCH GUIDE  Download Search Tip PDF File

  • articleNo Access

    Automatic Identification of Oriental and Other Scripts in Image Documents

    Increasing amount of paper documents are produced and received by many organizations. Frequently, they have to be digitized for electronic archiving and later information retrieval or data mining, requiring scanning and OCR. Since OCR techniques are language dependent, the language of the original document must be identified first by advanced technology. This paper describes two methods of identifying Oriental languages among four language groups, i.e. Oriental, Roman, Cyrillic, and Arabic. One method is based on features extracted from the shapes of words and letters, while the other is based on global analysis of text pieces using Gabor filters. Experimental results on hundreds of both clean and noisy documents indicate that the proposed classification approaches look quite promising. The use of linguistic analysis to enhance the results is also discussed.