World Scientific

A Flexible Rule-Based Approach to Learn Medical English-Chinese OOV Term Translations from the Web

    https://doi.org/10.1142/S1793840612400132
    Cited by: 1 (Source: Crossref)

    Out-of-vocabulary (OOV) terms, which are absent from most dictionaries, commonly cause failures in cross-language information retrieval (CLIR) systems. Most existing web-mining approaches perform well when translating named-entity OOV terms. However, they perform poorly on medical OOV terms, because these contain non-Chinese characters (symbols, Roman letters, and Arabic numerals) that existing approaches normally ignore. This paper presents a flexible rule-based approach to acquiring medical OOV term translations. Our method combines a novel rule-based pattern extraction with brute-force generation to identify the non-Chinese parts of a translation. To avoid the time-consuming tasks of ranking candidate lists and manually extracting OOV term translations, we also present a machine learning method that selects correct translations automatically. Twenty-one features are extracted for each Chinese translation candidate, and the correct Chinese translations are selected by machine learning combined with our newly proposed statistics filter. Tested on 1,654 English ICD9 medical OOV terms, the proposed method (SF+F+W+B+P+S with SVM as the base machine learning algorithm) outperforms existing methods, achieving a recall of 83.05% and a precision of 79.72%.
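
The problem described in the abstract, that Chinese-only extraction drops symbols, Roman letters, and Arabic numerals, can be illustrated with a minimal sketch. This is not the paper's rule set, which the abstract does not spell out; the `MIXED` pattern, the `extract_candidates` helper, and the sample snippet are all hypothetical and chosen only to show the effect.

```python
import re

# Assumption: the basic CJK Unified Ideographs block is enough for this example.
CJK = r"\u4e00-\u9fff"

# Baseline: runs of Chinese characters only, so letters and digits are dropped.
CHINESE_ONLY = re.compile(rf"[{CJK}]+")

# Hypothetical mixed pattern: a run of Chinese characters, Roman letters,
# digits, and hyphens that contains at least one Chinese character.
MIXED = re.compile(rf"[{CJK}A-Za-z0-9-]*[{CJK}][{CJK}A-Za-z0-9-]*")


def extract_candidates(snippet, pattern):
    """Return every substring of `snippet` matched by `pattern`."""
    return pattern.findall(snippet)


# Illustrative web snippet (made up for this sketch).
snippet = "vitamin B12 deficiency (维生素B12缺乏症) is common"

chinese_only = extract_candidates(snippet, CHINESE_ONLY)
mixed = extract_candidates(snippet, MIXED)
print(chinese_only)
print(mixed)
```

Under these assumptions, the Chinese-only baseline splits the term into two fragments and loses "B12", while the mixed pattern keeps the candidate intact. A real system would still need a ranking or classification step, such as the feature-based selection the abstract describes, to pick correct translations from the candidates.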