Machine learning (ML) architectures based on neural models have garnered considerable attention in the field of language classification. Code-mixing, the practice of mixing two or more languages within a single text, is a common phenomenon on social networking sites when users express opinions on a topic. This paper describes the application of a code-mixing index to Indian social media texts and compares the complexity of identifying the language at the word level using a Bi-directional Long Short-Term Memory (Bi-LSTM) model. The major contribution of the work is a technique for identifying the language of Hindi–English code-mixed data from three social media platforms, namely Facebook, Twitter and WhatsApp. We demonstrate that the Bi-LSTM model is capable of learning and accurately predicting the languages used in these social media texts.
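For a concrete picture of the model class, the following is a minimal sketch of word-level language tagging with a Bi-directional LSTM in PyTorch; the vocabulary, the tagset {hi, en, other} and all hyperparameters are illustrative placeholders, not those used in the paper.

```python
# Minimal Bi-LSTM word-level language tagger (untrained toy example).
import torch
import torch.nn as nn

TAGS = ["hi", "en", "other"]

class BiLSTMTagger(nn.Module):
    def __init__(self, vocab_size, embed_dim=64, hidden_dim=64, num_tags=len(TAGS)):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True,
                            bidirectional=True)
        self.out = nn.Linear(2 * hidden_dim, num_tags)  # forward + backward states

    def forward(self, token_ids):            # (batch, seq_len)
        h, _ = self.lstm(self.embed(token_ids))
        return self.out(h)                   # (batch, seq_len, num_tags)

# Toy usage: tag each word of a code-mixed sentence.
vocab = {"<unk>": 0, "yaar": 1, "this": 2, "movie": 3, "bahut": 4, "accha": 5, "hai": 6}
sentence = ["yaar", "this", "movie", "bahut", "accha", "hai"]
ids = torch.tensor([[vocab.get(w, 0) for w in sentence]])

model = BiLSTMTagger(vocab_size=len(vocab))
pred = model(ids).argmax(dim=-1)[0]          # untrained, so tags are arbitrary
print(list(zip(sentence, [TAGS[int(i)] for i in pred])))
```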
Transliteration is the process of mapping the characters of one language to the characters of another language based on their phonetics. India is linguistically diverse, and although people speak many different languages, it can be difficult for them to read the scripts of languages other than their own. In such situations the transliteration process plays a major role. It supports various Natural Language Processing (NLP) applications, such as information retrieval, machine translation and speech recognition, that enable computers to process natural language much as a human interprets it, and it helps in rendering technical terms and proper names from one language in another. Transliteration work has been carried out for languages such as Japanese, Chinese and English, but for Indian languages, especially Tamil, very little notable work has been reported. In this paper, the transliteration process is carried out on Unicode Tamil characters. A phonetics-based forward list processing approach is implemented for transliterating from English to Tamil, which yields promising results.
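A minimal sketch of the greedy longest-match ("forward list") idea, assuming a tiny illustrative Roman-to-Tamil mapping table rather than the paper's actual rule set:

```python
# Greedy forward scan: always consume the longest matching phonetic unit.
ROMAN_TO_TAMIL = {
    "ka": "க", "ma": "ம", "la": "ல", "ra": "ர", "na": "ந",
    "k": "க்", "m": "ம்", "l": "ல்", "r": "ர்", "n": "ன்",
    "a": "அ", "i": "இ", "u": "உ",
}
MAX_KEY = max(len(k) for k in ROMAN_TO_TAMIL)

def transliterate(word: str) -> str:
    out, i = [], 0
    while i < len(word):
        # Prefer the longest phonetic unit that matches at position i.
        for size in range(min(MAX_KEY, len(word) - i), 0, -1):
            chunk = word[i:i + size]
            if chunk in ROMAN_TO_TAMIL:
                out.append(ROMAN_TO_TAMIL[chunk])
                i += size
                break
        else:
            out.append(word[i])   # pass through unmapped characters
            i += 1
    return "".join(out)

print(transliterate("kamal"))   # -> கமல்
```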
In Korean text these days, the use of English words, with or without phonetic translation, is growing rapidly. To make matters worse, the Korean transliteration of an English word may vary greatly. The mixed use of English words and their various transliterations in the same document or document collection may cause severe word mismatch problems in Korean information retrieval. There are two possible approaches to tackling this problem: the transliteration method and the back-transliteration method. We argue that our newly proposed transliteration approach is more advantageous for resolving the word mismatch problem than the previously proposed back-transliteration approach. Our information retrieval experiments support this argument.
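To illustrate how a transliteration resource can mitigate word mismatch at query time, here is a minimal sketch with an invented two-entry dictionary; the paper's actual method and resources differ.

```python
# Query expansion with known transliteration variants (toy dictionary).
TRANSLIT = {
    "computer": ["컴퓨터"],
    "digital": ["디지털", "디지탈"],   # multiple variants appear in real text
}

def expand_query(terms):
    """Add every known transliteration variant of each query term."""
    expanded = []
    for t in terms:
        expanded.append(t)
        expanded.extend(TRANSLIT.get(t.lower(), []))
    return expanded

print(expand_query(["digital", "computer"]))
# -> ['digital', '디지털', '디지탈', 'computer', '컴퓨터']
```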
A method to automatically extract translational Japanese KATAKANA and English word pairs from bilingual corpora is proposed. The method applies all the existing back-transliteration rules to each mora unit in a KATAKANA word, and extracts as a translation the English word which matches or partially matches one of these back-transliteration candidates. The mora unit is a Japanese syllable unit, and one KATAKANA character often corresponds to one mora. For instance, if we have グラフ in the Japanese part of a bilingual corpus, we generate such back-transliteration candidates as <graf>, <graph>, <gulerph>, … and identify similar words in the English part of the corpus. The method performs reasonably well, achieving 80%–100% precision at 75% recall against the eight corpora we used for evaluation.
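A minimal sketch of the candidate-generation step, assuming a toy mora-to-Roman rule table; the real rule set and the partial-match scoring are richer than this.

```python
# Enumerate back-transliteration candidates mora by mora, then match.
from itertools import product

# Each KATAKANA mora maps to one or more possible Roman renderings.
MORA_RULES = {
    "グ": ["g", "gu"],
    "ラ": ["ra", "la"],
    "フ": ["f", "fu", "ph"],
}

def candidates(katakana: str):
    """Enumerate all back-transliteration candidates of a KATAKANA word."""
    options = [MORA_RULES.get(ch, [ch]) for ch in katakana]
    return {"".join(parts) for parts in product(*options)}

def match(katakana: str, english_words):
    cands = candidates(katakana)
    return [w for w in english_words if w.lower() in cands]

print(sorted(candidates("グラフ")))           # includes 'graf' and 'graph'
print(match("グラフ", ["graph", "table"]))    # -> ['graph']
```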
In this paper, we describe a Korean transliterated foreign word extraction algorithm. In the proposed method, we reformulate the foreign word extraction problem as a syllable-tagging problem in which each syllable is tagged as either a foreign syllable or a pure Korean syllable. Syllable sequences of Korean strings are modelled by a Hidden Markov Model whose states represent syllables with a binary mark indicating whether the syllable is part of a transliterated foreign word or not. The proposed method extracts transliterated foreign words with high recall and precision. Moreover, our method shows good performance even with small training corpora.
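A minimal sketch of the syllable-tagging idea using Viterbi decoding over two states, foreign (F) and pure Korean (K); all probabilities and the emission model below are illustrative placeholders, since a real system estimates them from a training corpus.

```python
# Two-state HMM tagging of syllables: F = foreign, K = pure Korean.
import math

STATES = ["F", "K"]
START = {"F": 0.3, "K": 0.7}
TRANS = {"F": {"F": 0.8, "K": 0.2}, "K": {"F": 0.1, "K": 0.9}}

def emit(state, syllable):
    # Placeholder emission model: pretend a few syllables look "foreign".
    foreign_like = {"컴", "퓨", "터"}
    p = 0.8 if (syllable in foreign_like) == (state == "F") else 0.2
    return math.log(p)

def viterbi(syllables):
    """Return the most probable F/K tag sequence for the syllables."""
    v = [{s: math.log(START[s]) + emit(s, syllables[0]) for s in STATES}]
    back = []
    for syl in syllables[1:]:
        row, ptr = {}, {}
        for s in STATES:
            best = max(STATES, key=lambda p: v[-1][p] + math.log(TRANS[p][s]))
            row[s] = v[-1][best] + math.log(TRANS[best][s]) + emit(s, syl)
            ptr[s] = best
        v.append(row)
        back.append(ptr)
    # Trace back the best path.
    tag = max(STATES, key=lambda s: v[-1][s])
    path = [tag]
    for ptr in reversed(back):
        tag = ptr[tag]
        path.append(tag)
    return list(reversed(path))

syllables = ["컴", "퓨", "터", "가", "격"]   # "컴퓨터" (computer) + "가격" (price)
print(list(zip(syllables, viterbi(syllables))))
```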
This article addresses the problem of standard Romanization of Arabic names using undiacritized Arabic forms and their corresponding non-standard Romanizations. The Romanization of Arabic names has long been studied and standardized, yet huge non-standard databases of Romanized Arabic names exist and are in use in many private and government agencies. Examples of such applications are passport holder name databases, phone directories and geographic name databases. Dealing with such databases can be inefficient and can produce inconsistent results; converting them to their standard Romanization can help solve these problems.
In this paper, we present an efficient algorithmic software implementation that produces standard Romanizations of Arabic names by utilizing the hints present in the existing non-standard Romanized databases. The results of the software implementation have proven to be very promising.
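A minimal sketch of the hint-driven idea: each undiacritized Arabic letter admits several candidate standard renderings (short vowels are unwritten), and the non-standard spelling on file serves as a hint for choosing among the full candidates. The tables are tiny invented fragments, not the standard's rule set.

```python
# Pick the standard candidate closest to the non-standard spelling on file.
from difflib import SequenceMatcher
from itertools import product

CANDIDATES = {
    "م": ["m", "ma", "mu", "mo"],   # vowel unknown in undiacritized text
    "ح": ["h", "ha"],
    "د": ["d"],
}

def romanize(arabic: str, hint: str) -> str:
    options = [CANDIDATES.get(ch, [ch]) for ch in arabic]
    forms = ("".join(parts) for parts in product(*options))
    # Score each full candidate against the non-standard hint.
    return max(forms, key=lambda f: SequenceMatcher(None, f, hint.lower()).ratio())

print(romanize("محمد", "Mohamad"))   # -> 'mohamad' with these toy tables
```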
In this paper, we present methods for transliteration and back-transliteration. In Korean technical documents and web documents, many English and Japanese words are transliterated into Korean. These transliterated words are usually technical terms and proper nouns, so they are hard to find in a dictionary, and an automatic transliteration system is therefore needed. Previous transliteration models restrict the usable context to two or three letters per source letter. However, most transliteration phenomena cannot be explained by a single standard rule, especially in Korean: factors such as the origin of a word and the profession of its users shape each transliteration, and restricting the context length may discard the discriminative information of each transliteration rule. In this paper, we propose methods that find similar words having the longest overlap with an input word. To find similar words without losing any transliteration rule, phoneme chunks without a length limit are used, and an input word is transliterated by merging phoneme chunks. With the proposed method, we obtained 86% character accuracy and 53% word accuracy in an English-to-Korean transliteration test.
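A minimal sketch of chunk-based matching, assuming a toy English-to-Korean chunk table; the paper learns its phoneme chunks from a corpus and merges them with more sophisticated scoring.

```python
# Unbounded-length chunks: consume the longest matching chunk first.
CHUNKS = {
    "data": "데이터", "inter": "인터", "net": "넷",
    "com": "컴", "puter": "퓨터", "base": "베이스",
}

def transliterate(word: str) -> str:
    out, i = [], 0
    while i < len(word):
        for end in range(len(word), i, -1):      # longest chunk first
            if word[i:end] in CHUNKS:
                out.append(CHUNKS[word[i:end]])
                i = end
                break
        else:
            i += 1                               # skip uncovered letters
    return "".join(out)

print(transliterate("internet"))   # -> 인터넷
print(transliterate("database"))   # -> 데이터베이스
```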
One unique challenge in Chinese Language Processing is cross-strait named entity recognition. Because the PRC and Taiwan have adopted different transliteration strategies, the transliterations of foreign names can vary greatly between the two, creating difficulties for NLP tasks including data mining, translation and information retrieval. In this paper, we introduce a novel approach to the automatic extraction of divergent transliterations of foreign named entities that bootstraps co-occurrence statistics from tagged Chinese corpora, thereby producing higher-precision extractions.
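A minimal sketch of the underlying intuition, pairing candidate names by the similarity of their sentence contexts across the two corpora; the data, the cosine scoring and the prior script normalization are illustrative assumptions, not the paper's bootstrapping procedure.

```python
# Score a candidate name pair by the overlap of their co-occurring words.
from collections import Counter

def context_profile(corpus, name):
    """Count the words that co-occur with `name` in tagged sentences."""
    prof = Counter()
    for sent in corpus:
        if name in sent:
            prof.update(w for w in sent if w != name)
    return prof

def cosine(p, q):
    dot = sum(p[w] * q[w] for w in p)
    norm = (sum(v * v for v in p.values()) * sum(v * v for v in q.values())) ** 0.5
    return dot / norm if norm else 0.0

# "Obama": PRC transliteration 奥巴马 vs. Taiwan transliteration 欧巴马
# (shown after conversion of both corpora to simplified script).
prc = [["奥巴马", "总统", "访问"], ["奥巴马", "演讲"]]
twn = [["欧巴马", "总统", "演讲"], ["欧巴马", "访问"]]

score = cosine(context_profile(prc, "奥巴马"), context_profile(twn, "欧巴马"))
print(round(score, 2))   # a high score suggests the two names co-refer
```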
Romanization is used to phonetically render names and technical terms from languages written in non-Roman alphabets into languages written in Roman alphabets. Because dictionaries contain standard English forms for only some Arabic names, this problem has also been addressed using machine transliteration. Several programs exist to deal with transliteration; they are based either on a dictionary-based approach or on a rule-based approach. In this study, a comparison between these two approaches is presented, using test data from the Yarmouk University library. The results show that while a rule-based Romanizer can romanize all names, a dictionary-based Romanizer romanizes 86% of the tested names. A further test was performed on the Romanization rules used by each Romanizer; it shows that, in terms of accuracy and usability, the rules used by the dictionary-based Romanizer in this study are better than those used by the rule-based Romanizer.
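A minimal sketch contrasting the two approaches, combining a dictionary lookup with a rule-based fallback so that every name receives some Romanization; both tables are invented fragments for illustration.

```python
# Dictionary-first Romanization with a rule-based fallback.
DICTIONARY = {"محمد": "Muhammad", "خالد": "Khalid"}

RULES = {"م": "m", "ح": "h", "د": "d", "خ": "kh", "ا": "a", "ل": "l", "ع": "'"}

def rule_romanize(name: str) -> str:
    """Letter-by-letter rule application (no vowel recovery)."""
    return "".join(RULES.get(ch, ch) for ch in name)

def romanize(name: str) -> str:
    # Prefer the curated dictionary entry; fall back to the rules.
    return DICTIONARY.get(name) or rule_romanize(name)

print(romanize("محمد"))   # dictionary hit  -> 'Muhammad'
print(romanize("عماد"))   # dictionary miss -> rule output "'mad"
```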