This article addresses the problem of producing standard Romanizations of Arabic names from their undiacritized Arabic forms and their corresponding non-standard Romanizations. The Romanization of Arabic names has long been studied and standardized. Nevertheless, many large databases of non-standard Romanized Arabic names are in use at private and government agencies. Examples include passport holder name databases, phone directories, and geographic name databases. Working with such databases can be inefficient and can produce inconsistent results. Converting them to a standard Romanization can help solve these problems.
In this paper, we present an efficient software implementation of an algorithm that produces the standard Romanization of names written in the Arabic alphabet by utilizing the hints in existing databases of non-standard Romanized names. The results obtained with the implementation are very promising.
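The idea of using a non-standard Romanization as a hint can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the function name `romanize_with_hints`, the three-letter consonant table, and the vowel-normalization table are all hypothetical and deliberately tiny. The sketch assumes that every Arabic letter in the input is a consonant, and that the short vowels missing from the undiacritized form can be read off the vowels of the non-standard spelling.

```python
# Hypothetical, deliberately tiny tables; a real system needs the full alphabet
# and rules for long vowels, doubled consonants, and the definite article.
CONSONANTS = {
    "\u062d": "\u1e25",  # Arabic letter hah -> h with dot below
    "\u0633": "s",       # Arabic letter seen
    "\u0646": "n",       # Arabic letter noon
}
# Assumed mapping from non-standard vowel spellings to standard short vowels.
VOWEL_HINTS = {"a": "a", "e": "a", "i": "i", "o": "u", "u": "u"}


def romanize_with_hints(arabic: str, nonstandard: str) -> str:
    """Combine standard consonants with vowel hints from a non-standard form."""
    hints = nonstandard.lower()
    pos = 0
    out = []
    for letter in arabic:
        # The consonant itself always comes from the Arabic form.
        out.append(CONSONANTS[letter])
        # Skip the non-standard spelling of this consonant (possibly doubled).
        while pos < len(hints) and hints[pos] not in VOWEL_HINTS:
            pos += 1
        # The first vowel that follows hints at the unwritten short vowel.
        if pos < len(hints):
            out.append(VOWEL_HINTS[hints[pos]])
            # Consume the rest of the vowel run (e.g. "ee", "ou").
            while pos < len(hints) and hints[pos] in VOWEL_HINTS:
                pos += 1
    result = "".join(out)
    return result[:1].upper() + result[1:]


name = "\u062d\u0633\u0646"  # hah, seen, noon
for spelling in ("Hassan", "Hasen", "Hasan"):
    print(spelling, "->", romanize_with_hints(name, spelling))
```

Under these assumptions, the three inconsistent spellings collapse to one form because the consonants are taken from the Arabic form and only the vowels come from the Romanized hint.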
An increasing number of paper documents are produced and received by many organizations. Frequently, they have to be digitized for electronic archiving and later information retrieval or data mining, which requires scanning and OCR. Since OCR techniques are language dependent, the language of the original document must first be identified automatically. This paper describes two methods of identifying Oriental languages among four language groups, i.e. Oriental, Roman, Cyrillic, and Arabic. One method is based on features extracted from the shapes of words and letters, while the other is based on global analysis of text pieces using Gabor filters. Experimental results on hundreds of both clean and noisy documents indicate that the proposed classification approaches are quite promising. The use of linguistic analysis to enhance the results is also discussed.
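The texture-based method can be illustrated with a minimal Python sketch of Gabor filter features. It is not the paper's implementation: the kernel size, wavelength, sigma, and the three orientations are assumed values, the function names are hypothetical, and a synthetic striped image stands in for a scanned text block. The sketch only computes the feature vector, one mean response energy per orientation. The classifier that would assign such a vector to a language group is not shown.

```python
import math


def gabor_kernel(theta, wavelength=4.0, sigma=2.0, half=4):
    """Real part of a Gabor filter: Gaussian envelope times a cosine carrier."""
    kernel = []
    for dy in range(-half, half + 1):
        row = []
        for dx in range(-half, half + 1):
            # Rotate coordinates so the carrier oscillates along direction theta.
            xr = dx * math.cos(theta) + dy * math.sin(theta)
            envelope = math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma))
            row.append(envelope * math.cos(2.0 * math.pi * xr / wavelength))
        kernel.append(row)
    return kernel


def mean_energy(image, kernel):
    """Mean squared filter response over all positions where the kernel fits."""
    k = len(kernel)
    height, width = len(image), len(image[0])
    total, count = 0.0, 0
    for y in range(height - k + 1):
        for x in range(width - k + 1):
            response = sum(
                kernel[j][i] * image[y + j][x + i]
                for j in range(k)
                for i in range(k)
            )
            total += response * response
            count += 1
    return total / count


def texture_features(image, orientations=(0.0, math.pi / 4, math.pi / 2)):
    """One energy value per orientation; together they form the feature vector."""
    return [mean_energy(image, gabor_kernel(theta)) for theta in orientations]


# Synthetic stand-in for a text block: vertical stripes with a 4-pixel period.
stripes = [[math.cos(2.0 * math.pi * x / 4.0) for x in range(16)] for _ in range(16)]
features = texture_features(stripes)
print(features)
```

On the striped test image, the filter whose carrier runs across the stripes (orientation 0) responds far more strongly than the one running along them (orientation pi/2). Scripts with different dominant stroke directions and spacings would be expected to differ in the same kind of way.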
Compared to other major languages of the world, the Arabic language suffers from a dearth of research initiatives and research resources. As a result, Modern Standard Arabic (MSA) lacks reliable speech corpora for research in phonetics and related areas of linguistics. In recent years the Linguistic Data Consortium (LDC) published the first public MSA speech corpus designed for speech recognition experiments, called the West Point Corpus. We are currently using this corpus in our research on speech recognition and other speech processing tasks. The aim of this paper is to evaluate the West Point Corpus from the point of view of MSA phonetics and linguistics. The evaluation covers the phonemes used and their number, the phoneme definitions, the labeling, and the scripts established by the West Point Corpus. The strengths, weaknesses, and discrepancies of the West Point Corpus with respect to the linguistic rules and phonetic characteristics of MSA are also discussed.
This article focuses on the development of Natural Language Processing (NLP) tools for Computer Assisted Language Learning (CALL). First, we developed several NLP tools: a labelled dictionary of Arabic (as complete as possible), a generator for morphological derivatives, a conjugator, and a morphological analyzer for Arabic. Second, we used these tools to create a number of educational applications for learning the Arabic language with the proposed system SALA, an NLP-based authoring system organized into three distinct layers: functions, scripts, and activities.