
  • articleNo Access

    SEGMENTATION-FREE ONLINE ARABIC HANDWRITING RECOGNITION

    Arabic script is naturally cursive and unconstrained, which makes automatic recognition of Arabic handwriting a challenging problem. Compared with Latin script, its analysis is further complicated by the obligatory dots/strokes that are placed above or below most letters. In this paper, we introduce a new approach that performs online Arabic word recognition on a continuous word-part level, while performing training on the letter level. In addition, we handle delayed strokes by first detecting them and then integrating them into the word-part body. Our current implementation is based on Hidden Markov Models (HMM) and correctly handles most of the Arabic script recognition difficulties. We have tested our implementation using various dictionaries and multiple writers and have achieved encouraging results for both writer-dependent and writer-independent recognition.

  • articleNo Access

    Producing Algorithmically Standard Romanization of Arabic Names Using Hints from Non-Standards

    This article addresses the problem of standard Romanization of Arabic names using undiacritized Arabic forms and their corresponding non-standard Romanizations. The Romanization of Arabic names has long been studied and standardized. Nevertheless, large databases of non-standard Romanized Arabic names are in use at many private and government agencies. Examples include passport holder name databases, phone directories, and geographic names databases. Dealing with such databases can be inefficient and can produce inconsistent results. Converting them to their standard Romanization can help solve these problems.

    In this paper, we present an efficient software implementation of an algorithm that produces the standard Romanization of names written in the Arabic alphabet by utilizing the hints in existing non-standard Romanized databases. The results of the software implementation are very promising.

  • articleNo Access

    Automatic Identification of Oriental and Other Scripts in Image Documents

    An increasing number of paper documents are produced and received by many organizations. Frequently, they have to be digitized for electronic archiving and later information retrieval or data mining, which requires scanning and OCR. Since OCR techniques are language dependent, the language of the original document must first be identified automatically. This paper describes two methods of identifying Oriental languages among four language groups, i.e. Oriental, Roman, Cyrillic, and Arabic. One method is based on features extracted from the shapes of words and letters, while the other is based on global analysis of text pieces using Gabor filters. Experimental results on hundreds of both clean and noisy documents indicate that the proposed classification approaches are promising. The use of linguistic analysis to enhance the results is also discussed.
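
The second method in the abstract above relies on Gabor filters. As a rough illustration of the idea (not the authors' implementation), the sketch below builds the real part of a 2-D Gabor kernel and measures its mean absolute response on a toy striped image. The function names, kernel size, wavelength, and sigma are all assumptions chosen for this example.

```python
import math


def gabor_kernel(size, wavelength, theta, sigma, gamma=1.0, psi=0.0):
    """Real part of a 2-D Gabor filter as a size x size list of lists.

    size must be odd so that the kernel has a centre pixel.
    """
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates so the sinusoid varies along direction theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr * xr + (gamma * yr) ** 2) / (2.0 * sigma * sigma))
            carrier = math.cos(2.0 * math.pi * xr / wavelength + psi)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel


def mean_abs_response(image, kernel):
    """Mean absolute correlation of the kernel over all valid positions."""
    k = len(kernel)
    height, width = len(image), len(image[0])
    total, count = 0.0, 0
    for i in range(height - k + 1):
        for j in range(width - k + 1):
            acc = sum(
                kernel[a][b] * image[i + a][j + b]
                for a in range(k)
                for b in range(k)
            )
            total += abs(acc)
            count += 1
    return total / count


# Toy "texture": vertical stripes with a period of 4 pixels (hypothetical data).
stripes = [[1.0 if (j % 4) < 2 else 0.0 for j in range(12)] for _ in range(12)]

# Responses at two orientations form a tiny texture feature vector.
features = [
    mean_abs_response(stripes, gabor_kernel(7, 4.0, theta, 2.0))
    for theta in (0.0, math.pi / 2)
]
```

The filter tuned to the stripe direction responds far more strongly than the perpendicular one. A bank of such responses over several orientations and wavelengths is the kind of global texture feature a script classifier could consume.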

  • articleNo Access

    Handwritten Farsi Word Recognition Using NN-Based Fusion of HMM Classifiers with Different Types of Features

    In this paper, an off-line method based on hidden Markov models (HMMs) is used for holistic recognition of handwritten words from a limited vocabulary. Three feature sets, based on image gradient, black–white transitions and contour chain code, are used. For each feature set, an HMM is trained for each word. In the recognition step, the outputs of these classifiers are combined through a multilayer perceptron (MLP). The high number of connections in this network makes training computationally expensive. To avoid this problem, a new method is proposed. In experiments on 16000 images of 200 names of Iranian cities from the “Iranshahr 3” dataset, the results of the proposed method are presented and compared with those of some similar methods. An error analysis of these results is also provided.
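
Holistic recognition with one HMM per word, as described above, amounts to scoring an observation sequence under every word model and picking the best. The sketch below shows only that scoring step, using the scaled forward algorithm on discrete symbols. It does not cover the paper's feature extraction or MLP fusion, and the two toy models, their probabilities, and all names are hypothetical.

```python
import math


def forward_log_likelihood(obs, start, trans, emit):
    """Return log P(obs | model) via the scaled forward algorithm.

    start[s]    : probability of starting in state s
    trans[p][s] : probability of moving from state p to state s
    emit[s][o]  : probability of emitting symbol o in state s
    """
    n_states = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate one step and weight by the emission probability.
            alpha = [
                sum(alpha[p] * trans[p][s] for p in range(n_states)) * emit[s][obs[t]]
                for s in range(n_states)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence is impossible under this model
        # Rescale to avoid underflow and accumulate the log of the scale.
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_lik


def recognize(obs, word_models):
    """Pick the word whose HMM assigns the highest likelihood to obs."""
    return max(
        word_models,
        key=lambda word: forward_log_likelihood(obs, *word_models[word]),
    )


# Two hypothetical left-to-right models over the symbol alphabet {0, 1}.
WORD_MODELS = {
    "word_a": ([1.0, 0.0], [[0.5, 0.5], [0.0, 1.0]], [[0.9, 0.1], [0.1, 0.9]]),
    "word_b": ([1.0, 0.0], [[0.5, 0.5], [0.0, 1.0]], [[0.1, 0.9], [0.9, 0.1]]),
}

best = recognize([0, 0, 1, 1], WORD_MODELS)
```

In a fusion setup like the one the abstract describes, the per-word log-likelihoods from each feature set would be passed to a combiner instead of being reduced to a single best word here.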

  • articleNo Access

    HOW DO BILINGUALS HANDLE INTERHEMISPHERIC INTEGRATION? EVIDENCE FROM A CROSS-LANGUAGE STUDY

    The focus on interhemispheric interaction and integration has become a prominent aspect of laterality research. The aim of the present behavioral study was to determine whether hemisphere advantage differs between language groups. This was done by comparing how hemisphere advantage affects interhemispheric integration in monolingual and in bilingual individuals. Sixty university students (20 English monolinguals, 20 Hebrew bilinguals, and 20 balanced Arabic bilinguals) participated in two experiments, in which a lexical decision task was performed in the left and/or right visual field. Stimuli were presented unilaterally and bilaterally, and participants were cued to respond to the stimuli. In Experiment 1, all three groups showed an effect of lexicality, that is, participants responded to word stimuli faster than to non-word stimuli, with the Hebrew and Arabic groups showing a word advantage in spotting errors. In addition, all groups except the Hebrew group showed the expected right visual field advantage in accuracy, and the English group demonstrated this advantage in reaction time as well. In Experiment 2, responses to non-word stimuli were equally accurate in the left and right visual fields, but reaction times were faster for stimuli presented in the left visual field. The performance of the balanced bilingual Arabic and unbalanced bilingual Hebrew reading groups was significantly better in the bilateral condition than in the unilateral condition. The results supported the notion that bilingual individuals show more effective interhemispheric communication and that they enjoy relative superiority in their interhemispheric processing in response to task demands.

  • articleNo Access

    Application of Support Vector Machine for Arabic Sentiment Classification Using Twitter-Based Dataset

    Sentiment classification is the process of classifying emotions and opinions in texts. In this study, the problem of Arabic sentiment analysis was addressed. A support vector machine (SVM) model was proposed to classify opinions in Arabic micro-texts as being positive or negative. To evaluate the performance of the SVM model, a dataset was built from tweets discussing several social issues in Saudi Arabia. These issues include changes that were implemented by the country as part of a newly established vision, known as Saudi Arabia Vision 2030. The constructed dataset was manually annotated according to the sentiment conveyed in the text. To achieve the best sentiment classification accuracy, several procedures were implemented within the proposed framework, including light stemming, feature extraction (n-grams, emoji and tweet-topic features), parameter optimisation and feature-set reduction. The proposed SVM model achieved an accuracy of 89.83%.
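
To make the n-gram features and the linear SVM objective mentioned above more concrete, here is a minimal sketch under stated assumptions. It shows word n-gram counting, a linear decision score, and the hinge loss that a linear SVM penalises during training. It does not train a model or reproduce the paper's pipeline (no stemming, emoji or topic features), and the whitespace tokenisation, function names, and example weights are all hypothetical.

```python
from collections import Counter


def word_ngrams(tokens, n):
    """All contiguous n-token windows, in order of appearance."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_features(text, max_n=2):
    """Bag of word n-grams (n = 1..max_n) using whitespace tokenisation."""
    tokens = text.split()
    feats = Counter()
    for n in range(1, max_n + 1):
        feats.update(" ".join(gram) for gram in word_ngrams(tokens, n))
    return feats


def linear_score(weights, feats, bias=0.0):
    """Decision value w . x + b; a positive value means the positive class."""
    return sum(weights.get(name, 0.0) * count for name, count in feats.items()) + bias


def hinge_loss(weights, feats, label, bias=0.0):
    """Hinge loss max(0, 1 - y * score) for a label y in {-1, +1}."""
    return max(0.0, 1.0 - label * linear_score(weights, feats, bias))


# Hypothetical weights; a trained SVM would learn these from annotated tweets.
example_weights = {"good": 1.0, "bad": -1.0}
example_feats = ngram_features("good good")
example_score = linear_score(example_weights, example_feats)
```

The same feature dictionaries work unchanged for Arabic text, since `str.split` handles any whitespace-separated script.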

  • articleNo Access

    Evaluating the MSA West Point Speech Corpus

    Compared to other major languages of the world, the Arabic language suffers from a dearth of research initiatives and research resources. As a result, Modern Standard Arabic (MSA) lacks reliable speech corpora for research in phonetics and related areas of linguistics. In recent years the Linguistic Data Consortium (LDC) published the first public MSA speech corpus designed for speech recognition experiments. That corpus was called West Point. Currently, we are using this corpus in our research experiments for speech recognition and other speech processing investigations. The aim of this paper is to evaluate the West Point Corpus from the MSA phonetic and linguistic point of view. The phonemes used and their numbers, the phoneme definitions, the labeling, and the scripts established by the West Point Corpus are included in the evaluation. Weaknesses, strengths, and discrepancies of the West Point Corpus regarding the linguistic rules and phonetic characteristics of MSA are also discussed in this paper.

  • articleNo Access

    Use of NLP Tools in CALL System for Arabic

    This article focuses on the development of Natural Language Processing (NLP) tools for Computer Assisted Language Learning (CALL). First, we developed several NLP tools: a labelled dictionary of Arabic (as complete as possible), a generator for morphological derivatives, a conjugator and a morphological analyzer for Arabic. Second, we used these tools to create a number of educational applications for learning the Arabic language with the proposed system SALA (an NLP-based authoring system organized into three distinct layers: functions, scripts and activities).
