
  • Article (No Access)

    Recognition of Cursive Arabic Handwritten Text Using Embedded Training Based on Hidden Markov Models

    This paper presents a system for offline recognition of cursive Arabic handwritten text based on Hidden Markov Models (HMMs). The proposed method takes character context into account by applying embedded training to the HMMs in order to build and refine the character models. The system is analytical and requires no explicit segmentation. Feature extraction is preceded by baseline estimation, and the features are both statistical and structural, so that they capture the peculiarities of the text as well as the pixel distribution of the word image. The experiments are carried out on the benchmark IFN/ENIT database. They show that embedded training-based HMMs improve the recognition rate, and the results obtained are promising.
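Embedded training generally works on word-level models assembled from character models, so that character parameters can be re-estimated from whole word images without segmenting them first. The sketch below illustrates only that assembly step, under stated assumptions: the function names `char_model` and `concatenate`, the state counts, and the `stay` probability are hypothetical and are not taken from the paper. The re-estimation itself (for example Baum-Welch) is omitted.

```python
def char_model(n_states, stay=0.6):
    """Left-to-right transitions for one character model.

    Returns n_states rows of length n_states + 1; the extra last column
    holds the probability of leaving the character model.
    """
    rows = []
    for i in range(n_states):
        row = [0.0] * (n_states + 1)
        row[i] = stay            # self-loop: remain in the same state
        row[i + 1] = 1.0 - stay  # advance to the next state (or exit)
        rows.append(row)
    return rows


def concatenate(models):
    """Chain character models into one word-level transition matrix.

    The exit column of each character model is mapped onto the first
    state of the next one; the final column is the exit of the word.
    """
    total = sum(len(m) for m in models)
    word = [[0.0] * (total + 1) for _ in range(total)]
    offset = 0
    for m in models:
        for i, row in enumerate(m):
            for j, p in enumerate(row):
                word[offset + i][offset + j] = p
        offset += len(m)
    return word


# Hypothetical three-character word: 3 + 2 + 3 = 8 emitting states.
word_model = concatenate([char_model(3), char_model(2), char_model(3)])
print(len(word_model), "states")
```

Because every word that contains a given character reuses the same character model, training on whole words updates each character model in all the contexts where it occurs.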

  • Article (No Access)

    A NEW STRUCTURAL TECHNIQUE FOR RECOGNIZING PRINTED ARABIC TEXT

    This paper proposes a new structural technique for Arabic text recognition. The technique can be divided into five major steps: (1) preprocessing and binarization; (2) thinning; (3) binary tree construction; (4) segmentation; and (5) recognition. The advantage of this technique is that its execution depends on neither the font nor the size of the characters. Thus, the same technique might be used to recognize both machine-printed and hand-printed text.

    The algorithm is implemented on a microcomputer. Experiments were conducted to verify its accuracy and speed using about 20,000 subwords with an average length of 3 characters each. The subwords were written in different fonts. The experiments yielded a recognition accuracy of 93.38% at a speed of 2.7 characters per second.

  • Article (No Access)

    A New Method for Arabic Text Detection in Natural Scene Images

    Text in scene images can provide useful and vital information for content-based image analysis. Text detection and script identification in images are therefore important tasks. In this paper, we propose a new method for text detection in natural scene images, particularly for Arabic text, based on a bottom-up approach with four principal steps. First, extremely stable and homogeneous regions of interest (ROIs) are detected using the proposed Color Stability and Homogeneity Regions (CSHR) technique. Second, these regions are labeled as textual or non-textual ROIs using a structural approach. Third, the textual ROIs are grouped into zones according to the spatial relations between them. Finally, the textual or non-textual nature of the resulting zones is refined using handcrafted features together with features learned by a Convolutional Neural Network (CNN). The proposed method was evaluated on databases used for text detection in natural scene images: those of the competitions organized at the 2017 edition of the International Conference on Document Analysis and Recognition (ICDAR2017), the Urdu-text database, and our Natural Scene Image Database for Arabic Text detection (NSIDAT). The experimental results obtained appear promising.

  • Article (No Access)

    Improved Text Clustering Using k-Mean Bayesian Vectoriser

    High-dimensional data reduces the efficiency of clustering algorithms and increases their execution time. In this paper, we therefore propose an approach called BV-kmeans (Bayesian Vectorisation with k-means) that aims to improve document representation models for text clustering. The approach integrates k-means document clustering with a Bayesian Vectoriser, which computes the probability distribution of the documents in the vector space, in order to overcome the problems of high-dimensional data and reduce running time. To determine which metrics best model the similarity between documents under the proposed approach, we used several similarity measures, namely K divergence, squared Euclidean distance and squared χ2 distance. We evaluated the approach on a set of common newspaper websites with high-dimensional data. Experimental results show that, compared with the standard k-means algorithm, the proposed approach can increase the degree to which a cluster contains documents from a single category by 85% and lower the runtime by 95%.
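The abstract names three measures for comparing document distributions but does not define them. The sketch below uses common textbook forms, which is an assumption about what the paper implements; the function names and the toy vectors are hypothetical. It also shows the nearest-centroid assignment step of k-means with a pluggable measure.

```python
import math


def squared_euclidean(p, q):
    """Sum of squared component differences."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def squared_chi2(p, q):
    """Squared chi-square distance: sum of (a - b)^2 / (a + b)."""
    return sum((a - b) ** 2 / (a + b) for a, b in zip(p, q) if a + b > 0)


def k_divergence(p, q):
    """K divergence: sum of a * log(2a / (a + b)); not symmetric in p and q."""
    return sum(a * math.log(2 * a / (a + b)) for a, b in zip(p, q) if a > 0)


def assign(doc, centroids, dist):
    """k-means assignment step: index of the closest centroid under `dist`."""
    return min(range(len(centroids)), key=lambda i: dist(doc, centroids[i]))


# Toy probability vectors over three hypothetical topics (assumed data).
doc = [0.7, 0.2, 0.1]
centroids = [[0.1, 0.1, 0.8], [0.6, 0.3, 0.1]]
for dist in (squared_euclidean, squared_chi2, k_divergence):
    print(dist.__name__, "->", assign(doc, centroids, dist))
```

Passing the measure as a parameter keeps the clustering loop unchanged while the metrics are compared, which mirrors the kind of comparison the abstract describes.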

  • Article (No Access)

    A Deep Learning Artificial Neural Network Algorithm for Instance-based Arabic Language Authorship Attribution

    Common examples of cybercrime include identity theft and the violation of intellectual property, both of which frequently occur on social media. Authorship attribution (AA) techniques extract features from a text and use them to identify its original author. These features differentiate the writing style of one author from that of others. Several machine learning methods have been applied to AA in different languages, but few studies have addressed Arabic AA. This paper investigates the performance of a deep learning-based artificial neural network (ANN) for attributing authorship of Arabic text. The applied model helps protect social media users from identity theft and from violation of their intellectual property. The experiments used a dataset of 4,686 Arabic texts by 15 different authors. The performance of the deep learning method was compared with that of several machine learning methods. The experimental results showed the superior performance of deep learning for AA in Arabic according to several evaluation criteria: F-score, accuracy, precision, and recall.
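For a multi-author attribution task, the four criteria named above can be computed from the true and predicted author labels. The sketch below uses macro-averaging over authors, which is an assumption, since the abstract does not say how the per-author scores were aggregated. The function name `evaluate` and the toy labels are hypothetical.

```python
def evaluate(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall and F-score."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f_scores = [], [], []
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f_scores.append(2 * precision * recall / denom if denom else 0.0)
        precisions.append(precision)
        recalls.append(recall)
    n = len(labels)
    return {
        "accuracy": sum(1 for t, p in pairs if t == p) / len(pairs),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f_score": sum(f_scores) / n,
    }


# Toy example with two hypothetical authors "a" and "b".
scores = evaluate(["a", "a", "b", "b"], ["a", "b", "b", "b"])
print(scores["accuracy"])  # 0.75
```

Macro-averaging weights every author equally regardless of how many texts each contributed; a weighted average would be an alternative when the number of texts per author is unbalanced.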