An Adaptive Parser for Arabic Language Processing
Abstract
In this study, we present a robust bottom-up Arabic parser that checks the grammatical correctness of Arabic sentences by passing them through a set of predetermined states selected according to their individual words. The major benefit of our approach is the reduced number of backup states tested when determining the grammatical structure of a given sentence. The proposed approach is optimized to tokenize input sentences correctly, since accurate tokenization is an essential step of the parser; this process also reduces parsing time. Our proposed parser is extensible: it allows new words to be added to the lexicon, i.e., the lexicon is built dynamically. Experimental results demonstrate the effectiveness of our approach in correctly checking numerous sentences of different lengths. The accuracy was 85.88% when tested on a sample of 170 Arabic sentences taken from an existing Arabic text taught at K-12 grade levels.
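To make the idea of state-driven checking with a growable lexicon concrete, the following is a minimal illustrative sketch only. It uses a plain finite-state acceptor as a stand-in and is not the bottom-up parser described in this paper. The state names, part-of-speech tags, transition table, sample words, and the helpers `add_word` and `is_accepted` are all hypothetical assumptions introduced for illustration, and whitespace splitting is a deliberate simplification of Arabic tokenization.

```python
# Hypothetical sketch: states, tags, and transitions are illustrative
# assumptions, not the grammar or lexicon used in the paper.

# Lexicon mapping a word to a coarse part-of-speech tag; it can grow at run time.
# Glosses, in order: "went", "the boy", "to", "the school".
lexicon = {
    "ذهب": "VERB",
    "الولد": "NOUN",
    "إلى": "PREP",
    "المدرسة": "NOUN",
}

# Transition table: (current state, tag of next word) -> next state.
TRANSITIONS = {
    ("START", "VERB"): "AFTER_VERB",
    ("AFTER_VERB", "NOUN"): "AFTER_SUBJECT",
    ("AFTER_SUBJECT", "PREP"): "AFTER_PREP",
    ("AFTER_PREP", "NOUN"): "AFTER_OBJECT",
}

# States in which a sentence may legally end.
ACCEPTING = {"AFTER_SUBJECT", "AFTER_OBJECT"}


def add_word(word, tag):
    """Extend the lexicon dynamically with a new word and its tag."""
    lexicon[word] = tag


def is_accepted(sentence):
    """Return True if the word sequence drives the automaton to an accepting state."""
    state = "START"
    # Whitespace splitting is a simplification; it ignores attached clitics.
    for word in sentence.split():
        tag = lexicon.get(word)
        if tag is None:
            return False  # unknown word: not in the lexicon yet
        state = TRANSITIONS.get((state, tag))
        if state is None:
            return False  # no transition defined for this tag in this state
    return state in ACCEPTING
```

In this sketch, a sentence containing an unknown word is rejected until that word is registered with `add_word`, after which the same sentence can be accepted without changing the transition table.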