Exploitation of Morphological Structures in Large Vocabulary Arabic Speech Recognition
Abstract
This paper presents a new approach to large vocabulary Arabic speech recognition based on exploiting the morphological structures of the Arabic language. In this model, word discrimination is achieved by a hybrid analysis scheme in which vowels are described in detail while consonants are classified into broad phonetic classes. Different phonetic classification strategies are used to describe two large vocabulary lexicons. The results show that about 83% of the 10,000 Arabic test words can be uniquely represented using seven broad phonetic classes for consonants and six classes for vowels. In this case, the maximum number of words sharing the same phonetic labelling is six. This paper summarises the results of ten different phonetic classification schemes and discusses their implications for a large vocabulary speech recognition system.
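The hybrid labelling idea can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration: the consonant class inventory (five classes rather than the paper's seven), the romanised one-symbol-per-phoneme transcription, the reading of the six vowel labels as three short and three long vowels, and the five-word toy lexicon. None of these is taken from the paper's data. The sketch maps each word to a label sequence, then reports the fraction of words with a unique labelling and the size of the largest group of words sharing one labelling.

```python
from collections import Counter

# Hypothetical broad consonant classes (illustrative; not the paper's inventory).
CONSONANT_CLASS = {
    "b": "STOP", "t": "STOP", "d": "STOP", "k": "STOP", "q": "STOP",
    "f": "FRIC", "s": "FRIC", "z": "FRIC", "h": "FRIC",
    "m": "NAS", "n": "NAS",
    "l": "LIQ", "r": "LIQ",
    "w": "GLIDE", "y": "GLIDE",
}

# Assumed vowel labels kept in full detail: short a, i, u and long A, I, U.
VOWELS = {"a", "i", "u", "A", "I", "U"}


def label(phonemes):
    """Keep vowels as they are; replace each consonant by its broad class."""
    return tuple(p if p in VOWELS else CONSONANT_CLASS[p] for p in phonemes)


def cohort_stats(lexicon):
    """Return (fraction of uniquely labelled words, largest cohort size)."""
    labels = [label(word) for word in lexicon]
    counts = Counter(labels)
    unique = sum(1 for lab in labels if counts[lab] == 1)
    return unique / len(lexicon), max(counts.values())


# Toy lexicon in a simplified romanisation (one character per phoneme).
LEXICON = ["kataba", "darasa", "kitAb", "qatala", "badala"]

unique_fraction, max_cohort = cohort_stats(LEXICON)
print(unique_fraction, max_cohort)
```

In this toy lexicon, "qatala" and "badala" collapse to the same label sequence, so three of the five words are uniquely represented and the largest cohort has two members. The figures reported in the abstract are the same two statistics computed over the full lexicons.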