World Scientific

Building Statistical Language Models for Persian Continuous Speech Recognition Systems Using the Peykare Corpus

    https://doi.org/10.1142/S1793840611002188
    Cited by: 4 (Source: Crossref)

    In this paper, we build statistical language models for Persian using a Persian corpus called Peykare, and then incorporate them into a Persian continuous speech recognition (CSR) system. First, we unify the different orthographies of words to make the corpus text consistent. In addition, we reduce the number of POS tags used in the corpus by manual clustering. Word-based and class-based n-gram language models are then built from the unified, reduced-tag-set corpus. Several methods are used to build the class-based language models, including a new method called LGM-based word clustering. We present the procedure for incorporating language models in a Persian CSR system. Using these language models, absolute reductions of up to 13.2% in word error rate were achieved.
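
To make the idea of a class-based n-gram model concrete, here is a minimal sketch of the textbook class-based bigram factorization, P(w_i | w_{i-1}) ≈ P(c_i | c_{i-1}) · P(w_i | c_i), where c_i is the class of word w_i. It is an illustration only and not necessarily the formulation used in the paper: the toy English sentences, the hand-assigned POS-like classes, and the names `train_class_bigram` and `model` are all assumptions, and the LGM-based clustering mentioned in the abstract is not reproduced here. The estimates are unsmoothed maximum-likelihood counts.

```python
from collections import Counter

def train_class_bigram(sentences, word2class):
    """Return a function estimating P(word | prev_word) through word classes.

    Illustrative sketch: unsmoothed maximum-likelihood estimates only.
    """
    class_bigrams = Counter()  # counts of (previous class, current class)
    class_context = Counter()  # counts of a class appearing as history
    word_counts = Counter()    # counts of each word
    class_counts = Counter()   # counts of each class over all tokens

    for sent in sentences:
        classes = ["<s>"] + [word2class[w] for w in sent]
        for prev_c, cur_c in zip(classes, classes[1:]):
            class_bigrams[(prev_c, cur_c)] += 1
            class_context[prev_c] += 1
        for w in sent:
            word_counts[w] += 1
            class_counts[word2class[w]] += 1

    def prob(word, prev_word=None):
        cur_c = word2class[word]
        prev_c = "<s>" if prev_word is None else word2class[prev_word]
        if class_context[prev_c] == 0:
            return 0.0
        p_class = class_bigrams[(prev_c, cur_c)] / class_context[prev_c]
        p_word = word_counts[word] / class_counts[cur_c]
        return p_class * p_word

    return prob

# Toy data (hypothetical, not taken from the Peykare corpus).
sentences = [
    ["the", "cat", "sleeps"],
    ["the", "dog", "sleeps"],
    ["a", "cat", "runs"],
]
word2class = {
    "the": "DET", "a": "DET",
    "cat": "N", "dog": "N",
    "sleeps": "V", "runs": "V",
}

model = train_class_bigram(sentences, word2class)
print(model("dog", "a"))  # the word pair "a dog" never occurs in the toy data
```

In this toy setup the pair "a dog" is never observed, yet the model still assigns it a non-zero probability because the class sequence DET followed by N is observed. This sharing of statistics across words in the same class is the usual motivation for class-based models on sparse data. A practical system would add smoothing and handle out-of-vocabulary words, which this sketch does not (an unknown word raises `KeyError`).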