
Building Statistical Language Models for Persian Continuous Speech Recognition Systems Using the Peykare Corpus

MOHAMMAD BAHRANI

Speech Processing Lab, Department of Computer Engineering, Sharif University of Technology, Tehran, Iran


and

HOSSEIN SAMETI

Speech Processing Lab, Department of Computer Engineering, Sharif University of Technology, Tehran, Iran


https://doi.org/10.1142/S1793840611002188

Abstract

In this paper, we build statistical language models for Persian using the Peykare corpus and incorporate them into a Persian continuous speech recognition (CSR) system. First, we unify the different orthographies of words so that the corpus texts are consistent. We also reduce the number of part-of-speech (POS) tags used in the corpus by clustering them manually. We then build word-based and class-based n-gram language models from the unified, reduced-tag-set corpus. To build the class-based models, we use several word clustering methods, including a new one called LGM-based word clustering. We describe how the language models are incorporated into a Persian CSR system. With these language models, the word error rate was reduced by up to 13.2% absolute.
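The abstract does not give the model equations. A common formulation of a class-based bigram, where each word w belongs to exactly one class c(w), is P(w_i | w_{i-1}) ≈ P(w_i | c(w_i)) · P(c(w_i) | c(w_{i-1})). The Python sketch below illustrates that formulation under stated assumptions. It assumes a hard word-to-class mapping (for example, one reduced POS tag per word) and unsmoothed maximum-likelihood estimates. The transliterated toy sentences and the class labels are hypothetical. This is not the authors' implementation, and it does not implement LGM-based clustering.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Hypothetical model tuple: (word counts, class counts, class-bigram counts, class-history counts)
Model = Tuple[Counter, Counter, Counter, Counter]

def train_class_bigram(sentences: List[List[str]], word2class: Dict[str, str]) -> Model:
    """Collect unsmoothed ML counts for a class-based bigram model.

    Assumes every word maps to exactly one class; "<s>" marks sentence start.
    """
    word_counts: Counter = Counter()
    class_counts: Counter = Counter()
    class_bigrams: Counter = Counter()
    history_counts: Counter = Counter()
    for sent in sentences:
        classes = ["<s>"] + [word2class[w] for w in sent]
        # Emission counts: how often each word and each class occur.
        for w, c in zip(sent, classes[1:]):
            word_counts[w] += 1
            class_counts[c] += 1
        # Transition counts between consecutive classes.
        for prev, cur in zip(classes, classes[1:]):
            class_bigrams[(prev, cur)] += 1
            history_counts[prev] += 1
    return word_counts, class_counts, class_bigrams, history_counts

def class_bigram_prob(prev_word: Optional[str], word: str,
                      model: Model, word2class: Dict[str, str]) -> float:
    """P(word | prev_word) ~= P(word | c(word)) * P(c(word) | c(prev_word)).

    prev_word=None means the word starts the sentence.
    """
    word_counts, class_counts, class_bigrams, history_counts = model
    c_prev = "<s>" if prev_word is None else word2class[prev_word]
    c = word2class[word]
    if class_counts[c] == 0 or history_counts[c_prev] == 0:
        return 0.0  # unseen class; a real system would smooth or back off here
    p_word_given_class = word_counts[word] / class_counts[c]
    p_class_given_prev = class_bigrams[(c_prev, c)] / history_counts[c_prev]
    return p_word_given_class * p_class_given_prev

# Hypothetical toy data (transliterated Persian, invented tags).
word2class = {"ketab": "N", "dars": "N", "khub": "ADJ", "ast": "V"}
corpus = [["ketab", "khub", "ast"], ["dars", "khub", "ast"], ["ketab", "ast"]]
model = train_class_bigram(corpus, word2class)

# "dars ast" never occurs as a word pair, but it still gets probability via N -> V.
print(class_bigram_prob("dars", "ast", model, word2class))  # 0.3333333333333333
```

The example shows why class-based models help with sparse data. The word pair "dars ast" never appears in the toy corpus, but it still receives probability because the class transition N → V does. A practical system would add smoothing or back-off and sentence-end handling. This sketch includes neither.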
