
Thai Spelling Recognition Using a Continuous Speech Corpus

CHUTIMA PISARN

Sirindhorn International Institute of Technology, 131 Moo 5 Tiwanont Rd., Bangkadi, Muang, Pathumthani 12000, Thailand


THANARUK THEERAMUNKONG

Sirindhorn International Institute of Technology, 131 Moo 5 Tiwanont Rd., Bangkadi, Muang, Pathumthani 12000, Thailand


NICK CERCONE

Faculty of Computer Science, Dalhousie University, 6050 University Avenue, Halifax, NS, Canada



JUNALUX CHALIDABHONGSE

Sirindhorn International Institute of Technology, 131 Moo 5 Tiwanont Rd., Bangkadi, Muang, Pathumthani 12000, Thailand


https://doi.org/10.1142/S021942790500133X

Abstract

Spelling recognition provides an alternative input method for computer systems and helps a speech recognizer cope with incorrectly recognized words and out-of-vocabulary words. This paper presents a general framework for Thai speech recognition enhanced with spelling recognition. As a step toward implementing Thai spelling recognition, the Thai alphabet and its spelling methods are analyzed. A method based on hidden Markov models is proposed for constructing a Thai spelling recognition system from an existing continuous speech corpus. To compensate for the speed difference between spelling utterances and continuous speech utterances, the utterance speed is adjusted. Two alternative language models, bigram and trigram, are used to investigate the performance of spelling recognition under three different environments: close-type, open-type and mix-type language models. Using training utterances stretched by a factor of 1.25 under the mix-type language model, the system achieves 87.37% correctness and 87.18% accuracy with the bigram model, and up to 91.12% correctness and 90.80% accuracy with the trigram model.
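
The abstract reports correctness and accuracy as separate figures. The sketch below illustrates one common way the two can differ, assuming the conventional definitions used by toolkits such as HTK: correctness = (N - D - S) / N and accuracy = (N - D - S - I) / N, where N is the number of reference symbols and S, D, I are substitutions, deletions and insertions. Whether the paper uses exactly these definitions is an assumption, and the function names `align_counts` and `score` are hypothetical. The alignment here uses unit edit costs, which is a simplification.

```python
def align_counts(ref, hyp):
    """Count (hits, subs, dels, ins) along a minimum-edit-distance alignment.

    Assumption: unit cost for every error type (real scoring tools may
    weight substitutions, deletions and insertions differently).
    """
    n, m = len(ref), len(hyp)
    # table[i][j] = (distance, hits, subs, dels, ins) for ref[:i] vs hyp[:j]
    table = [[None] * (m + 1) for _ in range(n + 1)]
    table[0][0] = (0, 0, 0, 0, 0)
    for i in range(1, n + 1):
        table[i][0] = (i, 0, 0, i, 0)      # only deletions
    for j in range(1, m + 1):
        table[0][j] = (j, 0, 0, 0, j)      # only insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d, h, s, de, ins = table[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diag = (d, h + 1, s, de, ins)       # hit
            else:
                diag = (d + 1, h, s + 1, de, ins)   # substitution
            d, h, s, de, ins = table[i - 1][j]
            delete = (d + 1, h, s, de + 1, ins)     # reference symbol missed
            d, h, s, de, ins = table[i][j - 1]
            insert = (d + 1, h, s, de, ins + 1)     # extra hypothesis symbol
            table[i][j] = min(diag, delete, insert, key=lambda t: t[0])
    return table[n][m][1:]


def score(ref, hyp):
    """Return (correctness %, accuracy %) under the assumed definitions."""
    if not ref:
        raise ValueError("reference sequence must not be empty")
    hits, subs, dels, ins = align_counts(ref, hyp)
    n = len(ref)
    correctness = 100.0 * hits / n          # (N - D - S) / N
    accuracy = 100.0 * (hits - ins) / n     # (N - D - S - I) / N
    return correctness, accuracy


if __name__ == "__main__":
    # Toy letter sequences, not data from the paper.
    reference = list("abcde")
    hypothesis = list("abxdef")   # one substitution (c -> x), one insertion (f)
    print(score(reference, hypothesis))
```

Under these assumed definitions, insertions lower accuracy but not correctness, so accuracy can never exceed correctness for the same output.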

Paper presented at the Int. Conf. on Intelligence in Communication Systems (IntellComm 2004), Bangkok, Thailand, 23–26 Nov 2004.

