TAIWANESE MIN-NAN SPEECH RECOGNITION AND SYNTHESIS
In this chapter, we review research efforts in automatic speech recognition (ASR), text-to-speech (TTS) synthesis, and speech corpus design for Taiwanese, or Min-nan – a major native language spoken in Taiwan. Following an introduction to the orthography and phonetic structure of Taiwanese, we describe the databases used for these tasks, including the Formosa Lexicon (ForLex) – a phonetically transcribed lexicon using the Formosa Alphabet (ForPA), an alphabet system designed with Taiwan's multilingual applications in mind – and the Formosa Speech Database (ForSDat) – a speech corpus comprising microphone and telephone speech. For ASR, we propose a unified scheme that includes Mandarin/Taiwanese bilingual acoustic models, pronunciation modeling that incorporates pronunciation variations, and a character-based tree-structured search network. This scheme is especially suitable for handling multiple character-based languages, such as members of the CJKV (Chinese, Japanese, Korean, and Vietnamese) family. For speech synthesis, drawing on the bilingual lexicon information, the Taiwanese TTS system comprises three functional modules: a text analysis module, a prosody module, and a waveform synthesis module. An experiment evaluating the text analysis and tone sandhi modules shows approximately 90% labeling accuracy and 65% tone sandhi accuracy. We also propose multiple-level unit selection for a limited-domain TTS application to improve the naturalness of synthesized speech.
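To illustrate the kind of processing the tone sandhi module performs, the following is a minimal sketch of the general Taiwanese tone sandhi circle, in which every syllable except the last one in a sandhi group changes tone. The numeric tone labels and the simplified rule table are assumptions for illustration (dialects differ, e.g. in the treatment of tone 5 and of checked syllables ending in -h); the chapter's actual module is more elaborate.

```python
# Simplified Taiwanese tone sandhi circle (an illustrative assumption,
# not the chapter's actual rule set):
#   1 -> 7, 7 -> 3, 3 -> 2, 2 -> 1, 5 -> 7 (northern variant),
# with the checked tones 4 and 8 swapping (ignoring -h finals).
SANDHI = {1: 7, 7: 3, 3: 2, 2: 1, 5: 7, 4: 8, 8: 4}

def apply_tone_sandhi(tones):
    """Apply the sandhi circle to all non-final syllables of a group.

    `tones` is a list of numeric tone labels for one sandhi group;
    only the final syllable keeps its citation tone.
    """
    if not tones:
        return []
    return [SANDHI.get(t, t) for t in tones[:-1]] + [tones[-1]]

# Example: a three-syllable group with citation tones 1, 5, 2.
print(apply_tone_sandhi([1, 5, 2]))  # -> [7, 7, 2]
```

In a full system, the text analysis module would first segment the input into sandhi groups (typically at syntactic phrase boundaries) before a rule table like this is applied, which is why labeling accuracy directly bounds tone sandhi accuracy.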