TAIWANESE MIN-NAN SPEECH RECOGNITION AND SYNTHESIS

https://doi.org/10.1142/9789812772961_0017

Abstract:

In this chapter, we review research efforts in automatic speech recognition (ASR), text-to-speech (TTS) synthesis, and speech corpus design for Taiwanese, or Min-nan, a major native language spoken in Taiwan. Following an introduction to the orthography and phonetic structure of Taiwanese, we describe the various databases used for these tasks, including the Formosa Lexicon (ForLex), a lexicon phonetically transcribed in the Formosa Alphabet (ForPA), an alphabet designed with Taiwan's multilingual applications in mind, and the Formosa Speech Database (ForSDat), a speech corpus of microphone and telephone recordings. For ASR, we propose a unified scheme that includes Mandarin/Taiwanese bilingual acoustic models, incorporates pronunciation variations into the pronunciation model, and uses a character-based, tree-structured search network. This scheme is especially suitable for handling multiple character-based languages, such as members of the CJKV (Chinese, Japanese, Korean, and Vietnamese) family. For speech synthesis, the Taiwanese TTS system exploits the bilingual lexicon information and consists of three functional modules: a text analysis module, a prosody module, and a waveform synthesis module. An experiment evaluating the text analysis and tone sandhi modules shows labeling accuracy of about 90% and tone sandhi accuracy of about 65%. Multiple-level unit selection is also proposed for a limited-domain TTS application to improve the naturalness of the synthesized speech.
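
The character-based, tree-structured search network mentioned above can be pictured as a prefix tree over character strings in which each node stores the pronunciation variants (for example, a Mandarin and a Taiwanese reading) of the entry ending there. The Python sketch below illustrates that idea only; the class names, example entry, and phone symbols are illustrative assumptions, not the chapter's implementation or the ForPA symbol set.

from dataclasses import dataclass, field

@dataclass
class CharNode:
    """One node per character; children are indexed by the next character."""
    children: dict = field(default_factory=dict)        # char -> CharNode
    pronunciations: list = field(default_factory=list)  # phone-sequence variants
    is_word_end: bool = False

class CharacterTree:
    """Prefix tree over character strings, shared by all pronunciation variants."""

    def __init__(self):
        self.root = CharNode()

    def add_word(self, chars, phone_variants):
        """Insert a character string together with its pronunciation variants."""
        node = self.root
        for ch in chars:
            node = node.children.setdefault(ch, CharNode())
        node.is_word_end = True
        node.pronunciations.extend(phone_variants)

    def lookup(self, chars):
        """Return the stored pronunciation variants for a character string, if any."""
        node = self.root
        for ch in chars:
            node = node.children.get(ch)
            if node is None:
                return []
        return node.pronunciations if node.is_word_end else []

# Hypothetical entry: one character string with a Mandarin-style and a
# Taiwanese-style reading (the phone symbols are placeholders).
tree = CharacterTree()
tree.add_word("台灣", [["t", "ai", "u", "an"], ["t", "ai", "o", "an"]])
print(tree.lookup("台灣"))

Similarly, the three-module TTS architecture named in the abstract (text analysis, prosody, waveform synthesis) can be sketched as a simple sequential pipeline. The function bodies below are placeholders written under that assumption; they are not the chapter's algorithms.

def text_analysis(text):
    """Segment the input and attach base pronunciations and tones (placeholder)."""
    return [{"char": ch, "phones": [], "base_tone": None} for ch in text]

def prosody(syllables):
    """Apply tone sandhi rules and assign prosodic targets (placeholder)."""
    for syl in syllables:
        syl["surface_tone"] = syl["base_tone"]  # real sandhi rules would rewrite this
        syl["duration_ms"] = 200
    return syllables

def waveform_synthesis(syllables):
    """Select and concatenate synthesis units into a waveform (placeholder bytes)."""
    return b"".join(b"<unit>" for _ in syllables)

def synthesize(text):
    """Run the three modules in sequence: text analysis -> prosody -> synthesis."""
    return waveform_synthesis(prosody(text_analysis(text)))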