TRISEME DECISION TREES IN THE CONTINUOUS SPEECH RECOGNITION SYSTEM FOR TALKING HEAD ANIMATION
This work was supported by the bilateral project between the Flanders government of Belgium and the Ministry of Science and Technology of China, No. 1999(0209).
The viseme is an audio-visual model for speech-driven talking head animation. In this paper, a viseme-HMM-based speech recognition system is built to drive a talking head. Trisemes are used to take mouth-shape contextual information into account and so obtain more accurate models. Because the number of triseme models grows rapidly while the training data are limited, decision-tree-based state tying is adopted in triseme modeling to obtain robust models. Similarity of mouth shapes (SMS) is proposed as the basis for designing the visual question set used in the tree-building process. Experimental results show that SMS is a good measure of mouth-shape context and that decision trees are a feasible way to obtain robust model parameter estimates.
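To make the idea of decision-tree-based state tying concrete, the following is a minimal sketch of a single tree split, assuming the standard likelihood-gain criterion with single-Gaussian state statistics. It is not the system described in this paper: the viseme names, the one-dimensional feature values, and the question set (including the question names) are hypothetical, and the grouping of "rounded" versus "spread" contexts only stands in for a question set derived from SMS.

```python
import math

def make_stats(left, right, samples):
    # Sufficient statistics for one triseme state:
    # (left context, right context, count, sum of x, sum of x^2).
    return (left, right, len(samples), sum(samples), sum(x * x for x in samples))

def pooled_loglik(stats):
    # Log likelihood of the pooled data under one maximum-likelihood Gaussian.
    n = sum(s[2] for s in stats)
    if n == 0:
        return 0.0
    mean = sum(s[3] for s in stats) / n
    var = max(sum(s[4] for s in stats) / n - mean * mean, 1e-6)  # variance floor
    return -0.5 * n * (math.log(2.0 * math.pi * var) + 1.0)

def best_question(stats, questions):
    # Pick the question whose yes/no split gives the largest likelihood gain.
    base = pooled_loglik(stats)
    best_name, best_gain = None, 0.0
    for name, side, members in questions:
        idx = 0 if side == "L" else 1
        yes = [s for s in stats if s[idx] in members]
        no = [s for s in stats if s[idx] not in members]
        if not yes or not no:
            continue  # the question does not split this pool
        gain = pooled_loglik(yes) + pooled_loglik(no) - base
        if gain > best_gain:
            best_name, best_gain = name, gain
    return best_name, best_gain

# Hypothetical states of one centre viseme in four left contexts (toy 1-D features).
stats = [
    make_stats("rounded_o", "sil", [1.0, 1.2, 0.8]),
    make_stats("rounded_u", "sil", [1.1, 0.9]),
    make_stats("spread_i", "sil", [3.0, 3.2, 2.8]),
    make_stats("spread_e", "sil", [3.1, 2.9]),
]

# Hypothetical visual question set: each question names a set of similar mouth shapes.
questions = [
    ("L_rounded", "L", {"rounded_o", "rounded_u"}),
    ("L_o_or_i", "L", {"rounded_o", "spread_i"}),
    ("R_silence", "R", {"sil"}),
]

name, gain = best_question(stats, questions)
print(name, round(gain, 2))
```

In this toy data the states with rounded left contexts have similar feature values, so the question that groups them yields the largest gain and those states would be tied together. A full tree builder would repeat the split recursively on each child and stop when the gain or the state occupancy falls below a threshold.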