
  • Article (No Access)

    Influence of Acoustics in Speech Recognition for Oriental Languages

    While AgileTV, Nuance XML platforms, and Microsoft Speech Server 2004 are the projects in the news, there is still a demand for a better speech recognition engine in terms of word error rate (WER). This article puts forward the acoustic signal quality requirements for efficient speech recognition and reports that the major thrust is on the acoustics of speech recognition. It also surveys the performance of various speech recognition engines in the industry, the techniques they adopt to obtain a quality acoustic signal from the speaker for efficient results (i.e., a lower WER), and the external factors that make it less robust in attaining a high-quality speech signal. To tackle this problem, we suggest an intelligent mechanism of modifying the regular input speech signal format for a significant improvement in speech recognition.
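    The WER mentioned above is the standard metric for comparing recognition engines; a minimal sketch of its computation as a word-level edit distance (the example sentences are illustrative, not from the article) might look like:

    ```python
    def wer(reference, hypothesis):
        """Word error rate: (substitutions + deletions + insertions)
        divided by the number of reference words, computed via a
        word-level Levenshtein distance."""
        ref, hyp = reference.split(), hypothesis.split()
        # dp[i][j] = edit distance between ref[:i] and hyp[:j]
        dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
        for i in range(len(ref) + 1):
            dp[i][0] = i  # deleting all i reference words
        for j in range(len(hyp) + 1):
            dp[0][j] = j  # inserting all j hypothesis words
        for i in range(1, len(ref) + 1):
            for j in range(1, len(hyp) + 1):
                cost = 0 if ref[i - 1] == hyp[j - 1] else 1
                dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                               dp[i][j - 1] + 1,          # insertion
                               dp[i - 1][j - 1] + cost)   # substitution/match
        return dp[-1][-1] / len(ref)

    # One deleted word ("the") out of six reference words -> WER = 1/6
    print(wer("the cat sat on the mat", "the cat sat on mat"))
    ```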

  • Article (No Access)

    VISUAL SPEECH RECOGNITION USING OPTICAL FLOW AND SUPPORT VECTOR MACHINES

    A lip-reading technique that identifies visemes from visual data only, without evaluating the corresponding acoustic signals, is presented. The technique is based on the vertical components of optical flow (OF) analysis, which are classified using support vector machines (SVM). The OF is decomposed into multiple non-overlapping fixed-scale blocks, and statistical features of each block are computed for successive video frames of an utterance. The technique performs automatic temporal segmentation of the utterances (i.e., determining the start and the end of an utterance) by a pair-wise pixel comparison method, which evaluates the differences in intensity of corresponding pixels in two successive frames. The experiments were conducted on a database of 14 visemes taken from seven subjects, and the accuracy was tested using five- and ten-fold cross-validation for binary and multiclass SVM, respectively, to determine the impact of subject variations. Unlike other systems in the literature, the results indicate that the proposed method is more robust to inter-subject variations, with high sensitivity and specificity for 12 out of 14 visemes. Potential applications of such a system include human-computer interfaces (HCI) for mobility-impaired users, lip-reading mobile phones, in-vehicle systems, and improvement of speech-based computer control in noisy environments.
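    The temporal segmentation step described above can be sketched with a pair-wise pixel comparison over successive frames; the mean-absolute-difference activity measure, the threshold value, and the synthetic frame data below are illustrative assumptions, not details taken from the paper:

    ```python
    import numpy as np

    def utterance_boundaries(frames, threshold=8.0):
        """Pair-wise pixel comparison: the mean absolute intensity
        difference between successive frames flags lip motion (speech)
        versus stillness. `frames` is a list of 2-D grayscale arrays;
        the threshold is an illustrative value, not one from the paper."""
        diffs = [np.abs(frames[k + 1].astype(float) - frames[k].astype(float)).mean()
                 for k in range(len(frames) - 1)]
        active = [d > threshold for d in diffs]
        start = next((k for k, a in enumerate(active) if a), None)
        end = next((k + 1 for k in reversed(range(len(active))) if active[k]), None)
        return start, end  # first and last frame indices showing motion

    # Synthetic clip: still frames, then "motion", then still again.
    rng = np.random.default_rng(0)
    still = np.zeros((8, 8))
    frames = [still, still,
              rng.uniform(0, 255, (8, 8)), rng.uniform(0, 255, (8, 8)),
              still, still]
    print(utterance_boundaries(frames))  # motion spans frames 1..4
    ```

    In the full system, only the frames inside these boundaries would be passed on to the optical-flow feature extraction and SVM classification stages.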