
  Bestsellers

  • Article (No Access)

    PROSODY AND THE MUSIC OF THE HUMAN SPEECH

    We propose the use of a self-oscillating dynamical system, the pre-Galilean clock equation, for modeling the laryngeal tone. Its parameters are shown to be the minimal control set needed to generate the prosody of human speech. Based on this model, we outline a peak-delay detection algorithm for extracting the prosody of real speech.
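
    The abstract does not reproduce the clock equation itself, so the sketch below uses a Van der Pol oscillator as a generic stand-in for a self-oscillating laryngeal source; the parameter values (mu, a 120 Hz natural frequency) are illustrative assumptions, not the authors' model.

    # Minimal sketch: a self-sustained oscillator as a laryngeal-tone source.
    # Van der Pol stands in for the paper's pre-Galilean clock equation.
    import numpy as np
    from scipy.integrate import solve_ivp

    def van_der_pol(t, y, mu=1.0, omega=2 * np.pi * 120.0):
        # For mu > 0 a stable limit cycle emerges: the oscillation is
        # self-sustained, with no external periodic forcing required.
        x, v = y
        return [v, mu * (1.0 - x**2) * v - omega**2 * x]

    sol = solve_ivp(van_der_pol, (0.0, 0.1), [0.01, 0.0],
                    t_eval=np.linspace(0.0, 0.1, 4410), rtol=1e-8)
    glottal_tone = sol.y[0]  # quasi-periodic source waveform, ~120 Hz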

  • Article (No Access)

    SVM-BASED PHONEME CLASSIFICATION AND LIP SHAPE REFINEMENT IN REAL-TIME LIP-SYNCH SYSTEM

    In this paper, we present a real-time lip-synch system that animates a 2-D avatar's lip motion in sync with an incoming speech utterance. To achieve real-time operation, processing time was minimized by "merge and split" procedures that yield a coarse-to-fine phoneme classification. At each stage of phoneme classification, the support vector machine (SVM) method was applied to reduce the computational load while maintaining the desired accuracy. The coarse-to-fine phoneme classification is accomplished via two stages of feature extraction: in the first stage, each speech frame is acoustically analyzed into three classes of lip opening using Mel-Frequency Cepstral Coefficients (MFCC) as features; in the second stage, each frame is further refined into a detailed lip shape using formant information. The method was implemented in 2-D lip animation, and the system was demonstrated to accomplish real-time lip-synch. The approach was tested on a PC with an Intel Pentium IV 1.4 GHz CPU and 384 MB RAM, using Microsoft Visual Studio. The combination of phoneme merging and SVMs achieved roughly twice the recognition speed of a method employing the Hidden Markov Model (HMM): typical latency per frame was on the order of 18.22 milliseconds with the proposed method, versus about 30.67 milliseconds for the HMM method under identical conditions.
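
    As a hedged illustration of the two-stage idea, the sketch below classifies MFCC frames with a coarse three-class SVM and then refines within the predicted class using a second, class-specific SVM. The feature dimensions, class counts, and the use of librosa/scikit-learn are assumptions of the sketch; the dummy training data exists only so it runs.

    # Coarse-to-fine frame classification: stage 1 picks one of three
    # lip-opening classes, stage 2 refines the lip shape within that class.
    import numpy as np
    import librosa
    from sklearn.svm import SVC

    def mfcc_frames(wav_path, sr=16000):
        y, sr = librosa.load(wav_path, sr=sr)
        return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).T  # (frames, 13)

    coarse_svm = SVC(kernel="rbf")                        # 3 lip-opening classes
    fine_svms = {c: SVC(kernel="rbf") for c in range(3)}  # per-class refiners

    # Dummy fit so the sketch runs; replace with labeled speech frames
    # (stage 2 would use formant-based features in the paper's setup).
    rng = np.random.default_rng(0)
    X = rng.normal(size=(60, 13))
    coarse_svm.fit(X, rng.integers(0, 3, 60))
    for c in range(3):
        fine_svms[c].fit(X, rng.integers(0, 4, 60))  # 4 lip shapes, illustrative

    def classify_frame(feat):
        c = int(coarse_svm.predict([feat])[0])
        return c, int(fine_svms[c].predict([feat])[0])

    print(classify_frame(X[0]))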

  • Article (No Access)

    ANALYSIS OF SOUND PATTERNS THROUGH WAVELET TRANSFORMS

    This paper starts with a brief discussion of so-called wavelet transforms, i.e., decompositions of arbitrary signals into localized contributions labelled by a scale parameter. The main features of the method are first illustrated through simple mathematical examples. Then we present the first applications of the method to the recognition and visualisation of characteristic features of speech and of musical sounds.
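
    As a hedged companion to this description, the sketch below decomposes a test signal into contributions labelled by a scale parameter using PyWavelets; the chirp input and the Morlet wavelet are illustrative choices, not the paper's.

    # Continuous wavelet transform: one coefficient per (scale, time) pair.
    import numpy as np
    import pywt

    fs = 1000.0
    t = np.arange(0.0, 1.0, 1.0 / fs)
    signal = np.sin(2 * np.pi * (50 + 100 * t) * t)   # rising-pitch test tone

    scales = np.arange(1, 64)
    coeffs, freqs = pywt.cwt(signal, scales, "morl", sampling_period=1.0 / fs)
    # coeffs[i, n] is the localized contribution at scale scales[i] near t[n];
    # freqs maps each scale to an approximate frequency in Hz.
    print(coeffs.shape, freqs[0])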

  • Article (No Access)

    An Effective Depression Diagnostic System Using Speech Signal Analysis Through Deep Learning Methods

    According to the World Health Organization (WHO), depression is one of the largest contributors to the burden of mental and psychological disease, with more than 300 million people affected; however, a large proportion of those affected never receive an effective diagnosis. Traditional techniques for diagnosing depression are based on clinical interviews, which have several limitations related to the duration and variety of symptoms, causing them to lack objectivity and accuracy. Speech has proven to be an important diagnostic tool, as it carries the impression of one’s thoughts and emotions. Speech signals carry not only linguistic features but also paralinguistic features that reflect the emotional state of the speaker, and the analysis of these features can be used for the diagnosis of depression. With the advancement of artificial intelligence techniques, such algorithms have become popular and widely used in pattern recognition and signal processing, since they can readily extract features from data and learn to recognize patterns in them. Although these algorithms can successfully recognize emotions, their efficiency is often debated. The main objective of this paper is to propose a strategy to efficiently diagnose depression from the analysis of speech signals. The analysis is performed in two ways: first, by considering male and female emotions combined (gender-neutral), classified into two classes, and second, separately for male and female emotions (gender-based), for a total of four classes. The experiments conducted show the advantages and shortcomings of paralinguistic features for the diagnosis of depression. During experimentation we tested several architectures, efficiently tuning the hyperparameters. For K-nearest neighbors (KNN), the best attained accuracy was 86%, whereas the Multi-Layer Perceptron (MLP) architecture attained 87.8%. The best results were obtained from a hybrid 1D-Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) architecture, with accuracies of 88.33% and 90.07% for the gender-neutral and gender-based settings, respectively.
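
    As a hedged sketch of the best-performing architecture, the PyTorch model below chains 1-D convolutions (local acoustic patterns) into an LSTM (their temporal evolution) and a linear head for the two-class, gender-neutral setting. All layer sizes, the 40-dimensional feature input, and the framework choice are assumptions, not the paper's exact configuration.

    # Hybrid 1D-CNN + LSTM classifier over sequences of acoustic features.
    import torch
    import torch.nn as nn

    class CNNLSTM(nn.Module):
        def __init__(self, n_features=40, n_classes=2):
            super().__init__()
            self.conv = nn.Sequential(
                nn.Conv1d(n_features, 64, kernel_size=5, padding=2),
                nn.ReLU(),
                nn.MaxPool1d(2),
            )
            self.lstm = nn.LSTM(64, 32, batch_first=True)
            self.head = nn.Linear(32, n_classes)

        def forward(self, x):              # x: (batch, n_features, time)
            h = self.conv(x)               # (batch, 64, time // 2)
            out, _ = self.lstm(h.transpose(1, 2))
            return self.head(out[:, -1])   # last time step -> class logits

    logits = CNNLSTM()(torch.randn(8, 40, 100))
    print(logits.shape)                    # torch.Size([8, 2])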

  • Article (No Access)

    EYE ON CHINA

      Infinitus Establishes Three Additional Scientific Research Platforms to Facilitate Its Product Innovation.

      BGI Partners with NRGene to Provide Broadest Genomic Analysis Available.

      CHINA AID - A Platform for Chinese Senior Care Industry.

      Modern Biotechnological Tools Contribute to Biodiversity Conservation.

      Older Adults Use Brain Regions Involved in Speaking to Compensate for Bad Hearing.

      Chinese Scientists Discover Molecules to Repair Organs.

      CAS Sits Atop Nature Index 2016 Rising Star List.

      Chinese Researchers to Develop 3D Skin Printing Technology.

      Varian Chosen to Equip First Government Owned Proton Centre in China.

      HONG KONG NEWS – Amgen Launches Commercial Office in Hong Kong.

    • Article (No Access)

      FLUCTUATIONS IN SPEECH

      We argue that when individuals enunciate sounds which are perceived to be the same, the sounds share the commonality that their spectra can be transformed into a new domain in which they become identical except for a speaker-dependent translation factor. We call the transformation function the speech scale; it is obtained experimentally. In this paper we explore the mathematical issues involved and obtain various criteria for when a transformation to a new domain results in a speaker-independent transform.
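
      As a hedged formalization of this criterion (the warping u, template S, and offsets c_a below are introduced here for illustration and are not the paper's notation): in the transformed domain \xi = u(\omega), each speaker's spectrum is a translate of a common template,

      \[
        S_a\!\left(u^{-1}(\xi)\right) = S\left(\xi - c_a\right),
      \]

      so a speaker-independent representation follows by discarding the translation, e.g. via the Fourier magnitude in the warped domain, since \left|\mathcal{F}\{S(\xi - c_a)\}\right| = \left|\mathcal{F}\{S\}\right| (translation changes only the phase).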

    • Article (Open Access)

      Speech Envelope Dynamics for Noise-Robust Auditory Scene Analysis in Robotics

      Humans make extensive use of auditory cues to interact with other humans, especially in challenging real-world acoustic environments, where multiple distinct acoustic events usually mix together into a complex auditory scene. The ability to separate and localize mixed sounds in complex auditory scenes remains a demanding skill for binaural robots, which must disambiguate and interpret the environmental scene with only two sensors. At the same time, robots that interact with humans should be able to gain insights about the speakers in the environment, such as how many speakers are present and where they are located. For this reason, the speech signal is distinctly important among the auditory stimuli commonly found in human-centered acoustic environments. In this paper, we propose a Bayesian method for selectively processing acoustic data that exploits the characteristic amplitude-envelope dynamics of human speech to infer the locations of speakers in a complex auditory scene. The goal is to demonstrate the effectiveness of this speech-specific temporal-dynamics approach; further, we measure how effective the method is in comparison with more traditional methods based on amplitude detection alone.
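
      As a hedged illustration of the envelope cue this method exploits, the sketch below scores how much of a signal's amplitude envelope fluctuates at syllabic rates. The 2-8 Hz modulation band, the filter order, and the test signals are assumptions of the sketch, not the paper's Bayesian model.

      # Speech envelopes are modulated at syllable rates; steady noise is not.
      import numpy as np
      from scipy.signal import hilbert, butter, sosfiltfilt

      def envelope_modulation_score(x, fs):
          envelope = np.abs(hilbert(x))               # amplitude envelope
          sos = butter(4, [2.0, 8.0], btype="bandpass", fs=fs, output="sos")
          modulation = sosfiltfilt(sos, envelope)     # syllable-rate band
          return np.mean(modulation**2) / (np.mean(envelope**2) + 1e-12)

      fs = 16000
      t = np.arange(0.0, 1.0, 1.0 / fs)
      speechlike = np.sin(2 * np.pi * 150 * t) * (0.5 + 0.5 * np.sin(2 * np.pi * 4 * t))
      noise = np.random.default_rng(0).normal(size=fs)
      print(envelope_modulation_score(speechlike, fs),
            envelope_modulation_score(noise, fs))     # speech-like scores higher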

    • Article (No Access)

      MULTIMODAL COMMUNICATION FROM MULTIMODAL THINKING — TOWARDS AN INTEGRATED MODEL OF SPEECH AND GESTURE PRODUCTION

      A computational model for the automatic production of combined speech and iconic gesture is presented. The generation of multimodal behavior is grounded in processes of multimodal thinking, in which a propositional representation interacts and interfaces with an imagistic representation of visuo-spatial imagery. An integrated architecture for this is described, in which the planning of content and the planning of form across both modalities proceed in an interactive manner. Results from an empirical study are reported that inform the on-the-spot formation of gestures.

    • Article (No Access)

      Evaluating the MSA West Point Speech Corpus

      Compared to other major languages of the world, the Arabic language suffers from a dearth of research initiatives and resources. As a result, Modern Standard Arabic (MSA) lacks reliable speech corpora for research in phonetics and related areas of linguistics. In recent years, the Linguistic Data Consortium (LDC) published the first public MSA speech corpus designed for speech recognition experiments, called West Point. We are currently using this corpus in our research experiments on speech recognition and other speech processing investigations. The aim of this paper is to evaluate the West Point Corpus from an MSA phonetic and linguistic point of view. The evaluation covers the phonemes used and their number, the phoneme definitions, the labeling, and the scripts established by the West Point Corpus. Weaknesses, strengths, and discrepancies of the West Point Corpus with respect to the linguistic rules and phonetic characteristics of MSA are also discussed.

    • Chapter (Open Access)

      A Conversational Agent for Early Detection of Neurotoxic Effects of Medications through Automated Intensive Observation

      We present a fully automated AI-based system for intensive monitoring of the cognitive symptoms of neurotoxicity that frequently appear as a result of immunotherapy for hematologic malignancies. Early manifestations of these symptoms are evident in the patient’s speech in the form of mild aphasia and confusion, and they can be detected and effectively treated prior to the onset of more serious and potentially life-threatening impairment. We have developed the Automated Neural Nursing Assistant (ANNA), a system designed to conduct a brief cognitive assessment several times per day over the telephone for 5-14 days following infusion of the immunotherapy medication. ANNA uses a conversational agent based on a large language model to elicit spontaneous speech in a semi-structured dialogue, followed by a series of brief language-based neurocognitive tests. In this paper we share ANNA’s design and implementation and the results of a pilot functional evaluation study, and we discuss the technical and logistic challenges facing the introduction of this type of technology in clinical practice. A large-scale clinical evaluation of ANNA will be conducted in an observational study of patients undergoing immunotherapy at the University of Minnesota Masonic Cancer Center starting in Fall 2023.

    • Chapter (No Access)

      Speech Emotion Recognition Based on Time Domain Feature

      In recent years, speech emotion recognition has attracted more and more attention. In this paper, we extract a new emotion feature named the Long-term Rise Zero-Crossing Interval (LRZCI). A support vector machine (SVM) is used as the classifier. Recognition experiments are conducted on the Danish Emotional Speech (DES) database. Experimental results illustrate the validity of this feature.
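
      The abstract does not define the LRZCI precisely, so as a hedged stand-in the sketch below computes generic statistics of the intervals between rising zero crossings, a time-domain feature in the same spirit, and feeds them to an SVM. The synthetic tones and labels exist only so the sketch runs end to end; they do not stand in for the DES data.

      # Zero-crossing-interval features from the raw time-domain waveform.
      import numpy as np
      from sklearn.svm import SVC

      def rising_zero_crossing_intervals(x, fs):
          idx = np.nonzero((x[:-1] < 0) & (x[1:] >= 0))[0]  # rising crossings
          iv = np.diff(idx) / fs                            # interval lengths, s
          return np.array([iv.mean(), iv.std(), iv.max()])

      rng = np.random.default_rng(0)
      fs = 16000

      def tone(f):                         # synthetic stand-in for utterances
          return np.sin(2 * np.pi * f * np.arange(fs) / fs + rng.normal())

      X = np.array([rising_zero_crossing_intervals(tone(f), fs)
                    for f in rng.uniform(100, 300, 40)])
      y = (X[:, 0] > np.median(X[:, 0])).astype(int)        # illustrative labels
      print(SVC(kernel="rbf").fit(X[:30], y[:30]).score(X[30:], y[30:]))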

    • Chapter (No Access)

      Interrelationships, Communication, Semiotics, and Artificial Consciousness

      The aim of this chapter is to refine some questions regarding AI, and to provide partial answers to them. We analyze the state of the art in designing intelligent systems that are able to mimic human complex activities, including acts based on artificial consciousness. The analysis is performed to contrast the human cognition and behavior to the similar processes in AI systems. The analysis includes elements of psychology, sociology, and communication science related to humans and lower level beings. The second part of this chapter is devoted to human-human and man-machine communication, as related to intelligence. We emphasize that the relational aspects constitute the basis for the perception, knowledge, semiotic and communication processes. Several consequences are derived. Subsequently, we deal with the tools needed to endow the machines with intelligence. We discuss the roles of knowledge and data structures. The results could help building "sensitive and intelligent" machines.