  • RPSOVF Prediction Model for Speech Signal Series Based on UPSO

    In this paper, we propose a nonlinear prediction model of speech signal series with an explicit structure. The least mean square (LMS) algorithm typically performs poorly when updating the kernel coefficients of the Volterra model: it is prone to getting trapped in local minima, to improper parameter selection, and to slow convergence. To overcome these shortcomings, we propose a uniform searching particle swarm optimization (UPSO) algorithm to optimize the kernel coefficients of the Volterra model. The second-order Volterra filter (SOVF) speech prediction model based on UPSO is established by using English phonemes, words, and phrases. To reduce the complexity of the model, we extract a reduced-parameter SOVF (RPSOVF) under a user-specified error tolerance, which accelerates prediction. The experimental results show that, on both single-frame and multiframe speech signals, UPSO-SOVF and UPSO-RPSOVF outperform LMS-SOVF and PSO-SOVF in terms of root mean square error (RMSE) and mean absolute deviation (MAD). UPSO-SOVF and UPSO-RPSOVF better reflect the trends and regularity of speech signals and fully meet the requirements of speech signal prediction. The proposed model provides a nonlinear analysis and a valuable model structure for speech signal series, and can be further employed in speech signal reconstruction or compression coding.
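
    As a rough illustration of the quantities named in this abstract, the sketch below builds a second-order Volterra predictor and trains it with the plain LMS baseline, then scores it with RMSE and MAD. It does not implement UPSO or the parameter reduction, which the abstract does not specify. The function names, the memory length of 2, the step size, and the reading of MAD as the mean absolute prediction error are all assumptions made for this example.

```python
import math


def volterra_features(window):
    """Constant, linear, and second-order (i <= j) terms for past samples."""
    m = len(window)
    feats = [1.0] + list(window)
    feats += [window[i] * window[j] for i in range(m) for j in range(i, m)]
    return feats


def sovf_predict(coeffs, window):
    """Second-order Volterra filter output for one window of past samples."""
    return sum(c * f for c, f in zip(coeffs, volterra_features(window)))


def lms_train(signal, memory=2, mu=0.05, epochs=200):
    """Plain LMS update of the kernel coefficients (the baseline, not UPSO)."""
    coeffs = [0.0] * (1 + memory + memory * (memory + 1) // 2)
    for _ in range(epochs):
        for n in range(memory, len(signal)):
            window = signal[n - memory:n][::-1]  # [x(n-1), x(n-2), ...]
            feats = volterra_features(window)
            err = signal[n] - sum(c * f for c, f in zip(coeffs, feats))
            coeffs = [c + mu * err * f for c, f in zip(coeffs, feats)]
    return coeffs


def prediction_errors(coeffs, signal, memory=2):
    """One-step-ahead prediction errors over the whole signal."""
    return [
        signal[n] - sovf_predict(coeffs, signal[n - memory:n][::-1])
        for n in range(memory, len(signal))
    ]


def rmse(errors):
    """Root mean square error."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def mad(errors):
    """Mean absolute deviation, taken here as the mean absolute error."""
    return sum(abs(e) for e in errors) / len(errors)


if __name__ == "__main__":
    # Hypothetical test signal: a sinusoid standing in for a speech frame.
    frame = [math.sin(0.3 * n) for n in range(120)]
    kernel = lms_train(frame)
    errs = prediction_errors(kernel, frame)
    print("RMSE:", rmse(errs), "MAD:", mad(errs))
```

    A swarm-based optimizer such as the one proposed would replace `lms_train` while keeping the same feature construction and error metrics.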

  • Influence of Acoustics in Speech Recognition for Oriental Languages

    Although agileTV, Nuance XML Platforms, and Microsoft Speech Server 2004 are projects in the news, there is still demand for a speech recognition engine with a lower word error rate (WER). This article sets out the acoustic signal quality required for efficient speech recognition, with the major thrust on the acoustics of speech recognition. It also examines the performance of various speech recognition engines in the industry, the techniques they adopt to obtain a high-quality acoustic signal from the speaker for efficient results (in terms of a lower word error rate), and the impact of the external factors that make recognition less robust by degrading the quality of the speech signal. To tackle the problem, we suggest an intelligent mechanism for modifying the regular input speech signal format to significantly improve speech recognition.
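
    For readers unfamiliar with the metric, WER is conventionally computed as the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the recognizer output, divided by the number of reference words. The sketch below shows that standard definition; the function name and the whitespace tokenization are assumptions for this example, not something taken from the article.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the"): 2 / 6.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```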

  • RDN-NET: A Deep Learning Framework for Asthma Prediction and Classification Using Recurrent Deep Neural Network

    Asthma is a serious disease that causes a large number of deaths across all age groups around the world. Early detection and prevention of asthma can therefore save numerous lives and benefit the medical field. However, conventional machine learning methods have failed to detect asthma reliably from speech signals, resulting in low accuracy. This paper therefore presents a deep learning-based approach to asthma prediction and classification using a recurrent deep neural network (RDN-Net). First, speech signals are preprocessed using the minimum mean-square-error short-time spectral amplitude (MMSE-STSA) method, which removes noise and enhances speech properties. Next, the improved Ripplet-II Transform (IR2T) extracts disease-dependent and disease-specific features. A modified gray wolf optimization (MGWO)-based bio-optimization approach then selects the optimal features through its hunting process. Finally, RDN-Net predicts the presence of asthma from the speech signal and classifies the type as wheeze, crackle, or normal. The simulations are carried out on the real-time COSWARA dataset, and the proposed method performs better on all metrics than the state-of-the-art approaches.

  • ANALYSIS OF SIGNAL TREND BY A PSEUDO-EMD METHOD WITH APPLICATIONS TO WEATHER AND SPEECH DATA

    Empirical mode decomposition (EMD) has been established as a valuable tool for determining nonlinear signal trend. EMD decomposes a one-dimensional (1D) signal into hierarchical components known as intrinsic mode functions (IMFs) and a residue, based on the local properties of the signal. The first IMF captures the highest-frequency local oscillations, while the residue captures the trend of the signal or data. In each iteration of the EMD process, interpolation is applied to the local maxima and minima to form the upper and lower envelopes, respectively. However, interpolation incurs a large computation time and introduces other artifacts in the decomposition, which limits the use of EMD for many real-life signals. This paper proposes an effective method that replaces the interpolation step with direct envelope estimation using order statistics filters, which decreases computation time; it follows a similar EMD approach recently proposed for two-dimensional data or image analysis. The modified EMD of this paper, called the pseudo-EMD (P-EMD) method, is particularly useful for determining, analyzing, and modifying the trend of various signals to obtain desired results. Several synthetic and real-life signals, such as speech, sea level pressure, and temperature, are tested to verify the effectiveness of P-EMD. The results show P-EMD to be a superior alternative for trend analysis of signals and data: it yields a more accurate trend than other interpolation-based EMD methods, such as classical EMD (CEMD) and a modified EMD (MEMD), and also computes faster.
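
    The envelope step described above can be sketched roughly as follows: sliding maximum and minimum filters (order statistics filters) give raw upper and lower envelopes, a moving average smooths them, and their mean approximates the local trend that one sifting step subtracts. This is only a minimal sketch of the general idea. The fixed window width, the moving-average smoothing, and every function name here are assumptions, and the paper's actual filter sizes and stopping rules are not given in the abstract.

```python
def sliding(signal, width, reducer):
    """Apply reducer (max, min, mean, ...) over a centred sliding window."""
    half = width // 2
    n = len(signal)
    return [
        reducer(signal[max(0, i - half):min(n, i + half + 1)])
        for i in range(n)
    ]


def mean_of(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def mean_envelope(signal, width):
    """Mean of smoothed max-filter and min-filter envelopes (no interpolation)."""
    upper = sliding(sliding(signal, width, max), width, mean_of)
    lower = sliding(sliding(signal, width, min), width, mean_of)
    return [(u + l) / 2 for u, l in zip(upper, lower)]


def sift_once(signal, width):
    """One sifting step: subtract the mean envelope to expose the oscillation."""
    envelope = mean_envelope(signal, width)
    return [s - e for s, e in zip(signal, envelope)]


if __name__ == "__main__":
    # Hypothetical signal: a slow linear trend plus a fast oscillation.
    data = [0.1 * i + (-1) ** i for i in range(50)]
    trend = mean_envelope(data, width=5)
    detail = sift_once(data, width=5)
    print(trend[10], detail[10])
```

    Away from the edges, the mean envelope of this test signal tracks the linear trend closely, while the sifted output keeps the fast oscillation.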

  • PERFORMANCE COMPARISON OF REGRESSION TECHNIQUES IN PREDICTING PARKINSON DISEASE SEVERITY SCORE USING SPEECH FEATURES

    This paper compares six regression techniques for predicting the Parkinson disease severity score in terms of root mean square error (RMSE). People affected by Parkinson disease suffer various muscular impairments, affecting gait and speech among other functions. Clinicians generally assess the severity of the disease by observing the different muscular functions of the affected person or by performing brain scans. This paper focuses on predicting the disease severity by performing regression on features of the speech signal. The features used in the prediction are phonation features extracted from voice samples of both people affected by Parkinson disease and healthy people. The 14 phonation features comprise the frequency variability feature jitter and its variants, the energy variability feature shimmer and its variants, the mean autocorrelation of the pitch frequencies, and the harmonicity features harmonic-to-noise ratio and noise-to-harmonic ratio. The six regression techniques are linear, stepwise, lasso, and ridge regression, prediction using a neural network model, and classification and regression trees (CART). The trained regression models are validated using k-fold cross-validation with k values of three, five, seven, and ten, and also using hold-out validation with a hold-out value of 0.3. The results from the six regression techniques are then compared; they show that the neural network model gives the lowest RMSE of 1.5, followed by CART with 1.8.
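
    The validation procedure described above can be sketched as a k-fold loop that pools squared errors over the held-out folds and reports a single RMSE. To stay self-contained, the example fits a one-feature ordinary least squares line instead of the six techniques and 14 features in the paper. The function names, the unshuffled interleaved fold assignment, and the synthetic data are assumptions made for illustration only.

```python
import math


def fit_line(xs, ys):
    """Closed-form ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def kfold_rmse(xs, ys, k):
    """RMSE pooled over k held-out folds (interleaved, no shuffling)."""
    n = len(xs)
    squared_error = 0.0
    for fold in range(k):
        test_idx = set(range(fold, n, k))
        train_x = [xs[i] for i in range(n) if i not in test_idx]
        train_y = [ys[i] for i in range(n) if i not in test_idx]
        slope, intercept = fit_line(train_x, train_y)
        squared_error += sum(
            (ys[i] - (slope * xs[i] + intercept)) ** 2 for i in test_idx
        )
    return math.sqrt(squared_error / n)


if __name__ == "__main__":
    # Hypothetical data: one speech feature against a severity score.
    feature = [float(i) for i in range(30)]
    score = [2.0 * x + 1.0 + (0.5 if i % 2 else -0.5)
             for i, x in enumerate(feature)]
    for k in (3, 5, 7, 10):
        print(k, kfold_rmse(feature, score, k))
```

    Comparing techniques then amounts to swapping `fit_line` for another model and comparing the pooled RMSE for each value of k.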