In this paper, we propose a nonlinear prediction model with an explicit structure for speech signal series. The kernel coefficients of the Volterra model are typically updated with the least mean square (LMS) algorithm, whose low performance often produces improper parameters and, in turn, intrinsic shortcomings such as trapping in local minima, improper parameter selection, and slow convergence. To overcome these shortcomings, we propose a uniform searching particle swarm optimization (UPSO) algorithm to optimize the kernel coefficients of the Volterra model. A UPSO-based second-order Volterra filter (SOVF) speech prediction model is established using English phonemes, words, and phrases. To reduce model complexity, we extract a reduced-parameter SOVF (RPSOVF) for acceleration, given a user-designed error tolerance. Experimental results show that, on both single-frame and multiframe speech signals, UPSO-SOVF and UPSO-RPSOVF outperform LMS-SOVF and PSO-SOVF in terms of root mean square error (RMSE) and mean absolute deviation (MAD). UPSO-SOVF and UPSO-RPSOVF better reflect the trends and regularity of speech signals and fully meet the requirements of speech signal prediction. The proposed model provides a nonlinear analysis and a valuable model structure for speech signal series, and can be further employed in speech signal reconstruction or compression coding.
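To make the model structure concrete, the sketch below implements a one-step-ahead second-order Volterra predictor with the LMS kernel update, i.e. the LMS-SOVF baseline the abstract compares against, not the authors' UPSO optimizer. The memory length, step size `mu`, the helper names, and the sinusoidal stand-in signal are assumptions for illustration; MAD is assumed here to mean the mean absolute prediction error.

```python
import numpy as np

def volterra_regressor(past: np.ndarray) -> np.ndarray:
    """Regressor [1, x(n-1..n-M), x(n-i)x(n-j) for i <= j] of a second-order Volterra filter."""
    m = len(past)
    quadratic = [past[i] * past[j] for i in range(m) for j in range(i, m)]
    return np.concatenate(([1.0], past, quadratic))

def lms_sovf_predict(x, memory: int = 3, mu: float = 0.05):
    """One-step-ahead SOVF prediction with LMS kernel updates (LMS-SOVF baseline)."""
    x = np.asarray(x, dtype=float)
    n_coef = 1 + memory + memory * (memory + 1) // 2
    h = np.zeros(n_coef)              # h0, linear kernel h1, upper-triangular quadratic kernel h2
    predictions = np.zeros_like(x)
    for n in range(memory, len(x)):
        past = x[n - memory:n][::-1]  # x(n-1), x(n-2), ..., x(n-M)
        phi = volterra_regressor(past)
        predictions[n] = h @ phi
        error = x[n] - predictions[n]
        h += mu * error * phi         # LMS: stochastic-gradient step on the squared error
    return predictions, h

def prediction_rmse(errors) -> float:
    return float(np.sqrt(np.mean(np.square(errors))))

def prediction_mad(errors) -> float:
    # Assumption: MAD taken as the mean absolute prediction error.
    return float(np.mean(np.abs(errors)))

# Hypothetical stand-in for a speech frame.
signal = 0.5 * np.sin(2 * np.pi * 0.1 * np.arange(4000))
pred, h = lms_sovf_predict(signal)
err = signal[3:] - pred[3:]
print(prediction_rmse(err[-500:]), prediction_mad(err[-500:]))
```

A population-based optimizer such as PSO or UPSO would search over the same coefficient vector `h`, scoring each candidate by a prediction-error measure, instead of following the LMS gradient step.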
Although projects such as agileTV, Nuance XML platforms, and Microsoft Speech Server 2004 have been in the news, there is still demand for speech recognition engines with a lower word error rate (WER). This article sets out the acoustic signal quality requirements for efficient speech recognition, with the main emphasis on the acoustics of speech recognition. It also reviews the performance of various speech recognition engines in the industry, the techniques they adopt to obtain a high-quality acoustic signal from the speaker (and thereby a lower WER), and the external factors that make it harder to obtain a high-quality speech signal robustly. To tackle this problem, we suggest an intelligent mechanism that modifies the regular input speech signal format to significantly improve speech recognition.
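Since the abstract measures engines by WER, a small sketch of the standard definition may help: WER = (S + D + I) / N, where S, D, and I are the substitutions, deletions, and insertions in a minimum-edit word alignment and N is the number of reference words. The function name and whitespace tokenization are assumptions for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (S + D + I) / N from a word-level Levenshtein alignment."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # dist[i][j] = minimum edits turning the first i reference words into the first j hypothesis words
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,                  # deletion
                             dist[i][j - 1] + 1,                  # insertion
                             dist[i - 1][j - 1] + substitution)   # match or substitution
    return dist[len(ref)][len(hyp)] / len(ref)

# One substitution (sat -> sit) and one deletion (the): 2 / 6 reference words.
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions count as errors, WER can exceed 1 when the hypothesis is much longer than the reference.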
Asthma is a serious disease that causes a large number of deaths across all age groups worldwide. Early detection and prevention of asthma can therefore save many lives and support the medical field. However, conventional machine learning methods have failed to detect asthma reliably from speech signals, yielding low accuracy. This paper therefore presents an advanced deep learning-based asthma prediction and classification method using a recurrent deep neural network (RDN-Net). First, the speech signals are preprocessed with the minimum mean-square-error short-time spectral amplitude (MMSE-STSA) method, which removes noise and enhances the speech properties. Next, an improved Ripplet-II transform (IR2T) extracts disease-dependent and disease-specific features. A modified gray wolf optimization (MGWO)-based bio-optimization approach then selects the optimal features through its hunting process. Finally, RDN-Net detects asthma from the speech signal and classifies it as wheeze, crackle, or normal. Simulations on the real-time COSWARA dataset show that the proposed method outperforms state-of-the-art approaches on all metrics.
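The core of MMSE-STSA preprocessing is the Ephraim and Malah spectral gain applied to each short-time Fourier bin: with a priori SNR ξ, a posteriori SNR γ, and v = ξγ/(1 + ξ), the gain is G = (√π/2)(√v/γ) exp(−v/2)[(1 + v) I0(v/2) + v I1(v/2)]. The sketch below computes only this gain; the STFT framing, noise estimation, and ξ tracking (e.g. decision-directed) are omitted, and the helper names are hypothetical. The Bessel functions are summed from their power series to keep the example dependency-free.

```python
import math

def scaled_bessel_i(order: int, x: float) -> float:
    """Return exp(-x) * I_order(x) for x >= 0 (modified Bessel function, first kind).

    Sums I_n(x) = sum_k (x/2)^(2k+n) / (k! (k+n)!) term by term in log space,
    so large x does not overflow.
    """
    if x == 0.0:
        return 1.0 if order == 0 else 0.0
    log_half = math.log(x / 2.0)
    total = 0.0
    # Terms peak near k = x/2 and shrink quickly afterwards, so 2x + 60 terms is ample.
    for k in range(int(2 * x) + 60):
        log_term = ((2 * k + order) * log_half
                    - math.lgamma(k + 1) - math.lgamma(k + order + 1) - x)
        total += math.exp(log_term)
    return total

def mmse_stsa_gain(xi: float, gamma: float) -> float:
    """Ephraim-Malah MMSE-STSA gain for a priori SNR xi and a posteriori SNR gamma."""
    if xi <= 0 or gamma <= 0:
        raise ValueError("xi and gamma must be positive")
    v = xi * gamma / (1.0 + xi)
    # exp(-v/2) is already folded into the scaled Bessel functions.
    bracket = (1.0 + v) * scaled_bessel_i(0, v / 2.0) + v * scaled_bessel_i(1, v / 2.0)
    return (math.sqrt(math.pi) / 2.0) * (math.sqrt(v) / gamma) * bracket

# Per STFT bin: enhanced_magnitude = mmse_stsa_gain(xi, gamma) * noisy_magnitude
print(mmse_stsa_gain(0.1, 1.0), mmse_stsa_gain(100.0, 100.0))
```

At high SNR the gain approaches the Wiener gain ξ/(1 + ξ), while at low SNR it attenuates the bin strongly.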
Empirical mode decomposition (EMD) has been established as a valuable tool for determining nonlinear signal trends. EMD decomposes a one-dimensional (1D) signal into hierarchical components known as intrinsic mode functions (IMFs) and a residue, based on the local properties of the signal. The first IMF captures the fastest local oscillations, while the residue captures the trend of the signal or data. In each iteration of the EMD process, the local maxima and minima are interpolated to form the upper and lower envelopes, respectively. However, interpolation is computationally expensive and introduces artifacts into the decomposition, which limits the use of EMD for many real-life signals. This paper proposes an effective method that replaces the interpolation step with direct envelope estimation using order-statistics filters, reducing computation time; it follows a similar EMD approach recently proposed for two-dimensional data and image analysis. The resulting modified EMD, called pseudo-EMD (P-EMD), is particularly useful for determining, analyzing, and/or modifying the trend of various signals to obtain desired outcomes. Several synthetic and real-life signals, such as speech, sea-level pressure, and temperature, are used to verify the effectiveness of P-EMD. The results show that P-EMD is a superior alternative for trend analysis, since it yields more accurate trends than other interpolation-based EMD methods such as classical EMD (CEMD) and a modified EMD (MEMD), and it also computes faster.
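The idea of replacing interpolation with order-statistics filters can be sketched as follows: a sliding maximum and minimum give the upper and lower envelopes directly, a moving average smooths them, and the mean envelope is subtracted to obtain each IMF. This is a simplified illustration, not the paper's exact P-EMD; the window-size rule (mean spacing of local maxima), the moving-average smoothing, one pass per IMF, and all function names are assumptions.

```python
import numpy as np

def _window_filter(x: np.ndarray, width: int, reducer) -> np.ndarray:
    """Apply reducer (np.max, np.min or np.mean) over a centred window of odd width."""
    half = width // 2
    padded = np.pad(x, half, mode="edge")
    return np.array([reducer(padded[i:i + width]) for i in range(len(x))])

def _window_width(x: np.ndarray):
    """Hypothetical rule: odd window ~ mean spacing of local maxima (None if < 2 maxima)."""
    d = np.diff(x)
    maxima = np.where((d[:-1] > 0) & (d[1:] <= 0))[0] + 1
    if len(maxima) < 2:
        return None
    width = max(3, int(np.ceil(np.mean(np.diff(maxima)))))
    return width if width % 2 == 1 else width + 1

def pseudo_emd(x, max_imfs: int = 5):
    """Split x into IMF-like components and a residue (trend) without interpolation."""
    residue = np.asarray(x, dtype=float).copy()
    imfs = []
    for _ in range(max_imfs):
        width = _window_width(residue)
        if width is None or width >= len(residue):
            break
        # Order-statistics (max/min) filters give the envelopes directly;
        # a moving average of the same width smooths them.
        upper = _window_filter(_window_filter(residue, width, np.max), width, np.mean)
        lower = _window_filter(_window_filter(residue, width, np.min), width, np.mean)
        imf = residue - (upper + lower) / 2.0
        imfs.append(imf)
        residue = residue - imf
    return imfs, residue

# Synthetic test signal: linear trend plus a 40-cycle oscillation.
t = np.linspace(0.0, 1.0, 1000)
trend, wave = 2.0 * t, np.sin(2 * np.pi * 40 * t)
imfs, residue = pseudo_emd(trend + wave)
```

By construction the IMFs and the residue sum back to the input exactly, and the residue serves as the trend estimate.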
This paper compares six regression techniques, in terms of root mean square error (RMSE), for predicting the Parkinson's disease severity score. People with Parkinson's disease suffer various muscular impairments affecting, for example, gait and speech. Clinicians generally assess disease severity by observing the patient's muscular functions or by performing brain scans. This paper focuses on predicting disease severity by performing regression on features of the speech signal. The features are phonation features extracted from voice samples of both people with Parkinson's disease and healthy people. The 14 extracted phonation features comprise the frequency-variability feature jitter and its variants, the energy-variability feature shimmer and its variants, the mean autocorrelation of the pitch frequencies, and the harmonicity features harmonic-to-noise ratio and noise-to-harmonic ratio. The six regression techniques are linear, stepwise, lasso, and ridge regression, a neural network model, and classification and regression trees (CART). Each trained regression model is validated using k-fold cross-validation with k = 3, 5, 7, and 10, and using hold-out validation with a hold-out fraction of 0.3. Comparing the six techniques shows that the neural network model gives the lowest RMSE, 1.5, followed by CART with 1.8.
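The validation protocol can be sketched for one of the six techniques, ridge regression, using the closed-form solution. The sketch reports the mean per-fold test RMSE for k-fold cross-validation and the test RMSE for a 0.3 hold-out split. The synthetic data standing in for the 14 phonation features, the regularization strength `lam`, the shuffling seed, and the function names are all hypothetical.

```python
import numpy as np

def ridge_fit(X: np.ndarray, y: np.ndarray, lam: float):
    """Closed-form ridge regression; centering leaves the intercept unpenalized."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)
    return w, y_mean - x_mean @ w

def rmse(y_true, y_pred) -> float:
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def kfold_rmse(X, y, k: int, lam: float = 1.0, seed: int = 0) -> float:
    """Mean of the per-fold test RMSEs under shuffled k-fold cross-validation."""
    idx = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(idx, k)
    scores = []
    for i, test in enumerate(folds):
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        w, b = ridge_fit(X[train], y[train], lam)
        scores.append(rmse(y[test], X[test] @ w + b))
    return float(np.mean(scores))

def holdout_rmse(X, y, holdout: float = 0.3, lam: float = 1.0, seed: int = 0) -> float:
    """Train on (1 - holdout) of the samples and report RMSE on the held-out rest."""
    idx = np.random.default_rng(seed).permutation(len(y))
    n_test = int(round(holdout * len(y)))
    test, train = idx[:n_test], idx[n_test:]
    w, b = ridge_fit(X[train], y[train], lam)
    return rmse(y[test], X[test] @ w + b)

# Hypothetical data: 200 recordings x 14 phonation features, noisy linear severity score.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 14))
y = X @ rng.normal(size=14) + 0.1 * rng.normal(size=200)
for k in (3, 5, 7, 10):
    print(k, kfold_rmse(X, y, k, lam=1e-3))
print("hold-out", holdout_rmse(X, y))
```

Averaging per-fold RMSEs is one common convention; pooling all out-of-fold predictions before computing a single RMSE is another, and the two can differ slightly.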