Speech enhancement methods differ according to the degree of degradation and the type of noise in the speech signal, so research in the field remains challenging, especially when dealing with highly transient residual and background noise. Numerous deep learning networks have been developed that provide promising results for improving the perceptual quality and intelligibility of noisy speech. The power of deep learning techniques has opened up innovation and research in speech enhancement, with implications across a wide range of real-time applications. This paper provides a comprehensive overview by reviewing the important datasets, feature extraction methods, deep learning models, training algorithms, and evaluation metrics for speech enhancement. We begin by tracing the evolution of speech enhancement research, from early approaches to recent advances in deep learning architectures. By analyzing and comparing the approaches to solving speech enhancement challenges, we categorize them according to their strengths and weaknesses. Moreover, we discuss the challenges and future directions of deep learning in speech enhancement, including the demand for parameter-efficient models. The purpose of this paper is to examine the development of the field, compare and contrast different approaches, and highlight future directions as well as challenges for further research.
In this paper, a speech enhancement method using noise classification and a Deep Neural Network (DNN) is proposed. A Gaussian mixture model (GMM) is employed to determine the noise type in speech-absent frames, and a DNN is used to model the relationship between the noisy observation and clean speech. Once the noise type is determined, the corresponding DNN model is applied to enhance the noisy speech. The GMM is trained on mel-frequency cepstral coefficients (MFCCs), with its parameters estimated by the iterative expectation-maximization (EM) algorithm, and the noise type is updated by spectrum-entropy-based voice activity detection (VAD). Experimental results demonstrate that the proposed method achieves better objective speech quality and less distortion under both stationary and non-stationary conditions.
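As a rough illustration of the noise-classification stage, the sketch below fits one GMM per noise type on MFCC features and picks the most likely type for speech-absent frames. The librosa-based MFCC extraction and the mixture sizes are illustrative assumptions, not the paper's exact configuration.

```python
# Illustrative sketch: GMM-based noise-type classification on MFCCs.
# librosa and scikit-learn are assumed; mixture size is an arbitrary choice.
import librosa
from sklearn.mixture import GaussianMixture

def train_noise_gmms(noise_examples, sr=16000, n_components=8):
    """Fit one GMM (EM-trained, as in the paper) per noise type.
    noise_examples: dict mapping noise-type name -> 1-D noise waveform."""
    gmms = {}
    for noise_type, signal in noise_examples.items():
        mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=13).T  # frames x coeffs
        gmms[noise_type] = GaussianMixture(n_components=n_components).fit(mfcc)
    return gmms

def classify_noise(gmms, speech_absent_audio, sr=16000):
    """Pick the noise type with the highest average log-likelihood."""
    mfcc = librosa.feature.mfcc(y=speech_absent_audio, sr=sr, n_mfcc=13).T
    return max(gmms, key=lambda t: gmms[t].score(mfcc))
```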
An important stage in speech enhancement is estimating the noise signal, which is a difficult task under non-stationary and low signal-to-noise-ratio conditions. This paper presents an iterative speech enhancement approach that requires no prior knowledge of the noise and is based on low-rank sparse matrix decomposition using a Gammatone filterbank and a convex distortion measure. To estimate the noise and speech, the noisy speech is decomposed into a low-rank noise part and a sparse speech part by enforcing sparsity regularization. Neither the exact distribution of the noise signal nor an explicit noise estimator is required. The experimental results demonstrate that our approach outperforms competing methods and yields better overall speech quality and intelligibility. Moreover, the composite objective measures confirm better performance in terms of residual noise and speech distortion under adverse noisy conditions, and time-varying spectral analysis validates a significant reduction of the background noise.
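The low-rank/sparse split can be illustrated with a generic robust-PCA-style decomposition of a magnitude spectrogram. The alternating singular-value/soft thresholding below is a textbook stand-in, not the paper's Gammatone-domain formulation with its convex distortion measure; the weights are standard RPCA heuristics.

```python
# Sketch: alternating proximal minimization of
#   0.5*||Y - L - S||_F^2 + mu*||L||_* + lam*mu*||S||_1
import numpy as np

def lowrank_sparse_split(Y, lam=None, mu=None, n_iter=100):
    """Split a (freq x time) spectrogram Y into low-rank noise L
    and sparse speech S."""
    m, n = Y.shape
    lam = lam or 1.0 / np.sqrt(max(m, n))   # standard RPCA sparsity weight
    mu = mu or 0.25 * np.abs(Y).mean()      # shrinkage strength (heuristic)
    L = np.zeros_like(Y)
    S = np.zeros_like(Y)
    for _ in range(n_iter):
        # low-rank update: singular value thresholding of Y - S
        U, sig, Vt = np.linalg.svd(Y - S, full_matrices=False)
        L = U @ np.diag(np.maximum(sig - mu, 0.0)) @ Vt
        # sparse update: elementwise soft thresholding of Y - L
        R = Y - L
        S = np.sign(R) * np.maximum(np.abs(R) - lam * mu, 0.0)
    return L, S
```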
A deep neural network (DNN) has recently been successfully adopted as a regression model in speech enhancement. Nonetheless, training a model to adapt to different noises is challenging, because every noise has its own characteristics that, combined with the speech utterance, produce huge variation over which the model must operate. Thus, a joint framework combining noise classification (NC) and speech enhancement using DNNs was proposed. We first determine the noise type of the contaminated speech using a voice activity detection DNN (VAD-DNN) and an NC-DNN. Then, based on the noise classification results, the corresponding SE-DNN model is applied to enhance the contaminated speech. In addition, to keep the method simple, the structures of the different DNNs are similar and the features are the same. Experimental results show that the proposed method effectively improves the performance of speech enhancement in complex noise environments, and that the classification accuracy has a great influence on the speech enhancement.
The a priori signal-to-noise ratio (SNR) plays an essential role in many speech enhancement systems. Most existing approaches to estimating the a priori SNR exploit only the amplitude spectra while neglecting the phase. Considering that incorporating phase information into a speech processing system can significantly improve speech quality, this paper proposes a phase-sensitive decision-directed (DD) approach to a priori SNR estimation. By representing the short-time discrete Fourier transform (STFT) signal spectra geometrically in the complex plane, the proposed approach estimates the a priori SNR using both magnitude and phase information while making no assumptions about the phase difference between the clean speech and noise spectra. Objective evaluations in terms of spectrograms, segmental SNR, log-spectral distance (LSD), and short-time objective intelligibility (STOI) measures are presented to demonstrate the superiority of the proposed approach over several competitive methods at different noise conditions and input SNR levels.
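For context, the classical magnitude-only DD estimator that this work extends smooths the previous frame's clean-speech estimate with the current instantaneous SNR. A minimal sketch follows, with the smoothing factor and noise PSD as assumed inputs; the paper's phase-sensitive geometry is not reproduced here.

```python
# Sketch of the classical decision-directed a priori SNR estimate:
#   xi(k,l) = alpha*|A(k,l-1)|^2/lambda_d(k) + (1-alpha)*max(gamma(k,l)-1, 0)
import numpy as np

def decision_directed_snr(noisy_mag, noise_psd, alpha=0.98):
    """noisy_mag: (freq x frames) STFT magnitudes; noise_psd: (freq,) estimate."""
    gamma = (noisy_mag ** 2) / noise_psd[:, None]          # a posteriori SNR
    xi = np.empty_like(gamma)
    prev_clean_power = np.zeros(noisy_mag.shape[0])
    for l in range(noisy_mag.shape[1]):
        xi[:, l] = alpha * prev_clean_power / noise_psd \
                   + (1.0 - alpha) * np.maximum(gamma[:, l] - 1.0, 0.0)
        gain = xi[:, l] / (1.0 + xi[:, l])                 # Wiener gain
        prev_clean_power = (gain * noisy_mag[:, l]) ** 2   # |A(k,l)|^2
    return xi
```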
To address the perceptual degradation caused by typical phase reconstruction methods, an improved method combining phase reconstruction and MMSE-LSA estimation is proposed. First, the geometric relationship between noisy speech and clean speech in unvoiced segments is used to estimate the phase of the clean speech. Second, considering the randomness of speech presence in real noise environments, a modified MMSE-LSA amplitude estimator is proposed based on a binary hypothesis model. Finally, the new phase reconstruction for voiced and unvoiced speech is combined with the modified MMSE-LSA. Simulation results show that the proposed algorithm outperforms the typical phase reconstruction method in terms of SegSNR and PESQ.
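The amplitude estimator being modified here is the standard Ephraim-Malah MMSE log-spectral amplitude gain. A minimal sketch, omitting the binary-hypothesis modification and the phase reconstruction:

```python
# Sketch of the standard MMSE-LSA gain:
#   G = xi/(1+xi) * exp(0.5 * E1(v)), with v = xi*gamma/(1+xi)
import numpy as np
from scipy.special import exp1  # exponential integral E1

def mmse_lsa_gain(xi, gamma):
    """xi: a priori SNR, gamma: a posteriori SNR (arrays of equal shape)."""
    v = np.maximum(xi * gamma / (1.0 + xi), 1e-10)  # avoid E1(0) divergence
    return (xi / (1.0 + xi)) * np.exp(0.5 * exp1(v))
```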
In the present work, we investigate the performance of a number of traditional and recent speech enhancement algorithms under the adverse non-stationary conditions characteristic of motorcycles on the move. The algorithms are ranked by the improvement they contribute to speech recognition accuracy relative to the baseline performance, i.e. without speech enhancement. Experiments on the MoveOn motorcycle speech and noise database indicate that there is no equivalence between rankings based on human perception of speech quality and those based on speech recognition performance. The multi-band spectral subtraction method was observed to yield the highest speech recognition performance.
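Multi-band spectral subtraction, the best performer here, subtracts an over-subtracted noise power estimate independently in each of several frequency bands, so that bands with poor local SNR are attenuated more. The sketch below is a generic single-frame version; the band split, band weights, and spectral floor are illustrative assumptions.

```python
# Sketch: per-band over-subtraction of noise power (Kamath/Loizou style).
import numpy as np

def multiband_spectral_subtraction(noisy_power, noise_power, n_bands=4,
                                   delta=(1.0, 2.5, 2.5, 1.5), beta=0.002):
    """noisy_power, noise_power: (freq,) power spectra of one frame."""
    clean_power = np.empty_like(noisy_power)
    edges = np.linspace(0, len(noisy_power), n_bands + 1, dtype=int)
    for i in range(n_bands):
        band = slice(edges[i], edges[i + 1])
        seg_snr = 10 * np.log10(noisy_power[band].sum() / noise_power[band].sum())
        # SNR-dependent over-subtraction factor (Berouti-style heuristic)
        alpha = np.clip(4.0 - seg_snr * 3.0 / 20.0, 1.0, 4.75)
        sub = noisy_power[band] - alpha * delta[i] * noise_power[band]
        clean_power[band] = np.maximum(sub, beta * noisy_power[band])  # floor
    return clean_power
```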
In this paper, a performance evaluation of the Modified Cascaded Median (MCM)-based noise estimation method for speech enhancement is carried out. Although reported earlier, the MCM-based method had not been extensively evaluated; in particular, its real-time performance had not been considered. In the present study, the MCM-based noise estimation method is compared with methods based on Dynamic Quantile Tracking (DQT) and the Cascaded Median (CM), through simulation as well as real-time implementation on a TMS320C6416T DSK. All comparisons were made for speech quality (subjectively, via mean opinion score, and objectively, via PESQ score, log-likelihood ratio, weighted spectral slope distance, segmental signal-to-noise ratio, and the composite measures for signal distortion CSIG, background intrusiveness CBAK, and overall distortion COVL) at the 95% confidence level. Real-time parameters, namely memory consumption and execution time, were measured for the real-time implementations of the three methods. The results for speech degraded at different SNRs show that MCM-based noise estimation is the best in terms of PESQ score, CSIG, CBAK, COVL, and mean opinion score, while for speech corrupted by different noise types it performs well compared to the original CM. Memory consumption and average execution time for the MCM-based noise estimation lie between those of the DQT- and CM-based methods.
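A cascaded median tracker estimates the noise spectrum by median-filtering short blocks of frames and then median-filtering those block medians again, approximating a long median at much lower memory cost. The sketch below is a generic two-stage version for intuition only; the paper's MCM modification is not reproduced.

```python
# Sketch: generic two-stage cascaded-median noise-PSD estimate.
import numpy as np

def cascaded_median_noise(noisy_power, stage_len=8):
    """noisy_power: (freq x frames) power spectrogram; returns (freq,) PSD."""
    freq, frames = noisy_power.shape
    n_blocks = frames // stage_len
    # stage 1: median over each short block of frames
    stage1 = np.array([
        np.median(noisy_power[:, b * stage_len:(b + 1) * stage_len], axis=1)
        for b in range(n_blocks)
    ]).T                                  # (freq x blocks)
    # stage 2: median over the block medians
    return np.median(stage1, axis=1)      # (freq,)
```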
In general, background noise degrades speech quality. The intelligibility of speech can therefore be enhanced by mitigating the effects of background noise and by echo suppression, so speech enhancement can be viewed as an optimization problem. In this work, the directed search optimization (DSO) method is used to enhance degraded speech. The performance of the DSO-based speech enhancement method is compared with particle swarm optimization (PSO)- and least mean square (LMS)-based methods in terms of output average segmental SNR (ASSNR) and speech quality. The experimental results show that the output spectrogram, output ASSNR, and speech quality of the DSO algorithm are far better than those of the PSO- and LMS-based methods. Moreover, the DSO-based method is computationally less complex than the PSO-based method.
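Of the baselines named, LMS is the simplest: an adaptive filter shapes a noise reference so that subtracting it from the primary channel cancels the correlated noise. A minimal sketch, assuming a separate noise-reference input is available:

```python
# Sketch: classic LMS adaptive noise canceller.
import numpy as np

def lms_noise_canceller(primary, reference, order=32, mu=0.01):
    """primary: speech + noise; reference: correlated noise-only channel."""
    w = np.zeros(order)
    out = np.zeros_like(primary)
    for n in range(order, len(primary)):
        x = reference[n - order:n][::-1]   # reference tap vector
        e = primary[n] - w @ x             # error = enhanced speech sample
        w += 2 * mu * e * x                # LMS weight update
        out[n] = e
    return out
```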
In this paper, the performance of a compressive sensing (CS)-based technique for speech enhancement is studied, and the results are analyzed by comparing the performance of several greedy recovery algorithms: matching pursuit, orthogonal matching pursuit, stagewise orthogonal matching pursuit, compressive sampling matching pursuit, and generalized orthogonal matching pursuit. The evaluation has been carried out using objective measures (perceptual evaluation of speech quality, log-likelihood ratio, weighted spectral slope distance, and segmental signal-to-noise ratio), simulation time, and composite objective measures (signal distortion CSIG, background intrusiveness CBAK, and overall quality COVL). The results show that the CS-based technique using the generalized orthogonal matching pursuit algorithm yields better performance than the other recovery algorithms in terms of speech quality and distortion.
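Orthogonal matching pursuit, one of the recovery algorithms compared, greedily selects the dictionary atom most correlated with the current residual and re-solves a least-squares fit over the selected support. A minimal sketch of the plain OMP variant:

```python
# Sketch: orthogonal matching pursuit for y ~= A @ x with k-sparse x.
import numpy as np

def omp(A, y, k):
    """A: (m x n) sensing/dictionary matrix; y: (m,) measurements."""
    residual, support = y.copy(), []
    x = np.zeros(A.shape[1])
    for _ in range(k):
        support.append(int(np.argmax(np.abs(A.T @ residual))))   # best atom
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)  # LS re-fit
        residual = y - A[:, support] @ coef
    x[support] = coef
    return x
```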
Speech processing is an important application area of digital signal processing that helps examine and analyze the speech signal. Within it, speech enhancement is essential because it improves the quality of the signal and helps resolve communication challenges. Various speech enhancement algorithms are used in the field, but limited processing capability, maximum microphone distance, and voice-first I/O interfaces raise the computational complexity. In this paper, speech enhancement is performed in two stages. In the first stage, the spectral subtraction method is applied to the LJ Speech dataset: the noise spectrum is estimated during pauses and subtracted from the noisy speech signal to obtain the clean speech signal. However, spectral subtraction still introduces artificial and narrow-band noise into the spectrum. Hence, artificial bandwidth expansion with a deep shallow convolutional neural network (ABE-DSCNN) is implemented as the second stage. The developed system is compared with conventional enhancement approaches such as a deep neural network (DNN), neural beamforming (NB), and a generative adversarial network (GAN). The experimental results show that the ABE-DSCNN provides a 4% increase in PESQ, with the error rate improved by 40% to 56% with respect to the other existing algorithms for 1000 speech samples. Hence, the paper concludes that the ABE-DSCNN approach effectively improves speech quality.
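The first stage is classical spectral subtraction: a noise magnitude estimated from pauses is subtracted from each frame's magnitude, with a floor to limit musical noise. A minimal sketch using SciPy's STFT; the frame length, floor, and sampling rate are illustrative choices, not the paper's settings.

```python
# Sketch: magnitude spectral subtraction with a pause-estimated noise spectrum.
import numpy as np
from scipy.signal import stft, istft

def spectral_subtraction(noisy, pause_audio, sr=22050, floor=0.02):
    """noisy: full noisy signal; pause_audio: speech-free segment."""
    _, _, Y = stft(noisy, fs=sr, nperseg=512)
    _, _, N = stft(pause_audio, fs=sr, nperseg=512)
    noise_mag = np.abs(N).mean(axis=1, keepdims=True)       # avg noise spectrum
    mag = np.maximum(np.abs(Y) - noise_mag, floor * np.abs(Y))
    _, clean = istft(mag * np.exp(1j * np.angle(Y)), fs=sr, nperseg=512)
    return clean
```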
Estimating noise-related parameters in unsupervised speech enhancement (SE) techniques is challenging in low-SNR and non-stationary noise environments. In recent SE approaches, the best results are achieved by partitioning noisy speech spectrograms into low-rank noise and sparse speech parts. However, several limitations reduce the performance of these SE methods: the overlap-add in the STFT process, the use of the noisy phase, inaccurate estimation of the low rank in nuclear norm minimization, and the Euclidean distance measure in the cost function. These aspects can cause a loss of information in the reconstructed signal compared to clean speech. To solve this, we propose a novel wavelet-based weighted low-rank sparse decomposition model for enhancing speech, incorporating a gammatone filterbank and the Kullback–Leibler divergence. The proposed framework differs from other strategies in that the SE is carried out entirely in the time domain without the need for noise estimation. Further, to assess the word error rate, the algorithms were trained and tested on a typical automatic speech recognition module. The experimental findings indicate that the proposed cascaded model shows significant improvement under low-SNR conditions over individual and traditional methods with regard to SDR, PESQ, STOI, SIG, BAK, and OVL.
In today's scientific epoch, speech is an important means of communication, and speech enhancement is necessary to increase the quality of speech that has been corrupted by noise. This work therefore proposes a new speech enhancement framework comprising (a) a training phase and (b) a testing phase. During the training phase, the input signal is passed to an STFT-based noise estimator and an NMF-based spectral estimator to compute the noise spectra and signal spectra, respectively. The obtained signal and noise spectra are then Wiener-filtered, after which empirical mode decomposition (EMD) is applied. Because the tuning factor of the Wiener filter is so important, it is computed for each signal by training a fuzzy wavelet neural network (FW-NN): a bark-frequency representation is computed from the denoised signal and fed to the FW-NN to identify the suitable tuning factor for each input signal. For optimal tuning of η, this work deploys the fitness-oriented elephant herding optimization (FO-EHO) algorithm. Additionally, an adaptive Wiener filter supplies EMD with the ideal tuning factor from the FW-NN, producing an improved speech signal. Finally, the superiority of the presented approach is demonstrated on various metrics.
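The role of the tuning factor can be seen in a parametric Wiener gain, where η trades noise suppression against speech distortion. Below is a minimal sketch of that trade-off; the FW-NN and FO-EHO machinery that selects η per signal is not reproduced, and the gain form is a common textbook parameterization rather than the paper's exact filter.

```python
# Sketch: parametric Wiener gain G = S / (S + eta * N).
# Larger eta suppresses more noise at the cost of more speech distortion.
import numpy as np

def parametric_wiener_gain(signal_psd, noise_psd, eta=1.0):
    """signal_psd, noise_psd: per-bin power spectral density estimates."""
    return signal_psd / (signal_psd + eta * noise_psd)
```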
In recent years, the field of speech enhancement has greatly benefited from the rapid development of neural networks. However, the requirement for large amounts of paired noisy and clean speech for training limits the widespread use of these models; wavelet-network-based speech enhancement typically relies on clean speech signals as the training target. This paper presents a new method that combines a neural network with wavelet theory for speech enhancement without needing clean speech signals as targets during training. Five widely used evaluation criteria, namely short-time objective intelligibility (STOI), signal-to-noise ratio (SNR), segmental signal-to-noise ratio (SNRseg), weighted spectral slope (WSS), and logarithmic spectral distance (LSD), are used to confirm the effectiveness of the proposed method. The results show that the proposed method performs similarly to a wavelet neural network (WNN) trained with clean signals, and is even superior to the clean-target-based strategies in some cases.
In this study, we introduce a novel approach to speech enhancement through the design of a complex temporal convolutional network (Complex-TCN). This model leverages the power of complex networks, enabling the simultaneous capture of both magnitude and phase information inherent in speech signals. By employing a temporal convolutional network, the Complex-TCN excels at extracting contextual information within the time domain of speech. Our findings underscore the substantial performance improvements achieved through the synergistic use of the temporal convolutional network and the incorporation of complex representations.
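A complex convolution of the kind a Complex-TCN builds on can be realized with two real convolutions following (a+ib)(c+id) = (ac-bd) + i(ad+bc). A minimal PyTorch sketch of such a layer; the layer sizes are illustrative, and the full dilated TCN stack of the paper is not reproduced.

```python
# Sketch: complex 1-D convolution via two real convolutions.
import torch.nn as nn

class ComplexConv1d(nn.Module):
    """(x_r + i x_i) * (W_r + i W_i)
       = (x_r*W_r - x_i*W_i) + i(x_r*W_i + x_i*W_r)"""
    def __init__(self, in_ch, out_ch, kernel_size, dilation=1):
        super().__init__()
        self.real = nn.Conv1d(in_ch, out_ch, kernel_size, dilation=dilation)
        self.imag = nn.Conv1d(in_ch, out_ch, kernel_size, dilation=dilation)

    def forward(self, x_r, x_i):
        # combine the four real convolutions into a complex output pair
        return (self.real(x_r) - self.imag(x_i),
                self.imag(x_r) + self.real(x_i))
```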
Speech signals are often corrupted by different noises, such as airport, station, and street noise. These noises degrade the quality of the speech signal, particularly in voice communication, automatic speech recognition, and speaker identification, so automatic speech enhancement is necessary. In this research work, a novel speech signal enhancement model is introduced with the assistance of deep learning. The proposed model includes three major phases: (a) pre-processing, (b) feature extraction, and (c) speech enhancement. In the pre-processing phase, framing is carried out using a Hanning window, whereby the input speech signals are decomposed into a series of overlapping frames. From these individual frames, multiple features, namely improved Mel-frequency cepstral coefficients (IMFCCs), fractional delta AMS, and the modified STFT (M-STFT), are extracted. Subsequently, in the speech enhancement phase, the noise is first estimated and removed. The denoised frames are used to determine the optimal mask for every frame of the noisy speech signal, and this mask is employed to train a Deep Convolutional Neural Network (DCNN). The reconstructed outputs from the DCNN form the enhanced speech signal. Finally, the proposed work (multi-features + DCNN-based speech enhancement) is validated against existing models on several measures, demonstrating its superiority.
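Mask-based training targets of this kind can be illustrated by the widely used ideal ratio mask, computed per time-frequency bin from clean-speech and noise spectra. The paper's "optimal mask" construction may differ, so treat this as a generic stand-in:

```python
# Sketch: ideal ratio mask, IRM = sqrt(|S|^2 / (|S|^2 + |N|^2)).
import numpy as np

def ideal_ratio_mask(speech_power, noise_power):
    """speech_power, noise_power: (freq x frames) power spectrograms."""
    return np.sqrt(speech_power / (speech_power + noise_power + 1e-12))
```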
An auditory model has been developed for an intelligent speech information acquisition system in real-world noisy environments. The mathematical model of the human auditory pathway consists of three components: nonlinear feature extraction from the cochlea to the auditory cortex, binaural processing at the superior olivary complex, and top-down attention from the higher brain to the cochlea. The feature extraction is based on information-theoretic sparse coding throughout the auditory pathway, and time-frequency masking is incorporated as a model of lateral inhibition in both the time and frequency domains. The binaural processing is modeled as blind signal separation and adaptive noise canceling based on independent component analysis with hundreds of time delays for noisy reverberated signals. Top-down (TD) attention derives from the familiarity and/or importance of the sensory information, i.e. the sound, and a simple but efficient TD attention model was developed based on the error backpropagation algorithm. The binaural processing and top-down attention are also combined for speech signals with heavy noise. Because this auditory model requires extensive computing, special hardware was developed for real-time applications. Experimental results demonstrate much better recognition performance in real-world noisy environments.
A particular quintet singular value decomposition (Quintet-SVD) is introduced in this paper via empirical mode decompositions (EMDs). The Quintet-SVD yields four specific orthogonal matrices together with a diagonal matrix of singular values. Furthermore, this paper shows the relationships between the Quintet-SVD and the traditional SVD, generalized low-rank approximations of matrices (GLRAM) of a single matrix, and EMDs. An application of the Quintet-SVD to speech enhancement is presented and compared with an application of the traditional SVD.
In this paper, we present a modified crosstalk-resistant adaptive noise canceller. The method proceeds in two steps. First, the signal-to-crosstalk ratio (SCR) is estimated, and when the estimated SCR is less than 6 dB, we reconstruct a new reference signal from the two original microphone signals. In the second step, the basic crosstalk-resistant adaptive noise canceller (CTRANC) is used to suppress noise and enhance speech. A comparative study with the basic CTRANC shows the superior performance of the new method.
A new algorithm for speech enhancement based on the wavelet shrinkage method is presented in this paper. First, the noisy speech is decomposed by a Bark-scaled wavelet packet decomposition (BS-WPD) to simulate human auditory characteristics. Then, a new thresholding algorithm is proposed that has many advantages over the soft and hard thresholding put forward by D.L. Donoho and I.M. Johnstone. Simulation results indicate that the new method is very useful and efficient for reducing white noise in speech, and that the new thresholding algorithm gives better SNR improvement than traditional thresholding algorithms.
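For reference, classical Donoho-Johnstone wavelet shrinkage soft-thresholds the detail coefficients with the universal threshold. The sketch below uses PyWavelets with a plain discrete wavelet transform, not the Bark-scaled packet decomposition or the paper's new thresholding rule; the wavelet and depth are arbitrary choices.

```python
# Sketch: wavelet shrinkage denoising with the universal threshold
#   sigma * sqrt(2 ln N), sigma estimated from the finest scale via MAD.
import numpy as np
import pywt

def wavelet_shrinkage_denoise(noisy, wavelet="db4", level=4):
    coeffs = pywt.wavedec(noisy, wavelet, level=level)
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745        # noise std estimate
    thresh = sigma * np.sqrt(2 * np.log(len(noisy)))
    coeffs[1:] = [pywt.threshold(c, thresh, mode="soft") for c in coeffs[1:]]
    return pywt.waverec(coeffs, wavelet)
```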