Speaker recognition is the task of identifying a person from statistical features extracted from speech signals. Many speaker recognition techniques have been designed and implemented to recognize speakers efficiently, but a review of the literature shows that existing techniques suffer from over-fitting. To overcome this issue, in this paper we design a novel ensemble-based quantum neural network. It selects one base model (i.e., expert) for each query and concentrates on reducing inductive bias. A set of quantum neural networks is trained on different kinds of quantum features and then used to recognize the speaker; ensembling combines their classification results. Extensive experiments compare the proposed technique with existing competitive machine learning-based speaker recognition techniques on speaker recognition data. The proposed technique outperforms the existing techniques in accuracy and sensitivity by 1.371% and 1.291%, respectively.
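As a rough illustration of the ensemble idea only, the sketch below trains several base classifiers on different feature views and selects the single most confident expert per query. It uses scikit-learn's MLPClassifier as a classical stand-in for the quantum neural networks, and the data, feature views, and sizes are all placeholder assumptions, not the paper's setup.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 24))        # placeholder utterance-level features
y = rng.integers(0, 4, size=200)      # placeholder labels for 4 speakers

# One base model ("expert") per feature view; the views stand in for the
# paper's "different kinds of quantum features".
views = [slice(0, 8), slice(8, 16), slice(16, 24)]
experts = []
for v in views:
    clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
    clf.fit(X[:, v], y)
    experts.append((v, clf))

def predict(x):
    # Per-query expert selection: use the single most confident base model.
    scores = [clf.predict_proba(x[:, v])[0] for v, clf in experts]
    best = max(scores, key=lambda p: p.max())
    return int(np.argmax(best))

print(predict(X[:1]))
```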
Speech is a convenient medium of communication among human beings. Speaker recognition is the process of automatically recognizing a speaker from the information contained in the speech signal. In this paper, a new two-level approach to speaker recognition from speech is proposed. At the first level, the gender of the speaker is recognized; at the second level, the speaker is recognized within the gender identified at the first level. Recognizing the gender first halves the search space for the second level, since the system then searches only the set of speech signals belonging to the identified gender. Gender is identified using gender-specific features, Mel-Frequency Cepstral Coefficients (MFCC) and pitch; the speaker is then recognized using speaker-specific features: MFCC, pitch and RASTA-PLP. Support Vector Machine (SVM) and Gaussian Mixture Model (GMM) classifiers are used for identifying the gender and recognizing the speaker, respectively. Experiments are performed on speech signals from two databases: "IIT-Madras speech synthesis and recognition" (speech samples in English from eight male and eight female speakers of eight different regions) and "ELSDSR" (speech samples in English from five male and five female speakers). Experimentally, the two-level approach reduces the time taken for speaker recognition by 30–32% compared with recognizing the speaker without identifying the gender (the single-level approach), and improves recognition accuracy from 99.7% to 99.9%. The experiments also indicate that a speech signal of at least 1.12 s (after discarding silent parts) is sufficient for recognizing the speaker.
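A minimal sketch of the two-level scheme follows, with synthetic stand-in features (the paper uses MFCC, pitch and RASTA-PLP): an SVM routes each utterance to a gender, and only that gender's per-speaker GMMs are scored, halving the second-level search. All names and sizes here are illustrative assumptions.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)

# Synthetic stand-in for frame-level features (MFCC/pitch/RASTA-PLP in the paper).
speakers = {"m1": "male", "m2": "male", "f1": "female", "f2": "female"}
frames = {s: rng.normal(loc=i, size=(100, 13)) for i, s in enumerate(speakers)}

# Level 1: gender SVM trained on utterance-level mean features.
X = np.array([frames[s].mean(axis=0) for s in speakers])
g = np.array([speakers[s] for s in speakers])
gender_clf = SVC().fit(X, g)

# Level 2: one GMM per speaker, indexed by gender so that only half the
# models are scored at test time.
gmms = {"male": {}, "female": {}}
for s, gender in speakers.items():
    gmms[gender][s] = GaussianMixture(n_components=2, random_state=0).fit(frames[s])

def recognize(frame_feats):
    gender = gender_clf.predict(frame_feats.mean(axis=0, keepdims=True))[0]
    scores = {s: gm.score(frame_feats) for s, gm in gmms[gender].items()}
    return max(scores, key=scores.get)

print(recognize(frames["f2"]))   # expected: 'f2'
```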
The mismatch between training and testing environments greatly degrades the performance of speaker recognition. Although many robust techniques have been proposed, speaker recognition under mismatched conditions remains a challenge. To address this problem, we propose a sparse-based auditory model as the front-end of speaker recognition, simulating the auditory processing of the speech signal. To this end, we introduce a narrow-band filter-bank, instead of the widely used wide-band filter-bank, to simulate the basilar membrane filter-bank; use sparse representation as an approximation of the basilar membrane coding strategy; and incorporate the frequency-selectivity enhancement mechanism between the tectorial membrane and the basilar membrane through a practical engineering approximation. Compared with the standard Mel-frequency cepstral coefficient approach, our preliminary experimental results indicate that the sparse-based auditory model consistently improves the robustness of speaker recognition under mismatched conditions.
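The sparse-coding step alone can be sketched as below, with scikit-learn's SparseCoder approximating each filter-bank frame by a few dictionary atoms as a stand-in for the basilar membrane coding strategy. The narrow-band filter-bank, the dictionary learning, and the tectorial/basilar enhancement mechanism are not reproduced, and all sizes are assumptions.

```python
import numpy as np
from sklearn.decomposition import SparseCoder

rng = np.random.default_rng(2)

n_bands, n_atoms = 40, 128               # hypothetical sizes
D = rng.normal(size=(n_atoms, n_bands))  # a learned dictionary in practice
D /= np.linalg.norm(D, axis=1, keepdims=True)   # OMP expects unit-norm atoms

frames = rng.normal(size=(10, n_bands))  # stand-in filter-bank frames

coder = SparseCoder(dictionary=D, transform_algorithm="omp",
                    transform_n_nonzero_coefs=5)
codes = coder.transform(frames)          # sparse activations per frame
print((codes != 0).sum(axis=1))          # at most 5 nonzero atoms per frame
```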
This paper surveys the main research paradigms for feature extraction methods in speaker recognition (SR), which is used extensively in person identification for security and protection applications, to further advance the state of the art. Speaker recognition systems (SRS) have been a widely researched topic for decades, and the basic concept behind feature extraction methods derives from biological models of the human auditory/vocal tract system. This work provides a classification-oriented review of feature extraction methods for SR over the last 55 years that have proven successful and become stepping stones for further research. Broadly, the review is dichotomized into feature extraction methods with and without noise compensation techniques. Methods without noise compensation are categorized by: level (high/low) of feature extraction; type of transform; speech production/auditory system; type of feature extraction technique; time variability; and speech processing technique. Methods with noise compensation are classified into noise-screened features, feature normalization methods, and feature compensation methods. This classification-oriented review gives readers a clear view for choosing among the different techniques and should be helpful for future research in this field.
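To make the "feature normalization" branch of this taxonomy concrete, here is one widely used member of that family, cepstral mean and variance normalization (CMVN), applied per utterance to an (n_frames, n_coeffs) MFCC matrix; the data below is synthetic.

```python
import numpy as np

def cmvn(feats, eps=1e-8):
    """Zero-mean, unit-variance normalization over the time (frame) axis."""
    mu = feats.mean(axis=0, keepdims=True)
    sigma = feats.std(axis=0, keepdims=True)
    return (feats - mu) / (sigma + eps)

# Synthetic MFCC matrix: 200 frames x 13 coefficients.
mfcc = np.random.default_rng(3).normal(loc=2.0, scale=4.0, size=(200, 13))
norm = cmvn(mfcc)
print(norm.mean(axis=0).round(6)[:3], norm.std(axis=0).round(3)[:3])
```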
Automatic Speech Recognition (ASR) has been an intensive research area in recent years, enabling natural human–machine communication over the internet. However, existing Deep Neural Network (DNN) techniques need more attention to the feature extraction process and to recognition accuracy. Thus, an enhanced DNN-based approach for speaker recognition with a novel Triumvirate Euphemism Strategy (TES) is proposed. It overcomes poor feature extraction from the Mel-Frequency Cepstral Coefficient (MFCC) map by extracting features based on the petite, hefty and artistry aspects of the features. The features are then trained with the Silhouette Martyrs Method (SMM), avoiding inter-class and intra-class separability problems, and margins are affixed between classes with three new loss functions: A-Loss, AM-Loss and AAM-Loss. Additionally, training is parallelized with a mini-batch-based back-propagation algorithm in the DNN, and a novel Frenzied Heap Atrophy (FHA) scheme with a multi-GPU model is introduced to further accelerate training through parallelized computing. The proposed technique thus provides effective feature extraction and highly precise results, with 97.5% accuracy in speaker recognition. Various parameters are discussed to demonstrate the efficiency of the system, and the proposed method outperforms existing methods in all aspects.
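The loss names A-Loss, AM-Loss and AAM-Loss read like the angular-margin softmax family used in speaker and face recognition. Under that reading, which is an assumption rather than the paper's stated formulation, an AAM-style (additive angular margin) loss can be sketched in NumPy:

```python
import numpy as np

def aam_softmax_loss(emb, W, labels, s=30.0, m=0.2):
    """emb: (N, d) embeddings, W: (d, C) class weights, labels: (N,).
    AAM-style reading (an assumption): cross entropy over s*cos(theta),
    with margin m added to the target-class angle."""
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    W = W / np.linalg.norm(W, axis=0, keepdims=True)
    cos = emb @ W                                    # cosine to each class
    theta = np.arccos(np.clip(cos, -1 + 1e-7, 1 - 1e-7))
    theta[np.arange(len(labels)), labels] += m       # margin on target class only
    logits = s * np.cos(theta)
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

rng = np.random.default_rng(4)
print(aam_softmax_loss(rng.normal(size=(8, 64)), rng.normal(size=(64, 5)),
                       rng.integers(0, 5, size=8)))
```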