The rapid increase in the amount of audio data demands an efficient method for automatically segmenting or classifying an audio stream based on its content. In this paper, an audio classification and segmentation method based on Gabor wavelet features is proposed. The method first divides an audio stream into clips, each containing one second of audio. Each clip is then classified into one of two classes (speech and music) or one of five classes (pure speech, pure music, song, speech with a music background, and speech with an environmental-noise background). Finally, a merging technique is applied to perform segmentation.
To make the proposed method robust across a variety of audio sources, we use the Fisher Linear Discriminant to obtain the features with the highest discriminative ability. Experimental results show that the proposed method achieves an accuracy of over 98% for speech/music discrimination and more than 95% for five-way discrimination. By checking the class types of adjacent clips, we can also identify more than 95% of the audio scene breaks in an audio sequence.
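The Fisher-criterion feature selection mentioned above can be sketched as a per-feature ratio of between-class to within-class variance, keeping the highest-scoring features. This is an illustrative reconstruction, not the paper's code; the toy data, feature count, and scoring details are assumptions.

```python
import numpy as np

def fisher_scores(X, y):
    """Per-feature Fisher criterion: between-class variance over within-class variance."""
    classes = np.unique(y)
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[y == c]
        mean_c = Xc.mean(axis=0)
        between += len(Xc) * (mean_c - overall_mean) ** 2
        within += ((Xc - mean_c) ** 2).sum(axis=0)
    return between / np.maximum(within, 1e-12)

def select_top_features(X, y, k):
    """Indices of the k features with the highest Fisher scores."""
    return np.argsort(fisher_scores(X, y))[::-1][:k]

# Toy data (assumed): feature 0 separates the classes, feature 1 is pure noise.
rng = np.random.default_rng(0)
X = np.vstack([
    np.column_stack([rng.normal(0, 0.1, 50), rng.normal(0, 1, 50)]),
    np.column_stack([rng.normal(5, 0.1, 50), rng.normal(0, 1, 50)]),
])
y = np.array([0] * 50 + [1] * 50)
```

A feature that cleanly separates the classes gets a large score, so ranking by score picks out the discriminative dimensions.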
The rapid increase in the amount of audio data, and especially of music collections, demands an efficient method for automatically retrieving audio objects based on their content. In this paper, we propose a method, based on Gabor wavelet features, for content-based retrieval of perceptually similar music pieces in audio documents. It allows the user to select a reference passage within an audio file and retrieve perceptually similar passages, such as repeating phrases within a music piece, similar music clips in a database, or one song sung by different persons or in different languages.
The proposed method first divides an audio stream into clips, each containing one second of audio. The frame-based features of each clip are then extracted using Gabor wavelet filters. Finally, a similarity-measuring technique performs pattern matching on the resulting sequences of feature vectors. Experimental results show that the proposed method achieves an accuracy of over 96% for audio retrieval.
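The pattern-matching step on sequences of clip feature vectors might look like the following sliding-window sketch; the mean-cosine-similarity score and the window handling are assumptions standing in for the paper's similarity measure.

```python
import numpy as np

def clip_similarity(a, b):
    """Mean cosine similarity between two equal-length sequences of feature vectors."""
    num = (a * b).sum(axis=1)
    den = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    return float(np.mean(num / np.maximum(den, 1e-12)))

def retrieve(reference, stream):
    """Slide the reference over the stream and return (best_start, best_score)."""
    m = len(reference)
    scores = [clip_similarity(reference, stream[i:i + m])
              for i in range(len(stream) - m + 1)]
    best = int(np.argmax(scores))
    return best, scores[best]

# Synthetic stream of 60 one-second clips (assumed data);
# the user's reference passage is the slice at clips 20-24.
rng = np.random.default_rng(1)
stream = rng.normal(size=(60, 12))
reference = stream[20:25].copy()
best, score = retrieve(reference, stream)
```

The best-scoring offset recovers the passage the reference was taken from, with a similarity of 1.0 for an exact repeat.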
The unique iris pattern of each human eye is complex but can easily be scanned or captured by a camera. However, the high cost of the infrared iris scanners used for acquisition, together with distance-related constraints, inconveniences users and restricts widespread use in real-time applications such as airports and banks. Images captured by cameras under visible wavelengths are obstructed by reflections and shadows, which require additional attention. The main objective of this paper is to propose a secure biometric iris authentication system based on the fusion of RGB channel information from real-time data captured under visible wavelengths and varying light conditions. The proposed system is adapted to a real-time noisy iris dataset. Its effectiveness was tested on two different color iris datasets: the public UBIRISv1 database and a newly created database, SSNDS, which contains images captured with any digital/mobile camera of at least 5 MP under unconstrained environments. The system supports cross-sensor acquisition and successful iris segmentation from these unconstrained inputs. The features of each channel are extracted using a log-Gabor filter, and matching is performed using the Hamming distance with two thresholds (for inter- and intra-class variations). The performance of the proposed biometric system demonstrates the feasibility of a new cost-effective approach for any real-time application that requires authentication to ensure quality of service, enhance security, eliminate fraud, and maximize effectiveness.
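The Hamming-distance matching with two thresholds can be sketched as follows. The masking of invalid bits (to handle reflections and shadows) and the threshold values 0.32 and 0.40 are illustrative assumptions, not figures from the paper.

```python
import numpy as np

def hamming_distance(code_a, code_b, mask_a=None, mask_b=None):
    """Fractional Hamming distance over the bits valid in both masks
    (masks flag bits lost to reflections, shadows, or occlusion)."""
    valid = np.ones(code_a.shape, dtype=bool)
    if mask_a is not None:
        valid &= mask_a
    if mask_b is not None:
        valid &= mask_b
    return np.count_nonzero(code_a[valid] != code_b[valid]) / valid.sum()

def decide(hd, intra_threshold=0.32, inter_threshold=0.40):
    """Three-way decision using intra- and inter-class thresholds
    (the values here are illustrative, not taken from the paper)."""
    if hd <= intra_threshold:
        return "match"
    if hd >= inter_threshold:
        return "non-match"
    return "uncertain"
```

Distances at or below the intra-class threshold accept, those at or above the inter-class threshold reject, and the band in between is left undecided (e.g. for fusion with the other RGB channels).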
Ear biometrics has attracted the attention of researchers in computer vision and machine learning because of its use in many applications. In this paper, we present a fully automated system for recognition from ear images based on sparse representation. In sparse representation, features extracted from the training data are used to build a dictionary. Classification is achieved by representing the extracted features of the test data as a linear combination of entries in the dictionary. This problem has many solutions, and the goal is to find the sparsest one. We use a relatively new algorithm, smoothed l0 norm, to find the sparsest solution, and Gabor wavelet features to build the dictionary. Furthermore, we extend the proposed approach to gender classification from ear images. Several studies have addressed this problem using facial images; we introduce a novel approach based on majority voting for gender classification from ears. Experiments on the University of Notre Dame (UND) collection J data set, which contains large appearance, pose, and lighting variations, yield a gender classification rate of 89.49%. The proposed method is also evaluated on the WVU data set, and classification rates for different view angles are presented. The results show improvement and great robustness in gender classification over existing methods.
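The smoothed-l0 search for the sparsest solution of A s = x can be sketched as follows: the l0 norm is approximated by a sum of Gaussians whose width sigma is gradually shrunk, alternating gradient steps with projection back onto the constraint. The parameter values and the toy dictionary are assumptions; a real system would build the dictionary columns from Gabor wavelet features of training ears.

```python
import numpy as np

def sl0(A, x, sigma_min=0.01, sigma_decrease=0.5, mu=2.0, inner_iters=3):
    """Smoothed-l0 sketch: find a sparse s with A s = x by ascending a
    Gaussian surrogate of sparsity while gradually decreasing its width."""
    A_pinv = np.linalg.pinv(A)
    s = A_pinv @ x                           # minimum-l2-norm starting point
    sigma = 2.0 * np.max(np.abs(s))
    while sigma > sigma_min:
        for _ in range(inner_iters):
            s = s - mu * s * np.exp(-s ** 2 / (2 * sigma ** 2))  # surrogate step
            s = s - A_pinv @ (A @ s - x)     # project back onto A s = x
        sigma *= sigma_decrease
    return s

# Toy problem (assumed sizes): a 2-sparse coefficient vector observed
# through a random 30 x 60 dictionary.
rng = np.random.default_rng(0)
A = rng.normal(size=(30, 60))
s_true = np.zeros(60); s_true[7] = 1.0; s_true[23] = -1.5
x = A @ s_true
s_hat = sl0(A, x)
```

For classification, one would compare the reconstruction residual of the recovered coefficients restricted to each class's dictionary entries and pick the class with the smallest residual.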
In this paper, we report our experiments on feature-based facial expression recognition within an architecture based on a two-layer perceptron. We investigate the use of two types of features extracted from face images: the geometric positions of a set of fiducial points on a face, and a set of multi-scale and multi-orientation Gabor wavelet coefficients at these points. They can be used either independently or jointly. A comparison of the recognition performance with different types of features shows that Gabor wavelet coefficients are much more powerful than geometric positions. Furthermore, since the first layer of the perceptron actually performs a nonlinear reduction of the dimensionality of the feature space, we have also studied the desired number of hidden units, i.e., the appropriate dimension for representing a facial expression so as to achieve a good recognition rate. It turns out that five to seven hidden units are probably enough to represent the space of facial expressions. We have then investigated the importance of each individual fiducial point for facial expression recognition. Sensitivity analysis reveals that points on the cheeks and forehead carry little useful information; after discarding them, not only does the computational efficiency increase, but the generalization performance also improves slightly. Finally, we have studied the significance of image scales. Experiments show that facial expression recognition is mainly a low-frequency process, and a spatial resolution of 64 × 64 pixels is probably enough.
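A minimal two-layer perceptron of the kind described, with a small hidden layer acting as a nonlinear dimensionality reduction, could be sketched as below. The toy data stands in for Gabor coefficient vectors, and the hidden size, learning rate, and epoch count are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_mlp(X, Y, hidden=7, lr=0.5, epochs=2000):
    """Two-layer perceptron: the hidden layer nonlinearly reduces the feature
    space to `hidden` units; the output layer scores each expression class."""
    W1 = rng.normal(0, 0.5, (X.shape[1], hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0, 0.5, (hidden, Y.shape[1])); b2 = np.zeros(Y.shape[1])
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)                  # hidden activations
        P = 1 / (1 + np.exp(-(H @ W2 + b2)))      # sigmoid outputs
        dZ = (P - Y) / len(X)                     # cross-entropy gradient
        dH = (dZ @ W2.T) * (1 - H ** 2)
        W2 -= lr * H.T @ dZ; b2 -= lr * dZ.sum(0)
        W1 -= lr * X.T @ dH; b1 -= lr * dH.sum(0)
    return W1, b1, W2, b2

def predict(X, params):
    W1, b1, W2, b2 = params
    return (np.tanh(X @ W1 + b1) @ W2 + b2).argmax(axis=1)

# Toy stand-in for Gabor feature vectors: two well-separated classes in 10-D.
X = np.vstack([rng.normal(-1, 0.3, (40, 10)), rng.normal(1, 0.3, (40, 10))])
Y = np.zeros((80, 2)); Y[:40, 0] = 1; Y[40:, 1] = 1
params = train_mlp(X, Y)
```

The 10-dimensional inputs are squeezed through 7 hidden units before classification, mirroring the paper's observation that a handful of hidden units suffices to represent the expression space.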
Appropriate representation is one of the keys to the success of face recognition technologies. In this paper, we present a novel face representation approach using a reduced set of local histograms based on Local Gabor Binary Patterns (LGBP). In the proposed method, a face image is first represented by LGBP histograms extracted from the LGBP images. Then, the local LGBP histograms with high separability and low redundancy are selected to obtain a dimension-reduced face descriptor. Extensive experimental results demonstrate that the proposed method not only greatly reduces the dimensionality of the face representation, but also outperforms state-of-the-art face recognition approaches such as Fisherfaces and the Gabor Fisher Classifier (GFC).
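The LGBP building block, regional histograms of local binary pattern codes, can be sketched on a single map as follows; in LGBP the same computation runs on every Gabor magnitude map and the histograms are concatenated. The grid size and the basic 8-neighbour LBP variant are assumptions.

```python
import numpy as np

def lbp_codes(img):
    """Basic 8-neighbour LBP: one bit per neighbour, set when the neighbour
    is at least as bright as the centre pixel."""
    c = img[1:-1, 1:-1]
    neighbours = [img[:-2, :-2], img[:-2, 1:-1], img[:-2, 2:], img[1:-1, 2:],
                  img[2:, 2:], img[2:, 1:-1], img[2:, :-2], img[1:-1, :-2]]
    codes = np.zeros(c.shape, dtype=np.uint8)
    for bit, n in enumerate(neighbours):
        codes |= (n >= c).astype(np.uint8) << np.uint8(bit)
    return codes

def regional_histograms(codes, grid=(4, 4)):
    """One normalized 256-bin histogram per grid cell, row by row."""
    h, w = codes.shape
    gh, gw = grid
    hists = []
    for i in range(gh):
        for j in range(gw):
            cell = codes[i * h // gh:(i + 1) * h // gh,
                         j * w // gw:(j + 1) * w // gw]
            hist, _ = np.histogram(cell, bins=256, range=(0, 256))
            hists.append(hist / max(cell.size, 1))
    return np.array(hists)
```

The selection step described in the abstract would then rank these local histograms by class separability and keep only the most discriminative ones.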
Robust face recognition is a challenging problem because of facial appearance variations in illumination, pose, expression, aging, partial occlusion, and other changes. This paper proposes a novel face recognition approach in which face images are represented by a Gabor pixel-pattern-based texture feature (GPPBTF) and a local binary pattern (LBP), and null space-based kernel Fisher discriminant analysis (NKFDA) is applied to the two features independently to obtain two recognition results, which are finally combined for identification. To obtain the GPPBTF, we first transform an image into Gabor magnitude maps of different orientations and scales, and then use the pixel-pattern-based texture feature to extract texture features from the Gabor maps. To improve the final classification performance, the paper also proposes an approach for combining multiple NKFDA classifiers. Extensive experiments on the FERET face database demonstrate that the proposed method not only greatly reduces the dimensionality of the face representation, but also achieves more robust results and higher recognition accuracy.
Samples from stochastic signals having sufficient complexity need reveal only a little unexpected shared structure, in order to reject the hypothesis that they are independent. The mere failure of a test of statistical independence can thereby serve as a basis for recognizing stochastic patterns, provided they possess enough degrees-of-freedom, because all unrelated ones would pass such a test. This paper discusses exploitation of this statistical principle, combined with wavelet image coding methods to extract phase descriptions of incoherent patterns. Demodulation and coarse quantization of the phase information creates decision environments characterized by well-separated clusters, and this lends itself to rapid and reliable pattern recognition.
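The coarse phase quantization and independence test described above can be sketched as follows: each complex filter response contributes two bits (its phase quadrant), and the fractional Hamming distance between the codes of unrelated patterns clusters tightly near 0.5, so a markedly smaller distance rejects the hypothesis of independence. The data here are synthetic stand-ins for wavelet responses.

```python
import numpy as np

def quantize_phase(responses):
    """Two-bit coarse quantization per complex response: the signs of the
    real and imaginary parts, i.e. which phase quadrant the response is in."""
    return np.stack([responses.real >= 0, responses.imag >= 0], axis=-1).ravel()

def fractional_hd(code_a, code_b):
    """Fraction of disagreeing bits; near 0.5 for statistically independent codes."""
    return np.count_nonzero(code_a != code_b) / code_a.size

# Two synthetic, unrelated complex response vectors (assumed data).
rng = np.random.default_rng(0)
a = rng.normal(size=5000) + 1j * rng.normal(size=5000)
b = rng.normal(size=5000) + 1j * rng.normal(size=5000)
code_a, code_b = quantize_phase(a), quantize_phase(b)
```

With many degrees of freedom, the distance distribution for unrelated codes is sharply concentrated, which is what makes the decision environment's clusters well separated.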
In this work, an effective method is proposed for texture segmentation that incorporates the best features of filter-bank and statistical approaches: it combines Gabor wavelet (filter-based) features with general moments (statistical) features. The method has been successfully tested on various textures from the Brodatz collection, and its performance relative to conventional approaches has been analyzed using the Fisher criterion.
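The statistical side of such a combined feature set, moments computed over a texture window, might look like this sketch; the choice of the first four moments is an assumption, not the paper's exact feature set.

```python
import numpy as np

def moment_features(window):
    """First four statistical moments of a texture window:
    mean, variance, skewness, kurtosis."""
    x = window.ravel().astype(float)
    mean = x.mean()
    var = x.var()
    std = np.sqrt(var) if var > 0 else 1.0   # guard against flat windows
    skew = np.mean(((x - mean) / std) ** 3)
    kurt = np.mean(((x - mean) / std) ** 4)
    return np.array([mean, var, skew, kurt])
```

These per-window statistics would be concatenated with the Gabor filter-bank responses before segmentation.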
Graphs labeled with complex-valued Gabor jets are one of the important data formats for face recognition and for classifying facial images into medically relevant classes such as genetic syndromes. Here we present an interpolation rule and an iterative algorithm for reconstructing images from these graphs. This is especially important when graphs have been manipulated for information processing. One such manipulation is averaging the graphs of a single syndrome; another is building a composite face from the features of various individuals. In reconstructions of averaged graphs of genetic syndromes, the patients' identities are suppressed while the properties of the syndromes are emphasized. These reconstructions from average graphs have much better quality than averaged images.
In this paper, a multi-stage matching strategy is proposed to boost the performance of non-graph-matching, feature-based face recognition. As the gallery size increases, recognition gradually degrades because the correct matching of feature points becomes more and more difficult. Rather than the single round of matching used in traditional methods, the multi-stage matching strategy determines the recognition result step by step. Instead of finding the single best match, each step picks out a small portion of the training candidates and removes the others. This picking and removing repeats until the number of remaining candidates is small enough to produce the final result. Two multi-stage matching algorithms, n-ary elimination and divide-and-conquer, are introduced into the non-graph-matching, feature-based method from global and local perspectives, respectively. Experimental results show that, with the multi-stage matching strategy, the recognition accuracy of the non-graph-matching, feature-based method is evidently boosted, and the improvement grows with the gallery size.
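The pick-and-remove loop can be sketched as follows; the "keep half each round" rule and the simple distance-based score are illustrative stand-ins for the paper's n-ary elimination and feature-point matching.

```python
import numpy as np

def multistage_match(query, gallery, keep_fraction=0.5, min_candidates=3):
    """Multi-stage matching sketch: each round scores the surviving candidates,
    keeps the best fraction, and removes the rest until few enough remain."""
    candidates = list(range(len(gallery)))
    while len(candidates) > min_candidates:
        keep = max(min_candidates, int(len(candidates) * keep_fraction))
        candidates = sorted(candidates,
                            key=lambda i: np.linalg.norm(query - gallery[i]))[:keep]
    # Final decision among the few survivors.
    return min(candidates, key=lambda i: np.linalg.norm(query - gallery[i]))

# Synthetic gallery of 100 feature vectors (assumed data);
# the probe is a noisy copy of identity 42.
rng = np.random.default_rng(0)
gallery = rng.normal(size=(100, 16))
query = gallery[42] + rng.normal(0, 0.05, 16)
```

Each round shrinks the candidate set, so later (potentially more expensive) rounds only compare against a handful of plausible identities.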