A new head tracking algorithm for automatically detecting and tracking human heads in complex backgrounds is proposed. By using an elliptical model for the human head, our Maximum Likelihood (ML) head detector can reliably locate human heads in images with complex backgrounds and is relatively insensitive to illumination and to rotation of the heads. The detector consists of two channels, horizontal and vertical, each implemented by multiscale template matching. Using a hierarchical structure in implementing the detector, the execution time for detecting the human heads in a 512×512 image is about 0.02 seconds on a Sparc 20 workstation (not including the time for image acquisition). Based on the ellipse-based ML head detector, we have developed a head tracking method that can monitor the entrance of a person, detect and track the person's head, and then control the stereo cameras to focus their gaze on this person's head. In this method, the ML head detector and the mutually-supported constraint are used to extract the corresponding ellipses in a stereo image pair, and the 3D position computed from the centers of the two corresponding ellipses is then used for fixation. To build a practical and reliable face detection and tracking system, further verification using facial features, such as eyes, mouth and nostrils, may be essential. An active stereo head has been used to perform the experiments and has demonstrated that the proposed approach is feasible and promising for practical use.
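As an illustration of gradient-based elliptical head matching, a minimal Python sketch follows. It scores one ellipse hypothesis by the image gradient along the ellipse's outward normal; the scoring and parameters are assumptions for illustration, not the authors' exact two-channel ML formulation.

```python
import numpy as np
import cv2

def ellipse_score(gray, cx, cy, a, b, n_pts=64):
    """Score an ellipse hypothesis (center cx,cy; semi-axes a,b) by the
    mean gradient component along the ellipse's outward normal, a common
    proxy for the likelihood of an elliptical head boundary."""
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1)
    t = np.linspace(0, 2 * np.pi, n_pts, endpoint=False)
    xs = (cx + a * np.cos(t)).astype(int)
    ys = (cy + b * np.sin(t)).astype(int)
    h, w = gray.shape
    ok = (xs >= 0) & (xs < w) & (ys >= 0) & (ys < h)
    # Outward unit normal of an axis-aligned ellipse at parameter t.
    nx, ny = np.cos(t) / a, np.sin(t) / b
    norm = np.hypot(nx, ny)
    nx, ny = nx / norm, ny / norm
    score = np.abs(gx[ys[ok], xs[ok]] * nx[ok] + gy[ys[ok], xs[ok]] * ny[ok])
    return score.mean()
```

A multiscale detector would evaluate this score over a coarse-to-fine grid of centers and axis lengths, keeping the maximum-scoring hypothesis.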
This paper proposes a complete procedure for the extraction and recognition of human faces in complex scenes. The morphology-based face detection algorithm can locate multiple faces oriented in any direction. The recognition algorithm is based on the minimum classification error (MCE) criterion, which we incorporate into a multilayer perceptron neural network. Experimental results show that our system is robust to noisy images and complex backgrounds.
This paper presents a framework to track multiple persons in real time. First, a real-time, adaptable method is proposed to extract face-like regions based on skin, motion and silhouette features. An adaptable skin model is then used for each detected face to cope with changes in the observed environment. After that, a two-stage face verification algorithm is proposed to quickly eliminate false faces based on face geometry and the SVM (Support Vector Machine) approach. To overcome the effect of lighting changes during verification, a color constancy compensation method is proposed. A robust tracking scheme then identifies multiple persons based on a face-status table. With this table, the proposed system can track different persons in different states, which is quite important in face-related applications. Experimental results show that the proposed method is more robust and powerful than traditional methods that rely only on color, motion information, and correlation.
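A minimal sketch of combining skin-color and frame-difference motion cues to propose face-like regions is given below; the YCrCb skin band, thresholds and area cutoff are illustrative assumptions, not the paper's adaptive values.

```python
import cv2
import numpy as np

def face_like_regions(frame, prev_frame):
    """Propose face-like regions where skin color and motion coincide."""
    ycrcb = cv2.cvtColor(frame, cv2.COLOR_BGR2YCrCb)
    # Illustrative fixed Cr/Cb skin band; the paper adapts its skin model.
    skin = cv2.inRange(ycrcb, (0, 133, 77), (255, 173, 127))
    motion = cv2.absdiff(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY),
                         cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY))
    _, motion = cv2.threshold(motion, 15, 255, cv2.THRESH_BINARY)
    mask = cv2.bitwise_and(skin, motion)
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, np.ones((5, 5), np.uint8))
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    return [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) > 400]
```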
In this paper, we present a survey of pattern recognition applications of Support Vector Machines (SVMs). Since SVMs show good generalization performance on many real-life datasets and the approach is well motivated theoretically, they have been applied to a wide range of applications. This paper gives a brief introduction to SVMs and summarizes their various pattern recognition applications.
In this paper, we propose a new face detection and tracking algorithm for real-life telecommunication applications, such as video conferencing, cellular phones and PDAs. We combine a template-based face detection and tracking method with color information to track a face regardless of lighting conditions, complex backgrounds and race. Based on our experiments, we generate robust face templates from the low-resolution lowpass and two highpass subimages of a second-level wavelet transform. However, since template matching is generally sensitive to changes in illumination, we propose a new preprocessing method. A tracking method is applied to reduce the computation time and to predict a precise face candidate region even when the movement is not uniform. Facial components are also detected using k-means clustering and their geometrical properties. Finally, from the relative distance between the two eyes, we verify the real face and estimate the size of the facial ellipse. To validate the face detection and tracking performance of our algorithm, we test our method on six categories of QCIF-size video recorded in dynamic environments.
Since pose-varying face images form a nonlinear convex manifold in a high-dimensional image space, it is difficult to model their pose distribution with a simple probability density function. To overcome this difficulty, we divide the pose space into many constituent pose classes and treat the continuous pose estimation problem as a discrete pose-class identification problem. We propose hierarchically structured ML (Maximum Likelihood) pose classifiers in a reduced feature space to decrease the computation time for pose identification: the pose space is divided into several pose groups, and each group consists of a number of similar neighboring poses. We use the CONDENSATION algorithm to find a newly appearing face and track it through a variety of poses in real time. Simulation results show that the proposed hierarchically structured ML pose classifiers identify poses faster than conventional flat-structured ML pose classifiers. A real-time facial pose tracking system is built on the high-speed hierarchically structured ML pose classifiers.
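A sketch of the coarse-to-fine idea, assuming Gaussian class-conditional models in the reduced feature space (the group and pose model structure here is hypothetical): pick the most likely pose group first, then search only the poses inside it.

```python
import numpy as np

def gaussian_loglik(x, mean, cov_inv, logdet):
    """Log-likelihood of x under a Gaussian (up to a constant)."""
    d = x - mean
    return -0.5 * (d @ cov_inv @ d + logdet)

def identify_pose(x, groups):
    """Coarse-to-fine ML pose identification. `groups` maps a group id to
    (group_model, {pose_id: pose_model}); each model is a
    (mean, cov_inv, logdet) tuple."""
    # Stage 1: pick the most likely pose group.
    best_g = max(groups, key=lambda g: gaussian_loglik(x, *groups[g][0]))
    # Stage 2: search only the poses inside the winning group.
    poses = groups[best_g][1]
    return max(poses, key=lambda p: gaussian_loglik(x, *poses[p]))
```

With G groups of P poses each, this evaluates roughly G + P likelihoods per frame instead of G × P for a flat classifier.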
In this paper, human faces are detected using skin color information and the Lines-of-Separability (LS) face model. Various skin color spaces based on widely used color models such as RGB, HSV, YCbCr, YUV and YIQ are compared, and an appropriate color model is selected for skin color segmentation. The proposed skin color segmentation is based on the YCbCr color model and sigma control limits for variations in its color components, and is found to be more efficient in terms of speed and accuracy. Each skin-segmented region is then searched for facial features using the LS face model to detect any face present in it. The LS face model is a geometric approach in which the spatial relationships among the facial features are determined for the purpose of face detection. Hence, the proposed approach, combining skin color segmentation and the LS face model, can detect single as well as multiple faces in a given image. Experimental results and comparative analysis demonstrate the effectiveness of this approach.
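A minimal sketch of sigma-control-limit skin segmentation in YCbCr follows; the limit multiplier k and the training-sample interface are assumptions, not the paper's calibrated values.

```python
import cv2
import numpy as np

def skin_mask_sigma(img_bgr, skin_samples_ycrcb, k=2.0):
    """Segment skin using mean +/- k*sigma control limits on the
    chrominance components (Cr, Cb), estimated from an (N, 3) array of
    training skin pixels in YCrCb order."""
    mu = skin_samples_ycrcb.mean(axis=0)   # per-channel means
    sd = skin_samples_ycrcb.std(axis=0)    # per-channel std devs
    lo, hi = mu - k * sd, mu + k * sd
    ycrcb = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2YCrCb).astype(np.float32)
    # Apply limits to Cr and Cb only; luma Y is left unconstrained.
    cr, cb = ycrcb[..., 1], ycrcb[..., 2]
    mask = ((cr >= lo[1]) & (cr <= hi[1]) &
            (cb >= lo[2]) & (cb <= hi[2]))
    return (mask * 255).astype(np.uint8)
```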
With the advance of semiconductor technology, current mobile devices support multimodal input and multimedia output. In turn, human-computer communication applications can be developed on mobile devices such as mobile phones and PDAs. This paper addresses the research issues of face and eye detection on mobile devices. The major obstacles to overcome are the relatively low processor speed, low storage memory and low image (CMOS sensor) quality. To solve these problems, this paper proposes a novel and efficient method for face and eye detection. The proposed method is based on color information because its computation time is small. However, color information is sensitive to illumination changes. In view of this limitation, this paper proposes an adaptive Illumination Insensitive (AI2) algorithm, which dynamically calculates the skin color region based on an image's color distribution. Moreover, to handle strong sunlight, which saturates skin color pixels, a dual-color-space model is also developed. Based on the AI2 algorithm and face boundary information, the face region is located. The eye detection method is based on an average integral of density, projection techniques and Gabor filters. To quantitatively evaluate the performance of face and eye detection, a new metric is proposed. 2158 head-and-shoulder images captured under uncontrolled indoor and outdoor lighting conditions are used for evaluation. The accuracies of face detection and eye detection are 98% and 97%, respectively. Moreover, the average computation time for one image using Matlab code on a Pentium III 700 MHz computer is less than 15 seconds, and would be reduced to tens or hundreds of milliseconds (ms) if a low-level programming language were used for the implementation. The results are encouraging and show that the proposed method is suitable for mobile devices.
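One loose reading of the per-image adaptive idea, sketched below: derive the skin chrominance band from the image's own Cr-histogram peak rather than from fixed limits. The prior search range and band halfwidth are assumptions; the abstract does not specify the actual AI2 computation.

```python
import cv2
import numpy as np

def adaptive_skin_band(img_bgr, halfwidth=12):
    """Estimate a per-image skin chrominance band from the dominant
    Cr-histogram peak inside a broad prior skin range."""
    ycrcb = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2YCrCb)
    cr = ycrcb[..., 1]
    hist = np.bincount(cr.ravel(), minlength=256)
    lo_prior, hi_prior = 130, 180            # assumed prior Cr range
    peak = lo_prior + int(np.argmax(hist[lo_prior:hi_prior]))
    lo, hi = peak - halfwidth, peak + halfwidth
    return (((cr >= lo) & (cr <= hi)) * 255).astype(np.uint8)
```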
Novel features and weak classifiers are proposed for face detection within the AdaBoost learning framework. The features are histograms computed from a set of spatial templates in filtered images. The filter banks consist of intensity, Laplacian of Gaussian (Difference of Gaussians), and Gabor filters, aiming to capture the spatial and frequency properties of faces at different scales and orientations. Each feature selected by AdaBoost learning corresponds to a histogram with a filter-template pair and can thus be interpreted as a boosted marginal distribution of faces. As the weak classifier, we fit a Gaussian distribution to each histogram feature using only the positives (faces) in the sample set. Experimental results demonstrate that classifiers with these features describe the face pattern more powerfully than the Haar-like rectangle features introduced by Viola and Jones.
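A sketch of one such weak classifier, using a Laplacian-of-Gaussian filter as the example filter and a diagonal Gaussian fitted to positive samples; the histogram range, sigma and threshold are illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_laplace

def hist_feature(img, template, bins=16):
    """Histogram of Laplacian-of-Gaussian responses inside a
    rectangular spatial template (x, y, w, h)."""
    x, y, w, h = template
    resp = gaussian_laplace(img.astype(np.float32), sigma=2.0)
    patch = resp[y:y + h, x:x + w].ravel()
    hist, _ = np.histogram(patch, bins=bins, range=(-50, 50), density=True)
    return hist

class GaussianWeakClassifier:
    """Fit a (diagonal) Gaussian to the feature distribution of positive
    (face) samples; classify by thresholding the log-likelihood."""
    def fit(self, pos_feats):
        self.mu = pos_feats.mean(axis=0)
        self.var = pos_feats.var(axis=0) + 1e-6
    def loglik(self, f):
        return -0.5 * np.sum((f - self.mu) ** 2 / self.var + np.log(self.var))
    def predict(self, f, thresh):
        return self.loglik(f) >= thresh
```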
A dynamic counterpropagation network based on the forward-only counterpropagation network (CPN) is applied as the classifier for face detection. The network, called the dynamic supervised forward-propagation network (DSFPN), trains using a supervised algorithm that grows dynamically during training, allowing subclasses in the training data to be learnt. The network is trained on reduced-dimensionality, categorized wavelet coefficients of the image data. Experimental results show that a 94% correct detection rate can be achieved with less than 6% false positives.
The eye line is defined as a horizontal line passing through the two eyes of a human face. It can be used to help locate the true positions of the eyes and the face. In this paper, we propose a method to extract the horizontal eye line of a face under various environments. Based on the facts that eye color differs strongly from skin color and that the gray-level variance of an eye region is high, eye-like regions are first located. The horizontal eye line is then extracted from the located eye-like regions and some geometric properties of a face. Experimental results show that the proposed method is robust under a wide range of lighting conditions, poses and races. The detection rate is 95.63% on the HHI face database and 94.27% on the Champion face database.
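A minimal sketch of the eye-like-region cue, marking pixels that have high local gray-level variance and are not skin-colored; the window size and variance threshold are illustrative assumptions.

```python
import cv2
import numpy as np

def eye_like_mask(img_bgr, skin_mask, win=7, var_thresh=300.0):
    """Mark high-local-variance, non-skin pixels as candidate eye
    regions; skin_mask is a binary mask (255 = skin)."""
    gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY).astype(np.float32)
    mean = cv2.blur(gray, (win, win))
    mean_sq = cv2.blur(gray * gray, (win, win))
    local_var = mean_sq - mean * mean        # E[x^2] - E[x]^2
    candidates = (local_var > var_thresh) & (skin_mask == 0)
    return (candidates * 255).astype(np.uint8)
```

The eye line itself would then be fit through pairs of candidate regions at similar heights, constrained by face geometry.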
This paper proposes a pose-robust human detection and identification method for sequences of stereo images using multiply-oriented 2D elliptical filters (MO2DEFs), which can detect and identify humans regardless of scale and pose. Four 2D elliptical filters with specific orientations are applied to a 2D spatial-depth histogram, and threshold values are used to detect humans. The human pose is then determined by finding the filter whose convolution result is maximal. Candidates are verified by either detecting the face or matching head-shoulder shapes. Human identification applies the detection method to a sequence of input stereo images and labels each detected human as registered or new using the Bhattacharyya distance between color histograms. Experimental results show that (1) the accuracy of pose angle estimation is about 88%, (2) human detection using the proposed method outperforms the existing Object Oriented Scale Adaptive Filter (OOSAF) by 15–20%, especially for posed humans, and (3) the human identification method achieves nearly perfect accuracy.
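The Bhattacharyya-distance comparison used for identification can be sketched as follows; the gallery interface and decision threshold are assumptions for illustration.

```python
import numpy as np

def bhattacharyya_distance(h1, h2, eps=1e-12):
    """Bhattacharyya distance between two (unnormalized) histograms."""
    p = h1 / (h1.sum() + eps)
    q = h2 / (h2.sum() + eps)
    bc = np.sum(np.sqrt(p * q))              # Bhattacharyya coefficient
    return -np.log(bc + eps)

def identify(query_hist, gallery, thresh=0.3):
    """Match a query color histogram against registered humans; return
    the best id, or None for a new human (thresh is illustrative)."""
    best = min(gallery,
               key=lambda k: bhattacharyya_distance(query_hist, gallery[k]))
    d = bhattacharyya_distance(query_hist, gallery[best])
    return best if d < thresh else None
```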
In this paper, an automatic rotation-invariant multiview face detection method utilizing a modified Skin Color Model (SCM) is presented. First, hybrid models based on a Gaussian Mixture Model (GMM) and a Support Vector Machine (SVM) are used to classify human skin regions in color images. The novelty of the adaptive hybrid model is its ability to predict the chromatic skin color band for each individual image based on camera calibration differences and the luminance of the environment. Classified skin regions are then converted to a grayscale image with a threshold based on the predicted chromatic skin color bands, which further enhances detection performance. Next, Principal Component Analysis (PCA) is applied to the gray segmented regions. Face detection is carried out on the PCA-extracted features, along with selected features, using support vector regression, and the output of this procedure yields the final face detection result. The proposed method is also beneficial for the rotation-invariant face recognition problem.
Driven by key law enforcement and commercial applications, research on face recognition from video sources has intensified in recent years. The ensuing results have demonstrated that videos possess unique properties that allow both humans and automated systems to perform recognition accurately in difficult viewing conditions. However, significant research challenges remain as most video-based applications do not allow for controlled recordings. In this survey, we categorize the research in this area and present a broad and deep review of recently proposed methods for overcoming the difficulties encountered in unconstrained settings. We also draw connections between the ways in which humans and current algorithms recognize faces. An overview of the most popular and difficult publicly available face video databases is provided to complement these discussions. Finally, we cover key research challenges and opportunities that lie ahead for the field as a whole.
Accurate face recognition is vital today, principally for security reasons. Current methods employ algorithms that index (classify) important features of human faces. There are many studies in this field, but most current solutions have significant limitations. Principal Component Analysis (PCA) is one of the best facial recognition algorithms; however, several sources of noise can affect its accuracy. PCA works well with the support of preprocessing steps such as illumination reduction, background removal and color conversion, and some current solutions have shown good results using such a combination. This paper proposes a hybrid face recognition solution using PCA as the main algorithm, supported by a triangular algorithm for face normalization, in order to enhance indexing accuracy. To evaluate the accuracy of the proposed hybrid indexing algorithm, the PCAaTA is tested and the results are compared with current solutions.
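A minimal sketch of PCA-based indexing (the classic eigenfaces scheme) on normalized, flattened face images, with nearest-neighbor matching in the projected space; this is a generic baseline, not the paper's full hybrid pipeline.

```python
import numpy as np

def fit_pca(X, n_components=50):
    """X: rows are flattened, preprocessed face images."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # SVD of the centered data; rows of Vt are the eigenfaces.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return mean, Vt[:n_components]

def project(x, mean, eigenfaces):
    """Project one face into the eigenface subspace."""
    return eigenfaces @ (x - mean)

def recognize(x, mean, eigenfaces, gallery_codes, labels):
    """Nearest-neighbor match against projected gallery faces."""
    code = project(x, mean, eigenfaces)
    dists = np.linalg.norm(gallery_codes - code, axis=1)
    return labels[int(np.argmin(dists))]
```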
Video surveillance systems based on face analysis play an increasingly important role in the security industry. Compared with identification methods based on other physical characteristics, face verification is easily accepted by people. In a video surveillance scene, it is common to capture multiple faces belonging to the same person, and face recognition suffers if all of these images are used without considering their quality. To solve this problem, we propose a face deduplication system that combines face detection with face quality evaluation to obtain the highest-quality face image of a person. The experimental results in this paper show that our method can effectively detect faces and select high-quality face images, thereby improving the accuracy of face recognition.
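A minimal sketch of per-person best-face selection, using Laplacian variance as a simple sharpness proxy; the abstract does not specify the paper's actual quality measure, so this metric is an assumption.

```python
import cv2

def sharpness(face_img):
    """Variance of the Laplacian: a common blur/quality proxy."""
    gray = cv2.cvtColor(face_img, cv2.COLOR_BGR2GRAY)
    return cv2.Laplacian(gray, cv2.CV_64F).var()

def best_face(track_faces):
    """Given all face crops captured for one person, keep only the
    highest-quality one for recognition (deduplication)."""
    return max(track_faces, key=sharpness)
```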
Human emotion recognition based on facial expression is significant for intelligent man–machine interaction. However, face images vary greatly in real environments due to complex backgrounds and luminance. To address this, this paper proposes a robust face detection method based on a skin color enhancement model and a facial expression recognition algorithm with block principal component analysis (PCA). First, the luminance range of the face image is broadened and the contrast of skin color is strengthened by a homomorphic filter. Second, the skin color enhancement model is established using YCbCr color space components to locate the face area. Third, a feature based on differential horizontal integral projection is extracted from the face. Finally, block PCA with a deep neural network is used to accomplish the facial expression recognition. The experimental results indicate that under weak illumination and complicated backgrounds, both face detection and facial expression recognition are achieved effectively by the proposed algorithm; moreover, the mean recognition rate of the facial expression recognition method is improved by 2.7% compared with the traditional Local Binary Patterns (LBP) method.
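A minimal sketch of homomorphic filtering follows: log transform, high-frequency emphasis in the Fourier domain, then exponentiation, which attenuates illumination and boosts reflectance. The gain and cutoff parameters are illustrative assumptions, not the paper's values.

```python
import numpy as np

def homomorphic_filter(gray, gamma_l=0.5, gamma_h=2.0, c=1.0, d0=30.0):
    """Attenuate illumination (low frequencies) and boost reflectance
    (high frequencies) of a grayscale image in the log domain."""
    img = np.log1p(gray.astype(np.float64))
    F = np.fft.fftshift(np.fft.fft2(img))
    rows, cols = gray.shape
    u = np.arange(rows) - rows // 2
    v = np.arange(cols) - cols // 2
    D2 = u[:, None] ** 2 + v[None, :] ** 2   # squared distance from center
    H = (gamma_h - gamma_l) * (1 - np.exp(-c * D2 / d0 ** 2)) + gamma_l
    out = np.real(np.fft.ifft2(np.fft.ifftshift(H * F)))
    out = np.expm1(out)
    return np.clip(out / out.max() * 255, 0, 255).astype(np.uint8)
```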
Since face detection technology is widely used, improving detection speed and accuracy has become a key challenge. This paper therefore takes the MTCNN model as its research object and improves it to optimize detection speed and accuracy simultaneously. A dynamic min-size algorithm is proposed: according to the size of the input image, it dynamically controls the minimum face size the model recognizes and reduces the number of image pyramid iterations, increasing the number of detection frames per second by 4 fps. The standard convolution structure in the P-Net and R-Net models is then replaced by depth-wise separable convolution, which effectively reduces the number of parameters and the computation of the model. Meanwhile, an O-Net model with a densely connected structure is also developed. Our experiments on well-known public datasets demonstrate that the proposed network structure improves the detection frame rate. By reusing features from different levels of the image, the recall rate and precision rate of the MTCNN model on the validation set are increased by 2.39% and 1.65%, respectively.
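The dynamic min-size idea can be sketched as follows: tie MTCNN's minimum face size to the input resolution so that large frames produce fewer pyramid levels. The scaling ratio and floor are assumptions; the paper's exact rule is not given in the abstract.

```python
def dynamic_min_size(width, height, ratio=0.05, floor=20):
    """Scale the smallest detectable face with the input resolution,
    shrinking the image pyramid for large frames (ratio is assumed)."""
    return max(floor, int(min(width, height) * ratio))

def pyramid_scales(width, height, min_size, factor=0.709, net_input=12):
    """Standard MTCNN-style pyramid: a larger min_size yields fewer
    scales, hence fewer P-Net passes."""
    scales, m = [], net_input / min_size
    side = min(width, height) * m
    while side >= net_input:
        scales.append(m)
        m *= factor
        side *= factor
    return scales
```

For example, a 1920×1080 frame with the assumed ratio gives min_size 54 instead of a fixed 20, cutting several of the most expensive (largest) pyramid levels.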
Pose estimation is the basis and key of human motion recognition. For image-based two-dimensional human pose estimation, in order to reduce the adverse effects of mutual occlusion among multiple people and improve the accuracy of motion recognition, a structurally symmetrical two-dimensional multi-person pose estimation model combined with face detection is proposed in this paper. First, transfer learning is used to initialize each sub-branch network model. Then, MTCNN is used for face detection to predict the number of people in the image, and according to that number, the image is fed into the improved two-branch OpenPose network. Moreover, a double-judgment algorithm is proposed to correct false detections by MTCNN. The experimental results show that, compared with TensorPose, the latest improved method based on OpenPose, the Average Precision (AP) (Intersection over Union (IoU)=0.5) on the validation set is 8.8 points higher. Furthermore, compared with OpenPose, the mean AP (IoU=0.5:0.95) is 1.7 points higher on the validation set and 1.3 points higher on the Test-dev test set.
Modern face detection algorithms fail to provide optimal results when they must process larger amounts of data per frame in higher-quality videos. This paper tackles that problem and offers a way to deploy commercially used state-of-the-art face detection algorithms so that they process only the regions of interest in a frame and discard the rest, decreasing the data to be processed. The model maintains the accuracy of the base algorithm while decreasing the processing time per frame, thereby increasing overall efficiency. The region of interest is selected from the facial window detected in the previous frame, so the choice of base algorithm plays an important role in determining the speed of the framework. The model achieves processing speeds about 69–76% higher than standalone use of the detection algorithms at the analyzed frame rates.
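A minimal sketch of the ROI mechanism follows: expand the previous frame's face box into a search window and fall back to full-frame detection when there is no prior detection. The expansion factor and detector interface are assumptions for illustration.

```python
def roi_from_previous(prev_box, frame_shape, expand=1.5):
    """Expand last frame's face box (x, y, w, h) into a search ROI;
    fall back to the full frame when prev_box is None."""
    h, w = frame_shape[:2]
    if prev_box is None:
        return 0, 0, w, h
    x, y, bw, bh = prev_box
    cx, cy = x + bw / 2, y + bh / 2
    nw, nh = bw * expand, bh * expand
    x0 = max(0, int(cx - nw / 2)); y0 = max(0, int(cy - nh / 2))
    x1 = min(w, int(cx + nw / 2)); y1 = min(h, int(cy + nh / 2))
    return x0, y0, x1 - x0, y1 - y0

def detect_in_roi(frame, prev_box, detector):
    """Run any face detector on the ROI only, then map detections
    back to full-frame coordinates."""
    x, y, w, h = roi_from_previous(prev_box, frame.shape)
    boxes = detector(frame[y:y + h, x:x + w])
    return [(bx + x, by + y, bw, bh) for bx, by, bw, bh in boxes]
```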