This book is a collection of scientific papers published during the last five years, showing a broad spectrum of current research topics and techniques used to solve challenging problems in the areas of computer vision and image analysis. The book will appeal to researchers, technicians and graduate students.
Sample Chapter(s)
Chapter 1: An Appearance-Based Method for Parametric Video Registration
https://doi.org/10.1142/9789812834461_fmatter
Front Matter
https://doi.org/10.1142/9789812834461_0001
In this paper, we address the problem of multi-frame video registration using an appearance-based framework, where linear subspace constraints are applied in terms of the appearance subspace constancy assumption. We frame the multiple-image registration as a two-step iterative algorithm. First, a feature space is built through a Singular Value Decomposition (SVD) of a second-moment matrix provided by the images in the sequence to be analyzed, encoding the variability of each frame with respect to a previously selected reference frame. Second, a parametric model is introduced in order to estimate the transformation that has been produced across the sequence. This model is described in terms of a polynomial representation of the velocity field evolution, which corresponds to a parametric multi-frame optical flow estimation. The objective function to be minimized considers both issues at the same time, i.e., the appearance representation and the time evolution across the sequence. This function is the connection between the global coordinates in the subspace representation and the parametric optical flow estimates. Both minimization steps reduce to linear least squares sub-problems, whose solutions are in closed form at each iteration. The appearance constraints thus take all the images in the sequence into account when estimating the transformation parameters. Finally, results show the extraction of 3D affine structure from multiple views, depending on the degree of the surface polynomial.
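As a rough illustration of the appearance-subspace step (not the authors' implementation), the following Python sketch builds a rank-k feature space by applying an SVD to the deviations of each frame from a reference frame; the choice of the first frame as reference and the rank k are assumptions here.

```python
import numpy as np

def appearance_subspace(frames, k=3):
    """Rank-k appearance subspace of a sequence of equally sized
    grayscale frames, via SVD of deviations from a reference frame."""
    ref = frames[0].astype(float).ravel()          # assumed reference frame
    # Each column encodes one frame's variability w.r.t. the reference.
    A = np.stack([f.astype(float).ravel() - ref for f in frames], axis=1)
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k]                 # basis, weights, coordinates

# Toy usage: ten 32x32 random frames.
frames = [np.random.rand(32, 32) for _ in range(10)]
U, s, Vt = appearance_subspace(frames)
print(U.shape, s.shape, Vt.shape)                  # (1024, 3) (3,) (3, 10)
```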
https://doi.org/10.1142/9789812834461_0002
This work introduces an interactive algorithm for image smoothing and segmentation. A non-linear partial differential equation is employed to smooth the image while preserving contours. The segmentation is a region-growing and merging process initiated around image minima (seeds), which are automatically detected, labeled and eventually merged. The user places one marker per region of interest. Accurate and fast segmentation results can be achieved for gray and color images using this simple method.
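The abstract does not name the PDE; a common choice for contour-preserving smoothing is Perona-Malik anisotropic diffusion, sketched below as one plausible instance (the conduction function, kappa, time step and periodic border handling are all assumptions):

```python
import numpy as np

def perona_malik(img, n_iter=20, kappa=0.1, dt=0.2):
    """Edge-preserving smoothing: diffusion is damped across strong
    gradients, so contours survive while flat regions are smoothed."""
    u = img.astype(float).copy()
    g = lambda d: np.exp(-(d / kappa) ** 2)   # conduction coefficient
    for _ in range(n_iter):
        # Differences to the four neighbours (periodic border for brevity).
        dn = np.roll(u, -1, 0) - u; ds = np.roll(u, 1, 0) - u
        de = np.roll(u, -1, 1) - u; dw = np.roll(u, 1, 1) - u
        u += dt * (g(dn) * dn + g(ds) * ds + g(de) * de + g(dw) * dw)
    return u

smooth = perona_malik(np.random.rand(64, 64))
```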
https://doi.org/10.1142/9789812834461_0003
In recent years, multifractal analysis has been applied to image analysis. The multifractal framework takes advantage of the multiscaling properties of images to decompose them into a collection of different fractal components, each associated with a singularity exponent (an exponent characterizing the way in which that part of the image evolves under changes in scale). One of those components, characterized by the least possible exponent, seems to be the most informative about the whole image. Very recently, an algorithm has been proposed to reconstruct the image from this component using only the physical information it conveys. In this paper, we show that the same algorithm can be used to assess the relevance of the other fractal parts of the image.
https://doi.org/10.1142/9789812834461_0004
Deformable Models are extensively used as a Pattern Recognition technique. They are curves defined within an image domain that can be moved under the influence of internal and external forces. Some trade-offs of standard deformable-model algorithms are the selection of the image energy function (external force), the location of the initial snake, and the attraction of contour points to local energy minima while the snake is being deformed. This paper proposes a new procedure that uses potential fields as external forces. In addition, the standard Deformable Models algorithm has been enhanced with both this new external force and algorithmic improvements. The performance of the presented approach has been successfully demonstrated by extracting muscles from Magnetic Resonance Imaging (MRI) sequences of Iberian ham at different maturation stages in order to calculate their volume change. The main conclusions of this paper are the practical viability of potential fields used as external forces, as well as the validation of the algorithmic improvements developed. A practical conclusion reached with the proposed approach is the feasibility of applying Computer Vision techniques, in conjunction with MRI, for automatically determining the optimal ripening time of Iberian ham.
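The paper's exact potential-field definition is not reproduced here; one simple stand-in, sketched below under that assumption, derives a potential from the distance transform of an edge map, so that the resulting external force attracts snake points to edges even from far away:

```python
import numpy as np
from scipy import ndimage

def potential_force(edges):
    """External force for a snake: the potential grows with distance to
    the nearest edge pixel, so -grad(potential) points toward edges."""
    dist = ndimage.distance_transform_edt(~edges.astype(bool))
    gy, gx = np.gradient(dist)
    return -gx, -gy                        # force field components

edges = np.zeros((64, 64), dtype=bool)
edges[32, 10:50] = True                    # a synthetic edge segment
fx, fy = potential_force(edges)
```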
https://doi.org/10.1142/9789812834461_0005
Automatic recognition of handwritten text has a number of interesting applications. However, due to the great variety of individual writing styles, the problem is very difficult and far from being solved. Recently, a number of classifier-creation methods, known as ensemble methods, have been proposed in the field of machine learning. They have shown improved recognition performance over single classifiers. Many methods have been proposed in the literature for combining such classifiers. In this paper we describe a weighted voting scheme where the weights are obtained by a genetic algorithm.
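The genetic algorithm itself is omitted; the sketch below (illustrative labels and weights, not the paper's values) shows only how GA-derived weights would combine the ensemble members' votes:

```python
import numpy as np

def weighted_vote(predictions, weights, n_classes):
    """Combine per-classifier label predictions with GA-derived weights."""
    scores = np.zeros(n_classes)
    for label, w in zip(predictions, weights):
        scores[label] += w                 # each classifier casts w votes
    return int(np.argmax(scores))

# Three ensemble members vote 2, 2, 0; the GA trusts the third one most.
print(weighted_vote([2, 2, 0], [0.2, 0.3, 0.9], n_classes=3))   # -> 0
```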
https://doi.org/10.1142/9789812834461_0006
In this paper, an image segmentation method based on Dempster-Shafer evidence theory is proposed. The basic probability assignment (bpa) is estimated in an unsupervised way using pixels' fuzzy membership degrees derived from the image histogram. No assumption is made about the image data distribution, and the bpa is estimated at the pixel level. The effectiveness of the method is demonstrated on synthetic and real images.
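The paper derives bpa's from histogram-based fuzzy memberships; the sketch below assumes a simplified frame of discernment (singleton classes plus total ignorance) and shows only how two such bpa's would be fused with Dempster's rule:

```python
import numpy as np

def dempster_combine(m1, m2):
    """Dempster's rule for bpa's over singletons plus ignorance
    (last entry = mass on the whole frame of discernment Theta)."""
    n = len(m1) - 1                          # number of singleton classes
    m = np.zeros(n + 1)
    for i in range(n):
        # Singleton i survives when both sources agree or one is ignorant.
        m[i] = m1[i] * m2[i] + m1[i] * m2[n] + m1[n] * m2[i]
    m[n] = m1[n] * m2[n]                     # both sources ignorant
    conflict = 1.0 - m.sum()                 # mass lost to contradictions
    return m / (1.0 - conflict)

# Two evidence sources for one pixel, masses on {c0, c1, Theta}.
print(dempster_combine([0.6, 0.2, 0.2], [0.5, 0.3, 0.2]))
```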
https://doi.org/10.1142/9789812834461_0007
This paper presents a tracking algorithm for automatic instrument localization in robotically assisted laparoscopic surgery. We present a simple and robust system that needs neither artificial marks nor special colours to distinguish the instruments, enabling the robot to track the instruments normally used in laparoscopic operations. Since the instruments are usually the most structured objects in laparoscopic scenes, the algorithm uses the Hough transform to detect straight lines in the scene. In order to distinguish among different instruments or other structured elements present in the scene, motion information is also used. We give a detailed description of all stages of the system.
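As a minimal sketch of the line-detection stage only (synthetic image; the OpenCV parameters are illustrative, not the paper's), a probabilistic Hough transform picks out the long straight segments an instrument shaft produces:

```python
import cv2
import numpy as np

# Synthetic frame with one bright "instrument shaft".
img = np.zeros((240, 320), np.uint8)
cv2.line(img, (30, 200), (250, 40), 255, 3)

edges = cv2.Canny(img, 50, 150)
# Instrument shafts appear as long, nearly straight edge segments.
lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=60,
                        minLineLength=80, maxLineGap=10)
if lines is not None:
    for x1, y1, x2, y2 in lines[:, 0]:
        print("segment:", (x1, y1), (x2, y2))
```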
https://doi.org/10.1142/9789812834461_0008
Fractal image compression offers desirable properties such as resolution independence, fast decoding, and very competitive rate-distortion curves. However, it still suffers from (sometimes very) high encoding times, depending on the approach being used. This paper presents a method to reduce the encoding time of this technique by shrinking the domain pool based on the entropy of each domain block. Experimental results on standard images show that the proposed method yields superior performance over conventional fractal encoding.
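A minimal sketch of the idea (the entropy estimator, bin count and kept fraction are assumptions; the paper's exact selection rule may differ): rank candidate domain blocks by the entropy of their intensity histogram and search only a subset of the pool.

```python
import numpy as np

def block_entropy(block, bins=16):
    """Shannon entropy of a block's intensity histogram (values in [0,1])."""
    hist, _ = np.histogram(block, bins=bins, range=(0.0, 1.0))
    p = hist[hist > 0] / hist.sum()
    return -np.sum(p * np.log2(p))

def prune_domain_pool(blocks, keep=0.25):
    """Keep only the most entropic fraction of candidate domain blocks,
    discarding flat blocks that rarely win the range-block matching."""
    ent = np.array([block_entropy(b) for b in blocks])
    order = np.argsort(ent)[::-1]            # highest entropy first
    return [blocks[i] for i in order[:max(1, int(keep * len(blocks)))]]

pool = [np.random.rand(8, 8) for _ in range(100)]
print(len(prune_domain_pool(pool)))          # 25 blocks left to search
```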
https://doi.org/10.1142/9789812834461_0009
Digital watermarking is a key technique for practical intellectual-property protection and covert communication systems. In this paper, we discuss a blind detection method for digital image watermarks. Theoretical analysis shows that the orthogonal projection sequence of a digital image is in one-to-one correspondence with the image itself. Based on this result, we design a novel blind watermark detector that computes the correlation between the image and the watermark from the image's orthogonal projection sequence rather than from its intensity information. Experimental results show that this detector is not only strongly resistant to translation and rotation attacks but also robust to Gaussian noise, and that it outperforms a general detector designed directly on intensity information. These conclusions are useful for future research.
https://doi.org/10.1142/9789812834461_0010
We have recently developed in our lab a text recognizer for on-line texts written on a touch-terminal. We present in this paper several strategies to adapt this recognizer in a self-supervised way to a given writer and compare them to the supervised adaptation scheme. The baseline system is based on the activation-verification cognitive model. We have designed this recognizer to be writer-independent, but it may be adapted to be writer-dependent in order to increase the recognition speed and rate. The classification expert can be iteratively modified in order to learn the particularities of a writer. The best self-supervised adaptation strategy, called prototype dynamic management, achieves good results, close to those of the supervised methods, and combining supervised and self-supervised strategies increases accuracy further. Results, presented on a large database of 90 texts (5,400 words) written by 38 different writers, are very encouraging, with an error rate lower than 10%.
https://doi.org/10.1142/9789812834461_0011
The motivation of this work is based on two key observations. First, classification algorithms can be separated into two main categories: discriminative and model-based approaches. Second, two types of patterns can generate problems: ambiguous patterns and outliers. The first approach tries to minimize the first type of error but cannot deal effectively with outliers, whereas the second, which builds a model for each class, makes outlier detection possible but is not sufficiently discriminant. Thus, we propose to combine these two approaches in a modular two-stage classification system embedded in a probabilistic framework. In the first stage we estimate the posterior probabilities with a model-based approach, and in the second stage we re-estimate only the highest probabilities with appropriate Support Vector Classifiers (SVC). Another advantage of this combination is that it reduces the principal burden of SVCs, the processing time needed to make a decision, and opens the way to using SVCs in classification problems with a large number of classes. Finally, first experiments on the benchmark database MNIST show that our dynamic classification process maintains the accuracy of SVCs while decreasing complexity by a factor of 8.7 and making outlier rejection available.
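A rough, runnable sketch of the two-stage idea with off-the-shelf components (scikit-learn's GaussianNB standing in for the model-based stage, and the digits dataset standing in for the paper's MNIST setup; both are assumptions):

```python
import numpy as np
from itertools import combinations
from sklearn.datasets import load_digits
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
Xtr, ytr, Xte, yte = X[:1500], y[:1500], X[1500:], y[1500:]

# Stage 1: cheap model-based posterior estimate for every class.
gnb = GaussianNB().fit(Xtr, ytr)

# Stage 2: one SVC per class pair, consulted only for the top candidates.
pair_svc = {(a, b): SVC().fit(Xtr[np.isin(ytr, [a, b])],
                              ytr[np.isin(ytr, [a, b])])
            for a, b in combinations(range(10), 2)}

def predict(x):
    post = gnb.predict_proba([x])[0]
    a, b = sorted(np.argsort(post)[-2:])      # two most probable classes
    return pair_svc[(a, b)].predict([x])[0]   # SVC resolves the ambiguity

acc = np.mean([predict(x) == t for x, t in zip(Xte, yte)])
print(f"two-stage accuracy: {acc:.3f}")
```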
https://doi.org/10.1142/9789812834461_0012
We present a learning strategy for Hidden Markov Models that may be used to cluster handwriting sequences or to learn a character model by identifying its main writing styles. Our approach aims at learning both the structure and the parameters of a Hidden Markov Model (HMM) from the data. A byproduct of this learning strategy is the ability to cluster signals and identify allographs. We provide experimental results on artificial data that demonstrate the possibility of learning HMM parameters and topology from data. For a given topology, our approach outperforms the standard Maximum Likelihood learning scheme in some cases, which we identify. We also apply our unsupervised learning scheme to on-line handwritten signals for allograph clustering as well as for learning HMM models for handwritten digit recognition.
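Standard HMM libraries fix the topology in advance, so the sketch below (using hmmlearn on made-up 2-D "pen feature" sequences) covers only the Maximum Likelihood parameter-learning baseline that the paper compares against, not the structure learning itself:

```python
import numpy as np
from hmmlearn import hmm

# Two toy feature sequences, concatenated with lengths as hmmlearn expects.
seq1 = np.random.randn(40, 2)
seq2 = np.random.randn(60, 2) + 3.0
X = np.vstack([seq1, seq2])
lengths = [len(seq1), len(seq2)]

model = hmm.GaussianHMM(n_components=4, covariance_type="diag", n_iter=50)
model.fit(X, lengths)                      # Baum-Welch (Maximum Likelihood)
print(model.score(seq1))                   # log-likelihood of one sequence
```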
https://doi.org/10.1142/9789812834461_0013
This article presents a novel way to simultaneously learn and track the appearance of a previously unseen face without intrusive techniques. The presented approach is causal: no future frames are needed to process the current one. The model used in the tracking process is refined with each input frame thanks to a new algorithm for the simultaneous and incremental computation of the singular value decomposition (SVD) and the mean of the data. Previously developed methods for the iterative computation of the SVD are taken into account, and an original way to extract the mean information from the reduced SVD of a matrix is also considered. Furthermore, the results are produced with linear computational cost and sublinear memory requirements with respect to the size of the data. Finally, experimental results are included, showing the tracking performance and some comparisons between the batch computation and our incremental computation of the SVD with mean information.
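The paper's contribution includes updating the mean along with the SVD; the sketch below shows only the standard zero-mean incremental SVD column update that such methods build on (rank handling and the degenerate-residual case are simplified):

```python
import numpy as np

def svd_append_column(U, s, c):
    """Update a thin SVD (U, diag(s)) of the data seen so far when a new
    column c arrives, at a cost linear in the number of rows."""
    p = U.T @ c                        # component of c inside the subspace
    r = c - U @ p                      # residual orthogonal to the subspace
    rn = np.linalg.norm(r)
    q = r / rn if rn > 1e-12 else np.zeros_like(c)
    # Small core matrix whose SVD re-diagonalizes the augmented factors.
    K = np.block([[np.diag(s), p[:, None]],
                  [np.zeros((1, len(s))), np.array([[rn]])]])
    Uk, s_new, _ = np.linalg.svd(K)
    return np.hstack([U, q[:, None]]) @ Uk, s_new

# Seed with the SVD of two frames, then fold in a third incrementally.
A = np.random.rand(1024, 2)
U, s, _ = np.linalg.svd(A, full_matrices=False)
U, s = svd_append_column(U, s, np.random.rand(1024))
print(U.shape, s.shape)                # (1024, 3) (3,)
```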
https://doi.org/10.1142/9789812834461_0014
In this paper, we address the analysis of human actions by comparing different performances of the same action executed by different actors. Specifically, we present a comparison procedure applied to the walking action, but the scheme can be applied to other actions, such as bending, running, etc. To achieve fair comparison results, we define a novel human body model based on joint angles, which maximizes the differences between human postures and, moreover, reflects the anatomical structure of human beings. Subsequently, a human action space, called aSpace, is built in order to represent each performance (i.e., each predefined sequence of postures) as a parametric manifold. The final human action representation is called p-action and is based on the most characteristic human body postures found during several walking performances. These postures, called key-frames, are found automatically by means of a predefined distance function. By using key-frames, we synchronize any performance with respect to the p-action. Furthermore, by considering an arc-length parameterization, independence from the speed at which performances are played is attained. Consequently, the style of human walking is successfully analysed by establishing the differences of the joints between male and female walkers.
https://doi.org/10.1142/9789812834461_0015
We propose a two-step method for detecting human heads and estimating their orientations. In the first step, the method employs an ellipse as the contour model of human-head appearance in order to deal with a wide variety of appearances, and evaluates the ellipse to detect possible human heads. In the second step, our method focuses on features inside the ellipse, such as the eyes, the mouth or the cheeks, to model facial components. The method evaluates not only the components themselves but also their geometric configuration, to eliminate false positives from the first step and, at the same time, to estimate face orientations. Our intensive experiments show that our method can correctly and stably detect human heads and their orientations.
https://doi.org/10.1142/9789812834461_0016
This paper presents a new approach for modeling human walking from monocular image sequences. A kinematic model and a walking motion model are introduced in order to exploit prior knowledge. The proposed technique consists of two steps. Initially, an efficient feature-point selection and tracking approach is used to compute feature-point trajectories. Peaks and valleys of these trajectories are used to detect key frames (frames where both legs are in contact with the floor). Secondly, the motion models associated with each joint are locally tuned by using those key frames. Unlike previous approaches, this tuning process is not performed at every frame, which reduces CPU time. In addition, the movement's frequency is defined by the elapsed time between two consecutive key frames, which allows handling walking displacement at different speeds. Experimental results with different video sequences are presented.
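As a small illustration of the key-frame detection step (using scipy's peak finder on a synthetic trajectory; the prominence threshold is an assumption):

```python
import numpy as np
from scipy.signal import find_peaks

def key_frames(y, prominence=1.0):
    """Indices of peaks and valleys of a 1-D feature-point trajectory,
    candidate frames where both legs touch the floor."""
    peaks, _ = find_peaks(y, prominence=prominence)
    valleys, _ = find_peaks(-y, prominence=prominence)
    return np.sort(np.concatenate([peaks, valleys]))

# Toy trajectory of a foot feature point during four steps.
t = np.linspace(0, 4 * np.pi, 200)
print(key_frames(5.0 * np.sin(t)))
```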
https://doi.org/10.1142/9789812834461_0017
Visual tracking of articulated motion is a complex task with high computational costs. Because articulated objects are usually represented as a set of linked limbs, tracking is performed with the support of a model. Model-based tracking allows object pose to be determined in an effortless way and occlusions to be handled. However, the use of articulated models generates a multidimensional state space and, therefore, tracking becomes computationally very expensive or even infeasible.

Due to the dynamic nature of the problem, sequential estimation algorithms such as particle filters are usually applied to visual tracking. Unfortunately, particle filters fail in high-dimensional estimation problems such as articulated-object or multiple-object tracking. These are called dynamic optimization problems. Metaheuristics, which are high-level general strategies for designing heuristic procedures, have emerged as a way to efficiently and effectively explore the search space of many real-world combinatorial problems. Path relinking (PR) and scatter search (SS) are evolutionary metaheuristics successfully applied to several hard optimization problems.

In this paper, we present and compare two hybrid algorithms, the Path Relinking Particle Filter (PRPF) and the Scatter Search Particle Filter (SSPF), which hybridize the particle filter with these two population-based metaheuristic schemes, applied to 2D human motion tracking. Experimental results show that the proposed algorithms increase the performance of standard particle filters.
https://doi.org/10.1142/9789812834461_0018
Biometric identification methods have proved to be very efficient and more natural and easy for users than traditional methods of human identification. In fact, only biometric methods truly identify humans, not the keys and cards they possess or the passwords they are supposed to remember.

The future of biometrics will surely lie in systems based on image analysis, as data acquisition is very simple and requires only cameras, scanners or sensors. More importantly, such methods can be passive: the user does not have to take an active part in the whole process or, in fact, need not even know that identification is taking place. There are many possible data sources for human identification systems, but physiological biometrics seem to have many advantages over methods based on human behaviour. The most interesting human anatomical parts for such passive, physiological biometric systems based on camera images are the face and the ear. Both contain a large volume of unique features that allow many users to be distinctively identified, and they will surely be implemented in efficient biometric systems for many applications.

The article introduces ear biometrics and presents its advantages over face biometrics in passive human identification systems. A geometrical method for extracting features from human ear images for identification is then presented.
https://doi.org/10.1142/9789812834461_0019
This paper presents an improved approach for matching objects represented in dynamic pedobarography image sequences, based on finite element modeling, modal analysis and optimization techniques. In this work, the determination of correspondences between objects' data points is improved by using optimization techniques and, because the number of data points of each object is not necessarily the same, a new algorithm to match the excess points is also proposed. This new matching algorithm uses a neighbourhood criterion and can overcome some disadvantages of the usual "one to one" matching.

The considered approach allows the determination of correspondences between the data points of 2D or 3D objects, and is applied here to dynamic pedobarography images.
https://doi.org/10.1142/9789812834461_0020
In video surveillance and sports analysis applications, object trajectories offer the possibility of extracting rich information on the underlying behavior of the moving targets. To this end we introduce an extension of Point Distribution Models (PDM) to analyze object motion in its spatial, temporal and spatiotemporal dimensions. These trajectory models represent object paths as an average trajectory and a set of deformation modes in the spatial, temporal and spatiotemporal domains. Thus any given motion can be expressed in terms of its modes, which in turn can be ascribed to a particular behavior.

The proposed analysis tool has been tested on motion data extracted from a vision system that was tracking radio-guided cars running inside a circuit. This affords an easier interpretation of results, because the shortest lap provides a reference behavior. Besides presenting an actual analysis, we discuss how to normalize the trajectories so that the analysis is meaningful.
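In essence a trajectory PDM is a PCA over trajectories resampled to a common length; the sketch below (the 50-point resampling and the toy circuit data are assumptions) extracts the mean path and the deformation modes:

```python
import numpy as np

def trajectory_pdm(trajs):
    """Point Distribution Model over trajectories: each trajectory is a
    flattened vector of (x, y) samples at the same N time steps."""
    X = np.stack([t.ravel() for t in trajs])       # one row per trajectory
    mean = X.mean(axis=0)
    _, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    modes = Vt                                     # deformation modes
    weights = (X - mean) @ Vt.T                    # each path in mode space
    return mean, modes, weights

# Toy data: 20 noisy laps around a circular circuit, 50 samples each.
t = np.linspace(0, 2 * np.pi, 50)
laps = [np.stack([np.cos(t), np.sin(t)], axis=1)
        + 0.05 * np.random.randn(50, 2) for _ in range(20)]
mean, modes, w = trajectory_pdm(laps)
print(mean.shape, modes.shape, w.shape)            # (100,) (20, 100) (20, 20)
```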
https://doi.org/10.1142/9789812834461_0021
This paper describes an improvement of a classical energy-based model for simulating elastically deformable solids. The classical model lacks the ability to prevent solids from collapsing under the influence of external forces, such as user interactions and collisions. A thorough explanation is given of the origins of the instabilities, and extensions to the physical model that solve these issues are proposed. Within the original framework of the classical model, a complete restoration of area and volume is introduced. The improved model is suitable for interactive simulation and can recover from volumetric collapse, in particular upon large deformations.
https://doi.org/10.1142/9789812834461_0022
The proposed work is part of a project that aims to control a videogame through hand-gesture recognition. This goal imposes the requirements of real-time response and unconstrained environments. In this paper we present a new algorithm to track and recognise hand gestures for interacting with videogames. The algorithm is based on three main steps: hand segmentation, hand tracking, and gesture recognition from hand features. For the hand segmentation step we use the colour cue, due to the characteristic colour values of human skin, its invariant properties and its computational simplicity. To prevent errors from hand segmentation we add a second step, hand tracking, performed assuming a constant-velocity model and using a pixel-labeling approach. From the tracking process we extract several hand features that are fed to a finite-state classifier which identifies the hand configuration. The hand can be classified into one of four gesture classes or one of four movement directions. Finally, the system's performance evaluation results show the usability of the algorithm in a videogame environment.
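A minimal sketch of the skin-colour segmentation step in OpenCV (the HSV bounds are illustrative only, not the paper's values; real use needs per-camera tuning):

```python
import cv2
import numpy as np

def skin_mask(frame_bgr):
    """Rough skin segmentation by thresholding in HSV."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    lower = np.array([0, 40, 60], np.uint8)       # assumed skin-tone bounds
    upper = np.array([25, 180, 255], np.uint8)
    mask = cv2.inRange(hsv, lower, upper)
    # Morphological opening removes isolated misclassified pixels.
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    return cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)

mask = skin_mask(np.random.randint(0, 255, (120, 160, 3), np.uint8))
print(mask.shape, mask.dtype)                     # (120, 160) uint8
```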
https://doi.org/10.1142/9789812834461_0023
In this paper a novel approach to the compression of sparse-histogram images is proposed. First, we define a sparsity index that gives hints on the relationship between the mathematical concept of matrix sparsity and the visual information of the pixel distribution. We use this index to better understand the scope of our approach and its preferred field of applicability, and to evaluate performance. We then present two algorithms that modify one of the coding steps of the JPEG2000 standard for lossless image compression. A theoretical study of the gain relative to the standard is given. Experimental results on standard test images from the literature confirm the expectations, especially for highly sparse images.
https://doi.org/10.1142/9789812834461_0024
This paper describes two innovations that improve the efficiency and effectiveness of a genetic programming approach to object detection problems. The approach uses genetic programming to construct object detection programs that are applied, in a moving-window fashion, to large images to locate the objects of interest. The first innovation is to break the GP search into two phases, with the first phase applied to a selected subset of the training data and a simplified fitness function. The second phase is initialised with the programs from the first phase and uses the full set of training data with a complete fitness function to construct the final detection programs. The second innovation is to add a program-size component to the fitness function. This approach is examined and compared with a neural network approach on three object detection problems of increasing difficulty. The results suggest that the innovations increase both the effectiveness and the efficiency of the genetic programming search, and also that the genetic programming approach outperforms a neural network approach on the most difficult data set in terms of object detection accuracy.
https://doi.org/10.1142/9789812834461_0025
In this paper we present a system for the reconstruction of 3D models of architectural scenes from single or multiple uncalibrated images. The partial 3D model of a building is recovered from a single image using geometric constraints such as parallelism and orthogonality, which are likely to be found in most architectural scenes. The approximate corner positions of a building are selected interactively by a user and then refined automatically using the Hough transform. The relative depths of the corner points are calculated according to the perspective projection model. Partial 3D models recovered from different viewpoints are registered to a common coordinate system for integration. The 3D model registration is carried out using a modified ICP (iterative closest point) algorithm, with the initial parameters provided by the geometric constraints of the building. The integrated 3D model is then fitted with piecewise planar surfaces to generate a more geometrically consistent model. The acquired images are finally mapped onto the surface of the reconstructed 3D model to create a photo-realistic model. A working system that allows a user to interactively build a 3D model of an architectural scene from single or multiple images has been proposed and implemented.
https://doi.org/10.1142/9789812834461_0026
A method is proposed for the construction of descent directions for the minimization of energy functionals defined for plane curves. The method is potentially useful in a number of image analysis problems, such as image registration and shape warping, where the standard gradient-descent curve evolutions are not always feasible. The descent direction is constructed by taking a weighted average of the three components of the gradient corresponding to translation, rotation, and deformation. Our approach differs from previous work in the field by the use of an implicit representation of curves and the notion of the normal velocity of a curve evolution. Thus our theory is morphological and well suited for implementation in the level set framework.
https://doi.org/10.1142/9789812834461_0027
In this paper, we propose a partial differential equation (PDE) based method to segment image objects that have a given parametric shape, based on an energy functional. The energy functional is composed of a term that detects object boundaries and a term that constrains the contour to find a shape compatible with the parametric shape. While the shape constraints guiding the PDE may be determined from statistical models of object shape, we demonstrate the proposed approach on the extraction of objects with explicit shape parameterization, such as linear image segments. Several experiments on synthetic and real images are reported to evaluate our approach. We also demonstrate the successful application of the proposed method to the problem of removing camera lens distortion, which can be significant in medium to wide-angle lenses.
https://doi.org/10.1142/9789812834461_0028
In this paper, common colour models for background subtraction are discussed, together with problems related to their use. A novel approach to representing chrominance information, more suitable for robust background modeling and shadow suppression, is proposed. Our method relies on the ability to represent colours in terms of a 3D-polar coordinate system whose saturation is independent of the brightness function; specifically, we build upon the Improved Hue, Luminance, and Saturation (IHLS) space. A further peculiarity of the approach is that we deal with the problem of unstable hue values at low saturation by modelling the hue-saturation relationship using saturation-weighted hue statistics. The effectiveness of the proposed method is shown in an experimental comparison with approaches based on RGB, normalised RGB and HSV.
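Saturation-weighted hue statistics can be illustrated with a circular mean in which each pixel's hue vector is scaled by its saturation, so nearly grey pixels (whose hue is unstable) barely influence the model; this sketch is a generic version, not the paper's full background model:

```python
import numpy as np

def sat_weighted_mean_hue(hue_deg, sat):
    """Circular mean of hue, weighted by saturation."""
    h = np.deg2rad(hue_deg)
    cx = np.sum(sat * np.cos(h))                  # weighted hue vector sum
    cy = np.sum(sat * np.sin(h))
    return np.rad2deg(np.arctan2(cy, cx)) % 360.0

hue = np.array([10.0, 350.0, 180.0])
sat = np.array([0.9, 0.8, 0.01])                  # last pixel is nearly grey
print(sat_weighted_mean_hue(hue, sat))            # close to 0 (red), not 180
```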
https://doi.org/10.1142/9789812834461_0029
Interactive techniques for extracting a foreground object from an image have long been a subject of research in computer vision. This paper addresses the problem of efficient, semi-interactive extraction of a foreground object from an image. The Snake (also known as the Active Contour) and GrabCut are two popular techniques extensively used for this task. An active contour is a deformable contour that segments the object using boundary discontinuities, by minimizing the energy function associated with the contour. GrabCut provides a convenient way to encode colour features as segmentation cues, obtaining foreground segmentation from local pixel similarities using modified iterated graph cuts. This paper first presents a comparative study of these two segmentation techniques and illustrates conditions under which either or both of them fail. We then propose a novel formulation for integrating these two complementary techniques to obtain automatic foreground object segmentation. We call the proposed integrated approach "SnakeCut"; it is based on a probabilistic framework. To validate our approach, we show results on both simulated and natural images.
https://doi.org/10.1142/9789812834461_0030
CCTV surveillance systems have long been promoted as being effective in improving public safety. However, due to the number of cameras installed, many sites have abandoned expensive human monitoring and only record video for forensic purposes. One of the sought-after capabilities of an automated surveillance system is "face in the crowd" recognition in public spaces such as mass transit centres. Apart from accuracy and robustness to nuisance factors such as pose variations, the other important factors in such surveillance situations are scalability and fast performance. We evaluate recent approaches to the recognition of faces at large pose angles from a gallery of frontal images and propose novel adaptations as well as modifications. We compare and contrast the accuracy, robustness and speed of an Active Appearance Model (AAM) based method (where realistic frontal faces are synthesized from non-frontal probe faces) against bag-of-features methods. We show a novel approach in which the performance of the AAM-based technique is increased by side-stepping the image synthesis step, also resulting in a considerable speedup. Additionally, we adapt a histogram-based bag-of-features technique to face classification and contrast its properties with a previously proposed direct bag-of-features method. We further show that the two bag-of-features approaches can be considerably sped up, without a loss in classification accuracy, via an approximation of the exponential function. Experiments on the FERET and PIE databases suggest that the bag-of-features techniques generally attain better performance, with significantly lower computational loads. The histogram-based bag-of-features technique achieves an average recognition accuracy of 89% for pose angles of around 25 degrees. Finally, we provide a discussion of implementation as well as legal challenges surrounding research on automated surveillance.
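The paper's exact exponential approximation is not reproduced here; one generic option with the same flavour uses the limit definition exp(x) = lim (1 + x/n)^n with n a power of two, so the power is computed by a few squarings instead of a transcendental call:

```python
import numpy as np

def fast_exp(x, k=8):
    """Approximate exp(x) as (1 + x/2**k)**(2**k) via k squarings;
    accuracy degrades as |x| grows, which is fine for bounded inputs."""
    y = 1.0 + x / (1 << k)
    for _ in range(k):
        y = y * y
    return y

x = np.linspace(-4.0, 1.0, 6)
print(np.max(np.abs(fast_exp(x) - np.exp(x))))    # small for this range
```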
https://doi.org/10.1142/9789812834461_bmatter
Back Matter