New Edition: Handbook of Pattern Recognition and Computer Vision (6th Edition)
The book provides an up-to-date and authoritative treatment of pattern recognition and computer vision, with chapters written by leaders in the field. On the basic methods side, topics range from statistical pattern recognition and array grammars to projective geometry, skeletonization, and shape and texture measures. Recognition applications include character recognition and document analysis, detection of abnormalities in digital mammograms, remote sensing image fusion, and the analysis of functional magnetic resonance imaging data. There are six chapters on current activities in human identification. Other topics include moving object tracking, performance evaluation, content-based video analysis, musical style recognition, and number plate recognition.
Contents:
https://doi.org/10.1142/9789812775320_fmatter
https://doi.org/10.1142/9789812775320_0001
A review is given of the area of statistical pattern recognition: the representation of objects and the design and evaluation of trainable systems for generalization. Traditional as well as more recently studied procedures are reviewed, such as the classical Bayes classifiers, neural networks, support vector machines, one-class classifiers, and combining classifiers. Furthermore, we introduce methods for feature reduction and error evaluation. New developments in statistical pattern recognition are briefly discussed.
https://doi.org/10.1142/9789812775320_0002
The success of many real-world applications demonstrates that hidden Markov models (HMMs) are highly effective in one-dimensional pattern recognition problems such as speech recognition. Research is now focused on extending HMMs to 2-D and possibly 3-D applications which arise in gesture, face, and handwriting recognition. Although the HMM has become a major workhorse of the pattern recognition community, there are few analytical results which can explain its remarkably good pattern recognition performance. There are also only a few theoretical principles for guiding researchers in selecting topologies or understanding how the model parameters contribute to performance. In this chapter, we deal with these issues and use simulated data to evaluate the performance of a number of alternatives to the traditional Baum-Welch algorithm for learning HMM parameters. We then compare the best of these strategies to Baum-Welch on a real hand gesture recognition system in an attempt to develop insights into these fundamental aspects of learning.
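To make the baseline learning step concrete, here is a minimal sketch of fitting HMM parameters with the classical Baum-Welch (EM) algorithm via the hmmlearn library; the "gesture" feature sequences below are synthetic stand-ins, not the chapter's data or its alternative learning strategies:

```python
# Minimal sketch: fitting HMM parameters with Baum-Welch (EM) via hmmlearn.
# The "gesture" observations are synthetic placeholders.
import numpy as np
from hmmlearn import hmm

rng = np.random.default_rng(0)
# Two toy sequences of 2-D feature vectors (e.g., hand position/velocity).
seq1 = rng.normal(loc=0.0, scale=1.0, size=(50, 2))
seq2 = rng.normal(loc=3.0, scale=1.0, size=(40, 2))
X = np.vstack([seq1, seq2])          # concatenated observations
lengths = [len(seq1), len(seq2)]     # sequence boundaries

# A 3-state Gaussian HMM; fit() runs the Baum-Welch (EM) algorithm.
model = hmm.GaussianHMM(n_components=3, covariance_type="diag", n_iter=100)
model.fit(X, lengths)

print("log-likelihood of seq1:", model.score(seq1))
print("most likely state path:", model.predict(seq1))
```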
https://doi.org/10.1142/9789812775320_0003
Previous expositions of the Minimum Classification Error (MCE) framework for discriminative training of pattern recognition systems describe the use of a smoothed version of the error count as the criterion function for classifier design, but do not specify the origin and nature of the smoothing used. In this chapter we show that the same optimization criterion can be derived from the classic Parzen window approach to smoothing in the context of non-parametric density estimation. The density estimated is not that of the category pattern distributions, as in conventional non-discriminative methods such as maximum likelihood estimation, but rather that of a transformational variable comparing the correct and best incorrect categories. The density estimate can easily be integrated over the domain corresponding to classification mistakes, yielding a cost function that is closely related to the original MCE cost function. The risk estimate formulation presented here provides a new link, Parzen estimation, between the empirical cost function calculated on a finite training set and the true theoretical classification risk. The classic Parzen concept of linking the kernel width to the amount of training data can be used in the context of discriminative training to express the intuitive idea of using a smaller margin as the amount of training data increases.
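For orientation, the standard MCE construction (in one common notation of the discriminative-training literature, not necessarily the chapter's) defines a misclassification measure comparing the correct class $k$ against its competitors and passes it through a sigmoid, yielding a smooth surrogate for the 0-1 error count:

$$
d_k(x) = -g_k(x;\Lambda) + \Big[\frac{1}{M-1}\sum_{j\neq k} g_j(x;\Lambda)^{\eta}\Big]^{1/\eta},
\qquad
\ell\big(d_k(x)\big) = \frac{1}{1+e^{-\gamma\, d_k(x)}},
$$

where the $g_j$ are class discriminant functions, $d_k(x) > 0$ signals a classification mistake, and the smoothing parameter $\gamma$ plays the role of an inverse Parzen kernel width, which is the link the chapter makes precise.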
https://doi.org/10.1142/9789812775320_0004
A simple and effective syntactic method for describing picture patterns of rectangular arrays of symbols is proposed. The method is based on the contextual grammars of Marcus (1969), guided by trajectories in the generation process. The resulting grammar is called the parallel contextual array grammar with contexts shuffled on trajectories. This grammar is capable of describing pictorial patterns that cannot be handled by certain other contextual array grammars introduced in the literature.
https://doi.org/10.1142/9789812775320_0005
Local invariant features have been shown to be very successful for recognition. They are robust to occlusion and clutter, distinctive, and invariant to image transformations. In this chapter recent progress on local invariant features is summarized. It is explained how to extract scale- and affine-invariant regions and how to obtain discriminant descriptors for these regions. It is then demonstrated that combining local features with pattern classification techniques allows for texture and category-level object recognition in the presence of varying viewpoints and background clutter.
https://doi.org/10.1142/9789812775320_0006
The development of image interpretation systems is beset by tricky problems such as a limited number of observations, environmental influences, and noise. Existing systems still lack robustness, accuracy, and flexibility. The introduction of case-based reasoning (CBR) strategies can help to overcome these drawbacks. The special type of information involved (i.e., images) and the problems mentioned above impose special requirements on CBR strategies. In this chapter we review what has been achieved so far and discuss open research topics in case-based image analysis and interpretation.
https://doi.org/10.1142/9789812775320_0007
A central problem in computer vision is to reconstruct a 3D scene given multiple images of it. In this chapter, we describe a general geometric framework based on projective geometry. From a theoretical point of view, it leads to a simpler and more unified description of the problem than the traditional Euclidean framework. From a practical point of view, it makes it possible to use cameras without knowing their internal parameters beforehand.
After a brief overview of projective geometry and its application to camera modeling, we examine how to compute the epipolar geometry and planar correspondences of a pair of uncalibrated views from point matches (Fundamental matrix and homographies). We then move to the problem of computing the 3D positions of the points that generate these matches, as well as the cameras (stratification of reconstruction in projective, affine and Euclidean stages). We eventually tackle the problem of computing “on the fly” the internal calibration of a camera from a sequence of ordinary images (self-calibration).
https://doi.org/10.1142/9789812775320_0008
Skeletonization is a way to reduce dimensionality of digital objects. Here, we present in detail an algorithm that computes the curve skeleton of a solid object, i.e., an object without cavities, in a 3D binary image. The algorithm consists of three main steps. During the first step, the surface skeleton is detected, by directly marking in the distance transform of the object the voxels that should be assigned to the surface skeleton. The curve skeleton is then computed by iteratively thinning the surface skeleton, during the second step. Finally, the third step is performed to reduce the curve skeleton to unit width and to prune, in a controlled manner, some of its peripheral branches.
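As a 2D analogue of this pipeline (the chapter treats the full 3D voxel case), the sketch below computes a distance transform and a unit-width skeleton with common library routines; the library choices are illustrative and do not reproduce the chapter's algorithm:

```python
# Minimal 2-D sketch of the distance-transform-plus-thinning idea.
import numpy as np
from scipy.ndimage import distance_transform_edt
from skimage.morphology import skeletonize

# A solid rectangular "object" in a binary image.
img = np.zeros((60, 100), dtype=bool)
img[10:50, 10:90] = True

# Step 1 analogue: the distance transform marks how deep each pixel lies.
dist = distance_transform_edt(img)

# Steps 2-3 analogue: iterative thinning down to a unit-width curve skeleton.
skel = skeletonize(img)

print("max object depth:", dist.max())
print("skeleton pixel count:", skel.sum())
```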
https://doi.org/10.1142/9789812775320_0009
Digital distance transforms have been used in image processing and analysis since the 1960s. Distance transforms are excellent tools for all applications regarding shape, and they are extensively used, especially in industrial and medical applications. At the same time, from the mid 1980s until today, there has been a rich literature that investigates distance transforms theoretically, constructs new ones, and improves computation algorithms. Despite this, distance transforms have not really been incorporated into the general image analysis toolbox. In the basic books on image analysis used in education they are usually not mentioned at all, or only the oldest ones (e.g., City block and Chessboard) are mentioned very briefly. One reason for the under-use of distance transforms could be that the oldest ones are very rotation dependent, giving quite different results depending on the orientation of an object. The Euclidean distance transform is rotation independent up to digitisation effects, but often leads to complex algorithms in the applications where it is used. The compromise is the integer weighted distance transforms, which combine the simplicity of the old distance transforms with reasonable rotation independence. Here, a large number of distance transforms will be described, together with some of their properties and the simplest computation algorithms.
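To illustrate what an integer weighted distance transform looks like in practice, here is a sketch of the classic two-pass chamfer algorithm with weights 3 (edge neighbors) and 4 (diagonal neighbors); dividing the result by 3 gives a reasonable approximation of Euclidean distance:

```python
# Sketch of the two-pass <3,4> chamfer distance transform.
import numpy as np

def chamfer_34(obj):
    """<3,4> weighted distance from each object pixel to the background."""
    INF = 10**6
    h, w = obj.shape
    d = np.where(obj, INF, 0).astype(np.int64)
    # Forward pass: propagate distances from the top-left.
    for i in range(h):
        for j in range(w):
            if i > 0:
                d[i, j] = min(d[i, j], d[i-1, j] + 3)
                if j > 0:      d[i, j] = min(d[i, j], d[i-1, j-1] + 4)
                if j < w - 1:  d[i, j] = min(d[i, j], d[i-1, j+1] + 4)
            if j > 0:
                d[i, j] = min(d[i, j], d[i, j-1] + 3)
    # Backward pass: propagate distances from the bottom-right.
    for i in range(h - 1, -1, -1):
        for j in range(w - 1, -1, -1):
            if i < h - 1:
                d[i, j] = min(d[i, j], d[i+1, j] + 3)
                if j < w - 1:  d[i, j] = min(d[i, j], d[i+1, j+1] + 4)
                if j > 0:      d[i, j] = min(d[i, j], d[i+1, j-1] + 4)
            if j < w - 1:
                d[i, j] = min(d[i, j], d[i, j+1] + 3)
    return d

obj = np.zeros((7, 7), dtype=bool)
obj[2:5, 2:5] = True
print(chamfer_34(obj))
```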
https://doi.org/10.1142/9789812775320_0010
Global shape measures are a convenient way to describe regions. They are generally simple and efficient to extract, and provide an easy means for high-level tasks such as classification, as well as helping to direct low-level computer vision processes such as segmentation. In this chapter a large selection of global shape measures (some from the standard literature as well as other newer methods) are described and demonstrated.
https://doi.org/10.1142/9789812775320_0011
The LBP operator is a theoretically simple yet very powerful method of analyzing textures. Through its recent extensions, it has developed into a truly powerful measure of image texture, showing excellent results in terms of accuracy and computational complexity in many empirical studies. The LBP operator can be seen as a unifying approach to the traditionally divergent statistical and structural models of texture analysis. Texture is described in terms of micro-primitives (textons) and their statistical placement rules. Optionally, the primitives may be coupled with a complementary measure of local image contrast, which captures the strength of the primitives.
In this chapter the relation of the LBP operator to other texture analysis methods is explained. The chapter shows how the LBP combines aspects of statistical and structural texture analysis, and why it is called a “unifying approach”. The theoretical foundation of the operator is explained starting from a definition of texture in a local neighborhood. A number of extensions to the basic operator are also introduced. The extensions include three different multi-scale models, an opponent color operator and a rotation invariant version. Finally, a number of successful applications of the operator are highlighted.
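As a concrete illustration, a minimal sketch of the basic 3x3 LBP operator follows: each of the eight neighbors is thresholded at the center pixel value and the results are packed into an 8-bit code, whose histogram serves as the texture descriptor (the chapter's extensions build on this core):

```python
# Minimal sketch of the basic 3x3 LBP operator.
import numpy as np

def lbp_basic(img):
    img = np.asarray(img, dtype=float)
    center = img[1:-1, 1:-1]
    # Neighbor offsets in a fixed circular order; each contributes one bit.
    offsets = [(-1,-1), (-1,0), (-1,1), (0,1), (1,1), (1,0), (1,-1), (0,-1)]
    code = np.zeros_like(center, dtype=np.uint8)
    for bit, (di, dj) in enumerate(offsets):
        neighbor = img[1+di : img.shape[0]-1+di, 1+dj : img.shape[1]-1+dj]
        code |= (neighbor >= center).astype(np.uint8) << bit
    return code

img = np.random.default_rng(1).integers(0, 256, size=(6, 6))
codes = lbp_basic(img)
hist = np.bincount(codes.ravel(), minlength=256)  # the texture descriptor
print(codes)
```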
https://doi.org/10.1142/9789812775320_0012
The basic concepts and underlying techniques of document processing are presented in this chapter. A basic model for document processing is described. In this model, document processing is divided into two phases: document analysis and document understanding. A document has two structures: the geometric (layout) structure and the logical structure. Extraction of the geometric structure from a document constitutes document analysis; mapping the geometric structure into the logical structure is the task of document understanding. Both types of document structures and both areas of document processing are discussed in this chapter. Two categories of methods have been used in document analysis, namely (1) hierarchical methods, including top-down and bottom-up approaches, and (2) non-hierarchical methods, including modified fractal signature and order stochastic filtering. Tree transform, formatting knowledge, and description language approaches have been used in document understanding. All of the above approaches are presented in this chapter. A particular case, form document processing, is discussed: form description and form registration approaches are presented, and a form processing system is introduced. Finally, many techniques used in these approaches, such as skew detection, the Hough transform, Gabor filters, projection, crossing counts, form definition language, and the wavelet transform, are discussed in this chapter.
https://doi.org/10.1142/9789812775320_0013
Chinese character recognition is a difficult and important pattern recognition problem, as well as a computer vision problem, and it has enjoyed great success in the recent decade. This chapter introduces the key technological progress in Chinese character recognition, its related research topics, and the future prospects for Chinese character recognition.
https://doi.org/10.1142/9789812775320_0014
We present an efficient method of extracting words from handwritten legal amounts on bank cheques based on the spatial gaps between connected components. As typical existing distance measures for gap estimation suffer from underestimation and overestimation errors, a modified version of each distance measure is explored. Moreover, an efficient method of combining three different types of distance measures based on a four-class clustering technique is proposed to compensate for the errors in each measure. Lastly, we introduce a heuristic re-segmentation procedure to deal with under-segmentation errors due to gap irregularity. In experiments on a real bank cheque database, the modified distance measures show a separation rate about 3% better than their original counterparts, and the combining method performs much better than any of the individual distance measures. In addition, by applying the re-segmentation procedure, a notable correct separation rate of 81.7% was finally achieved.
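To illustrate the clustering idea, here is a minimal sketch that groups inter-component gaps into four classes with k-means; the three gap measures and their values are hypothetical placeholders, not the chapter's measures:

```python
# Sketch: cluster inter-component gaps into four classes with k-means,
# combining three hypothetical gap distance measures as the feature vector.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Each row: (bounding-box gap, run-length gap, convex-hull gap) for one gap.
gaps = np.vstack([rng.normal(m, 0.5, size=(25, 3)) for m in (1, 3, 6, 10)])

km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(gaps)
# Clusters with large centroids would be treated as between-word gaps.
order = np.argsort(km.cluster_centers_.mean(axis=1))
print("between-word cluster id:", order[-1])
```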
https://doi.org/10.1142/9789812775320_0015
The ways humans and computers read texts printed in different fonts are described from a pattern recognition point of view. The results of several experiments involving OCR technology are presented. The characteristics of some fonts have been analyzed, and factors affecting the recognition rate have been identified. Suggestions are made on the desirable features of print-fonts which could enhance human reading.
https://doi.org/10.1142/9789812775320_0016
In this chapter we provide a summary of our previous work concerning the application of traditional machine learning techniques to data represented by graphs. We show how the k-means clustering algorithm and the k-nearest neighbors classification algorithm can easily and intuitively be extended from dealing with vector representations to graph representations. We present some of our experimental results, which confirm that the addition of structural information, not present in vector representations, improves both clustering and classification performance when dealing with web documents.
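A minimal sketch of nearest-neighbor classification over graph representations is given below, using graph edit distance as the dissimilarity; the exact edit distance computed here is practical only for tiny graphs and is an illustrative choice, not necessarily the distance used in the chapter:

```python
# Sketch: k-nearest-neighbor classification of graphs via graph edit distance.
import networkx as nx

train = [(nx.path_graph(3), "chain"), (nx.path_graph(4), "chain"),
         (nx.cycle_graph(3), "loop"),  (nx.cycle_graph(4), "loop")]

def knn_graph(query, train, k=3):
    # Rank training graphs by edit distance to the query (exact, hence slow).
    dists = sorted((nx.graph_edit_distance(query, g), label) for g, label in train)
    top = [label for _, label in dists[:k]]
    return max(set(top), key=top.count)   # majority vote

print(knn_graph(nx.path_graph(5), train))   # expected: "chain"
```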
https://doi.org/10.1142/9789812775320_0017
Breast cancer continues to be a significant public health problem in the world. Early detection is the key to improving breast cancer prognosis, and mammography has been one of the most reliable methods for early detection of breast carcinomas. However, the estimated sensitivity of radiologists in breast cancer screening is only about 75%, and their performance would be improved if they were prompted with the possible locations of abnormalities. Breast cancer computer-aided diagnosis (CAD) systems can provide such help, and they are important and necessary for breast cancer control. Microcalcifications and masses are the two most important indicators of malignancy, and their automated detection is very valuable for early breast cancer diagnosis. Automated detection and classification of masses is even more challenging than that of microcalcifications. This chapter discusses methods for mass detection and classification and for microcalcification detection, and compares their advantages and drawbacks.
https://doi.org/10.1142/9789812775320_0018
In this chapter, a 2-dimensional (2D) dynamic system of images is formulated in the scale space based on the 2D wavelet transform. The original image is assigned to the highest index (the finest scale) and the decomposed approximations at successively coarser scales (lower indices) are interpreted as the state variables at the corresponding scales. The state transition takes place from a coarse scale to a fine scale. Through the use of Kalman filtering, the optimal estimate of the finest-scale original image from a set of multiscale noisy measurements can be obtained in a scale-recursive form. This can be applied to multiresolution image fusion. An efficient filtering algorithm has been developed using the orthogonal wavelet packet transform, which greatly reduces the computational complexity. The methodology is illustrated by experiments, including one on the fusion of a Landsat TM band 1 image and a SPOT image.
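Concretely, with the scale index $k$ playing the role of time, the scale-recursive estimator alternates the standard Kalman prediction and update steps (generic notation, not necessarily the chapter's):

$$
\hat{x}_{k|k-1} = A_k\, \hat{x}_{k-1|k-1}, \qquad
P_{k|k-1} = A_k P_{k-1|k-1} A_k^{\top} + Q_k,
$$
$$
K_k = P_{k|k-1} H_k^{\top}\big(H_k P_{k|k-1} H_k^{\top} + R_k\big)^{-1},
$$
$$
\hat{x}_{k|k} = \hat{x}_{k|k-1} + K_k\big(y_k - H_k\, \hat{x}_{k|k-1}\big), \qquad
P_{k|k} = (I - K_k H_k)\, P_{k|k-1},
$$

where $A_k$ maps the coarse-scale state to the next finer scale and $y_k$ is the noisy measurement at that scale.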
https://doi.org/10.1142/9789812775320_0019
We present two examples that show how fusing data from hyperspectral imaging (HSI) sensors with data from other sensors can enhance overall detection and classification performance. The first example involves fusing HSI data with foliage-penetration synthetic aperture radar (FOPEN SAR) data; the second example involves fusing HSI data with high-resolution panchromatic imaging (HRI) data. The fusion of HSI and SAR data exploits different phenomenology from the two different sensors. The fusion of HSI and HRI combines their superior respective spectral and spatial information. Fusion of HSI and SAR data is accomplished at the feature level. HSI data provide background characterization and material identification; HSI-SAR fusion allows us to reduce false detections and confirm target detection in the SAR image. Fusion of HSI and HRI data is implemented at both data and feature levels, resulting in a combined spatial-spectral analysis that enhances target identification.
https://doi.org/10.1142/9789812775320_0020
Independent component analysis (ICA) has recently demonstrated considerable promise in characterizing fMRI data, primarily due to its intuitive nature and ability for flexible characterization of brain function. As typically applied, spatial brain networks are assumed to be systematically non-overlapping. Often temporal coherence of brain networks is also assumed, although convolutive and other models can be utilized to relax this assumption. ICA has been successfully utilized in a number of exciting fMRI applications including the identification of various signal-types (e.g. task and transiently task-related, and physiology-related signals) in the spatial or temporal domain, the analysis of multi-subject fMRI data, the incorporation of a priori information, and for the analysis of complex-valued fMRI data (which has proved challenging for standard approaches). In this chapter, we 1) introduce ICA and review current algorithms, 2) relate ICA to several well-known pattern recognition techniques, 3) introduce fMRI data and its properties, 4) review the basic motivation for using ICA on fMRI data, and 5) review the current work on ICA of fMRI and the incorporation of prior information.
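As a minimal illustration of spatial ICA on fMRI-like data, the sketch below recovers independent spatial maps and their time courses with scikit-learn's FastICA (one algorithm in the family the chapter reviews); all data are synthetic:

```python
# Sketch: spatial ICA on a synthetic fMRI-like time-by-voxel matrix.
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
n_timepoints, n_voxels, n_sources = 120, 500, 4

S = rng.laplace(size=(n_sources, n_voxels))       # sparse spatial maps
A = rng.normal(size=(n_timepoints, n_sources))    # time courses (mixing)
X = A @ S + 0.1 * rng.normal(size=(n_timepoints, n_voxels))

# Spatial ICA: treat voxels as samples, recover independent spatial maps.
ica = FastICA(n_components=n_sources, random_state=0)
maps = ica.fit_transform(X.T).T     # (n_sources, n_voxels) spatial maps
timecourses = ica.mixing_           # (n_timepoints, n_sources)
print(maps.shape, timecourses.shape)
```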
https://doi.org/10.1142/9789812775320_0021
Recent technological advances have enabled human users to interact with computers in ways previously unimaginable. Beyond the confines of the keyboard and mouse, new modalities for human-computer interaction such as voice, gesture, and force-feedback are emerging. Despite important advances, one necessary ingredient for natural interaction is still missing: emotions. Emotions play an important role in human-to-human communication and interaction, allowing people to express themselves beyond the verbal domain. The ability to understand human emotions is desirable for the computer in several applications. This chapter explores new ways of human-computer interaction that enable the computer to be more aware of the user's emotional and attentional expressions. We present the basic research in the field and recent advances in emotion recognition from facial, vocal, and physiological signals, where the different modalities are treated independently. We then describe the challenging problem of multimodal emotion recognition and advocate the use of probabilistic graphical models for fusing the different modalities. We also discuss the difficult issues of obtaining reliable affective data, obtaining ground truth for emotion recognition, and the use of unlabeled data.
https://doi.org/10.1142/9789812775320_0022
Human gait is a spatio-temporal phenomenon that captures the motion characteristics of an individual. It is possible to detect and measure gait even in low-resolution video. In this chapter, we discuss algorithms for identifying people by their gait from a monocular video sequence. Human identification using gait, like text-based speaker identification, involves different individuals performing the same task, and a template-matching approach is suitable for such problems. In situations where the amount of training data is limited, we demonstrate the utility of a simple width feature for gait recognition. By virtue of their deterministic nature, template-matching methods have limited noise resilience. In order to deal with noise, we introduce a systematic approach to gait recognition by building representations for the structural and dynamic components of gait using exemplars and hidden Markov models (HMMs). The above methods assume that an exact side view of the subject is available in the probe sequence. For the case when the person walks at an arbitrary angle far away from the camera, we present a view-invariant gait recognition algorithm based on synthesizing a side view of a person from an arbitrary monocular view.
https://doi.org/10.1142/9789812775320_0023
In this chapter, we present a novel biometric authentication system that establishes a person's identity from his/her palmprint. In contrast to existing palmprint systems for criminal applications, the proposed system targets civil applications, which require identifying a person in a large database with high accuracy in real time. The system consists of four major components: a User Interface Module, an Acquisition Module, a Recognition Module, and an External Module. More than 7,000 palmprint images have been collected to test the performance of the system. The system can identify 400 palms with a low false acceptance rate of 0.02% and a high genuine acceptance rate of 98.83%. For verification, the system can operate at a false acceptance rate of 0.017% and a false rejection rate of 0.86%. The execution time for the whole process, including image collection, preprocessing, feature extraction, and matching, is less than 1 second.
https://doi.org/10.1142/9789812775320_0024
In this chapter we provide a summary of our previous work concerning the reconstruction of high-resolution facial images for visual surveillance. Specifically, we present our methods of reconstructing a high-resolution facial image from a low-resolution one based on example-based learning and iterative error back-projection. In our method, a face is represented by a linear combination of prototypes of shape and texture. From the shape and texture information about the pixels in a given low-resolution facial image, we can estimate the optimal coefficients for a linear combination of shape prototypes and texture prototypes by least-squares minimization. A high-resolution facial image can then be obtained by applying the optimal coefficients to the corresponding high-resolution prototypes. Moreover, iterative error back-projection is applied to improve the result of the high-resolution reconstruction.
The encouraging results of our methods show that our high-resolution reconstruction methods can be used to improve the performance of face recognition by enhancing the resolution of low-resolution facial images captured by visual surveillance systems.
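A minimal sketch of the prototype-combination step follows: the combination coefficients are estimated by least squares at low resolution and then reused on the high-resolution prototypes. All sizes and operators here are hypothetical placeholders, and the error back-projection refinement is omitted:

```python
# Sketch: reconstruct a high-resolution image as a linear combination of
# prototypes, with coefficients fitted at low resolution by least squares.
import numpy as np

rng = np.random.default_rng(0)
n_hi, n_lo, n_protos = 1024, 64, 20        # hypothetical pixel/prototype counts

P_hi = rng.normal(size=(n_hi, n_protos))   # high-resolution prototypes
D = rng.normal(size=(n_lo, n_hi)) / n_hi   # stand-in for the downsampling operator
P_lo = D @ P_hi                            # prototypes as seen at low resolution

y = D @ (P_hi @ rng.normal(size=n_protos))  # a low-resolution input face

# Least-squares fit of the combination coefficients at low resolution ...
c, *_ = np.linalg.lstsq(P_lo, y, rcond=None)
# ... then reuse them on the high-resolution prototypes.
x_hi = P_hi @ c
print(x_hi.shape)
```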
https://doi.org/10.1142/9789812775320_0025
A fundamental question in invariant object recognition is that of representation. This chapter reviews object representations based on deformable feature graphs that describe particular views of an object as a spatial constellation of image features. These representations are particularly useful in situations of high clutter and partial occlusions. We demonstrate the benefits of these representations in three recognition applications: face analysis, hand gesture recognition, and the interpretation of cluttered scenes composed of multiple partly occluded objects. We conclude by discussing current trends and open challenges.
https://doi.org/10.1142/9789812775320_0026
We present a two-step method to speed up object detection systems in computer vision that use Support Vector Machines (SVMs) as classifiers. In the first step we build a hierarchy of classifiers: on the bottom level, a simple and fast linear classifier analyzes the whole image and rejects large parts of the background; on the top level, a slower but more accurate classifier performs the final detection. We propose a new method for automatically building and training such a hierarchy of classifiers. In the second step we apply feature reduction to the top-level classifier by choosing relevant image features according to a measure derived from statistical learning theory. Experiments with a face detection system show that combining feature reduction with hierarchical classification leads to a speed-up by a factor of 335 with similar classification performance.
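The two-level idea can be sketched as follows: a fast linear SVM scores every candidate window, and only the survivors are passed to a slower kernel SVM. The data, threshold, and classifiers below are illustrative stand-ins, not the chapter's trained hierarchy:

```python
# Sketch: a two-level SVM cascade. Synthetic 2-class data stands in for
# face/non-face image windows.
import numpy as np
from sklearn.svm import LinearSVC, SVC
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=2000, n_features=50, random_state=0)
X_train, y_train, X_test = X[:1500], y[:1500], X[1500:]

fast = LinearSVC(C=1.0).fit(X_train, y_train)           # bottom level
slow = SVC(kernel="rbf", C=1.0).fit(X_train, y_train)   # top level

# Pass along only the windows the linear stage does not confidently reject.
scores = fast.decision_function(X_test)
survivors = scores > -0.5          # loose threshold keeps recall high
pred = np.zeros(len(X_test), dtype=int)
pred[survivors] = slow.predict(X_test[survivors])
print("windows rejected early:", int((~survivors).sum()), "of", len(X_test))
```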
https://doi.org/10.1142/9789812775320_0027
This chapter presents methods for tracking moving objects using single or multiple cameras. Objects are classified into three categories: single person, group of people, or vehicle. For the single-camera case, the proposed method integrates spatial position, shape, and color information to track object blobs, and it focuses on establishing a correspondence between objects and templates as the objects come into view. For the multiple-camera case, we fuse data from the individual cameras using an Extended Kalman Filter (EKF) to resolve object occlusion. Our results show that integrating simple features makes the tracking effective, and that the EKF improves tracking accuracy when long-term or temporary occlusion occurs. Successful results for PETS2001 and other image sequences are presented.
https://doi.org/10.1142/9789812775320_0028
Performance evaluation of computer vision algorithms has attracted much attention in recent years. In this chapter we consider the performance evaluation of image segmentation algorithms. Three classes of segmentation tasks are investigated: edge detection, region segmentation, and detection of curvilinear structures. We present a taxonomy of performance evaluation approaches, discuss various performance evaluation methods, and explore several related issues. Considering the advances achieved so far, it can be expected that continuing efforts will make this important research area a mature one, and that the proposed techniques will find wider acceptance in the community, accompanying the sound development of computer vision research.
https://doi.org/10.1142/9789812775320_0029
With the rapid growth of multimedia information, video content analysis becomes crucial when dealing with large amounts of multimedia data, including texts, images, audio, video, and graphics. Video content analysis can be used to facilitate access to desired content for information retrieval, summarization, and knowledge discovery. The study and development of suitable and effective techniques for video content management and access have been pursued extensively over the last decade. In this chapter, we will first review the characteristics of visual and audio content and their basic processing techniques. Then, we will discuss semantic approaches to video data understanding and provide an overview of major developments in this field. In particular, we emphasize video knowledge discovery using features of multiple modalities, enhancing collaboration among semantic cues from various kinds of information sources.
https://doi.org/10.1142/9789812775320_0030
Modeling of complex systems should conveniently combine structure and behavior in a single model. Motivated by this observation, Object-Process Methodology (OPM) is a comprehensive, holistic approach to modeling, study, development, engineering, evolution, and lifecycle support of systems. Employing a combination of graphics and a subset of English, the OPM paradigm integrates the object-oriented, process-oriented, and state transition approaches into a single frame of reference. Due to its structure-behavior integration, OPM provides a solid basis for modeling complex systems in general and pattern recognition systems in particular. This chapter provides an overview of OPM, its ontology, semantics, and symbols. It then describes applications of OPM in various domains with focus on image processing and pattern recognition, and concludes with a sample of a generic image processing example model.
https://doi.org/10.1142/9789812775320_0031
In this chapter a study is described in which the possibilities of statistical pattern recognition for musical style recognition are explored. A dataset with compositions by Bach, Handel, Telemann, Mozart, and Haydn is investigated. The feature set used consists mainly of features that describe the different sonorities in the compositions. It is shown that with these features it is possible to separate the styles of the various composers. For this, a k-nearest neighbor classifier is used, trained in a feature space spanned by the Fisher discriminants. In the untransformed feature space, clusters are found with the k-means algorithm, and these clusters appear to reflect the styles of the various composers. The pattern recognition tools are also used to learn something about the characteristics of the styles: decision trees are built using the C4.5 algorithm, and it turns out that each of the styles represented in the dataset can be described with a few features.
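The classification pipeline described above can be sketched as LDA (which spans the Fisher-discriminant space) followed by k-nearest neighbors; synthetic data stands in for the sonority features of the five composers:

```python
# Sketch: project features onto the Fisher discriminants (LDA), then
# classify with k-nearest neighbors. Data are synthetic placeholders.
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, n_features=20, n_informative=10,
                           n_classes=5, random_state=0)

clf = make_pipeline(LinearDiscriminantAnalysis(n_components=4),
                    KNeighborsClassifier(n_neighbors=5))
print("CV accuracy:", cross_val_score(clf, X, y, cv=5).mean())
```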
https://doi.org/10.1142/9789812775320_0032
Auto-Detector is a revolutionary mobile Automatic Number Plate Recognition (ANPR) system, installed on board patrol cars, that continuously detects and reads the license plates of all vehicles encountered during patrol (both parked and in motion) without affecting normal operational duties. All recognized license plates are immediately compared on board against one or more hot-lists of selected number plates, and an alarm signal is issued to the patrol crew when a match is found. The system is already in use by the most important Italian security forces (the Carabinieri Army and the Road Police) for land security and traffic monitoring. International deployments are in progress (US, UK, Chile, Switzerland, Spain, etc.). Experimental results from current installations are reported.
https://doi.org/10.1142/9789812775320_0033
Omnidirectional vision sensors, first proposed in 1970, have recently been studied in computer vision and multimedia research. This chapter discusses the features of previously developed omnidirectional vision sensors and their applications. Omnidirectional vision is one of the key issues in advanced computer vision techniques and enables the development of a variety of computer vision applications.
https://doi.org/10.1142/9789812775320_bmatter
Chi Hau Chen received his PhD in electrical engineering from Purdue University in 1965, his MSEE degree from the University of Tennessee, Knoxville in 1962, and his BSEE degree from National Taiwan University in 1959. He is currently Chancellor Professor and Professor Emeritus of electrical and computer engineering at the University of Massachusetts Dartmouth, where he has taught since 1968. His research areas are statistical pattern recognition and signal/image processing with applications to remote sensing, geophysical, underwater acoustics, and nondestructive testing problems, as well as computer vision for video surveillance, time series analysis, and neural networks.
Dr. Chen has published (edited and authored) 30 books in his areas of research, including a number of books published by World Scientific Publishing. He was Associate Editor of the International Journal of Pattern Recognition and Artificial Intelligence from 1986 to 2008, and since 2008 he has been an Editorial Board Member of the Pattern Recognition journal. He is currently also the Series Editor in Computer Vision for World Scientific Publishing.
Dr. Chen has been a Fellow of the Institute of Electrical and Electronics Engineers (IEEE) since 1988, a Life Fellow of the IEEE since 2003, a Fellow of the International Association for Pattern Recognition (IAPR) since 1996, and a full member of the Academia NDT International since 2000, and he has been on the Fulbright Specialist Program since 2008.