New Edition: Handbook of Pattern Recognition and Computer Vision (6th Edition)
Both pattern recognition and computer vision have experienced rapid progress in the last twenty-five years. This book presents the latest advances in pattern recognition and computer vision along with their many applications. It features articles written by renowned leaders in the field, with topics presented in a form accessible to a wide range of readers. The book is divided into five parts: basic methods in pattern recognition, basic methods in computer vision and image processing, recognition applications, life science and human identification, and systems and technology. There are eight new chapters on the latest developments in life sciences using pattern recognition, as well as two new chapters on pattern recognition in remote sensing.
Sample Chapter(s)
Chapter 1: A Unification of Component Analysis Methods (213 KB)
https://doi.org/10.1142/9789814273398_fmatter
https://doi.org/10.1142/9789814273398_0001
Over the last century, Component Analysis (CA) methods such as Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), Canonical Correlation Analysis (CCA), k-means and Spectral Clustering (SC) have been extensively used as a feature extraction step for modeling, classification, visualization and clustering. CA techniques are appealing because many can be formulated as eigen-problems, offering great potential for learning linear and non-linear representations of data without local minima. However, the eigen-formulation often conceals important analytic and computational drawbacks of CA techniques, such as the need to solve generalized eigen-problems with rank-deficient matrices, the lack of an intuitive interpretation of normalization factors, and the difficulty of understanding the relationships between CA methods.
This chapter proposes a unified framework that formulates many CA methods as a least-squares estimation problem. We show how PCA, LDA, CCA, k-means, spectral graph methods and kernel extensions correspond to particular instances of a least-squares weighted kernel reduced rank regression (LS-WKRRR). The least-squares formulation allows a better understanding of normalization factors, provides a clean framework for understanding the commonalities and differences between many CA methods, yields efficient optimization algorithms for many CA algorithms, suggests easy derivations of on-line learning methods, and provides an easier generalization of CA techniques. In addition, we derive weighted generalizations of PCA, LDA, SC and CCA (including kernel extensions).
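As an illustration of the least-squares view of component analysis, the following sketch recovers the classical PCA subspace from a reduced rank regression objective solved by alternating least squares. This is a minimal example in the spirit of the unification described above, not the chapter's LS-WKRRR implementation; the toy data and all variable names are our own.

```python
# Minimal sketch: PCA as least-squares reduced rank regression,
# min_{B,Z} ||X - B Z||_F^2 with rank-k factors, solved by alternating
# least squares. The column space of B matches the span of the top-k
# principal components. Toy data and names are illustrative only.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 200))              # 5-dimensional data, 200 samples
X -= X.mean(axis=1, keepdims=True)             # center, as PCA assumes

k = 2                                          # target subspace dimension
B = rng.standard_normal((5, k))                # "decoder" factor
for _ in range(500):                           # alternating least squares
    Z = np.linalg.pinv(B) @ X                  # optimal codes given B
    B = X @ Z.T @ np.linalg.inv(Z @ Z.T)       # optimal B given the codes

U = np.linalg.svd(X, full_matrices=False)[0][:, :k]   # classical PCA basis
P_ls, P_pca = B @ np.linalg.pinv(B), U @ U.T          # projectors onto each subspace
print(np.linalg.norm(P_ls - P_pca))                   # near zero: same subspace
```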
https://doi.org/10.1142/9789814273398_0002
Just as multiple experts can confront and exchange their ideas in order to improve a decision-making process, a pattern recognition system can use several classifiers in order to improve its recognition rate. Moreover, various decision strategies, involving these classifiers in different ways, can contribute to the same recognition task. A first strategy consists in deciding from several different opinions: this is the combination of classifiers. A second strategy consists in using one or more opinions to better guide other classifiers in their training stages, and/or to improve the decision-making of other classifiers in the classification stage: this is the cooperation of classifiers. The third and last strategy consists in giving more importance to one or more classifiers according to various criteria or situations: this is the selection of classifiers. Since time can be a selection criterion, the temporal aspect of pattern recognition, i.e. the possible evolution of the classes to be recognized, can be handled by the selection strategy.
https://doi.org/10.1142/9789814273398_0003
Recently, an emerging trend of representing objects by graphs can be observed in pattern recognition and related fields. Graph-based object representation offers a versatile alternative to vectorial representations. The domain of graphs, however, offers little mathematical structure. That is, most of the basic mathematical operations required by many standard pattern recognition algorithms are not available for graphs. Consequently, we observe a severe lack of algorithmic tools in the domain of graphs. The present chapter introduces a novel approach for graph embedding in vector spaces. The rationale for such an embedding is to bridge the gap between the high representational power and flexibility of graphs and the large number of algorithms available for object representations in terms of feature vectors. The key idea of the proposed embedding method is to use the distances of an input graph to a number of training graphs, termed prototypes, as a vectorial description of the graph. In contrast to some other graph embedding methods, there are no restrictions on the type of graphs the proposed embedding framework can deal with. Moreover, the dissimilarity embedding of graphs is particularly able to cope with noisy data. In an experimental evaluation we show that the proposed methodology of first embedding graphs in vector spaces and then applying a statistical classifier has significant potential to outperform classifiers that operate directly in the graph domain.
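To make the embedding idea concrete, here is a toy sketch of dissimilarity-based graph embedding: each graph is mapped to the vector of its edit distances to a small set of prototype graphs, after which any vector-space classifier can be applied. The graphs, the edit-distance routine (networkx's graph_edit_distance) and the naive prototype choice are illustrative stand-ins, not the chapter's experimental setup.

```python
# Toy dissimilarity embedding of graphs: a graph g is represented by the vector
# of its edit distances to m prototype graphs. Prototypes are chosen naively
# here (the first two training graphs); the chapter discusses dedicated
# prototype selection strategies.
import networkx as nx
import numpy as np

def embed(graph, prototypes):
    # exact graph edit distance is exponential in general; fine for tiny graphs
    return np.array([nx.graph_edit_distance(graph, p) for p in prototypes])

train = [nx.path_graph(3), nx.cycle_graph(4), nx.star_graph(3), nx.complete_graph(4)]
prototypes = train[:2]

X = np.vstack([embed(g, prototypes) for g in train])
print(X)   # each row is the vectorial description of one graph
```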
https://doi.org/10.1142/9789814273398_0004
Training fuzzy ARTMAP neural networks for classification using data from complex real-world environments may lead to category proliferation and yield poor performance. This problem is known to occur whenever the training set contains noisy and overlapping data. Moreover, when the training set contains identical input patterns that belong to different recognition classes, fuzzy ARTMAP will fail to converge. To circumvent these problems, some alternatives to the network's original match tracking (MT) process have been proposed in the literature, such as using negative MT and removing MT altogether. In this chapter, the MT parameter of fuzzy ARTMAP is optimized during training using a new Particle Swarm Optimisation (PSO)-based strategy, denoted PSO(MT). The impact on fuzzy ARTMAP performance of training with different MT strategies is assessed empirically, using different synthetic data sets and the NIST SD19 handwritten character recognition data set. During computer simulations, fuzzy ARTMAP is trained with the original (positive) match tracking (MT+), with negative match tracking (MT−), without the MT algorithm (WMT), and with PSO(MT). Through a comprehensive set of simulations, it has been observed that by training with MT−, fuzzy ARTMAP expends fewer resources than with other MT strategies, but can yield a significantly higher generalization error, especially for data with overlapping class distributions. In particular, the degradation of fuzzy ARTMAP performance due to overtraining is more pronounced for MT− than for MT+. The generalization error achieved using WMT is significantly higher than with the other strategies on data with complex non-linear decision boundaries, and the number of internal categories required to represent decision boundaries increases significantly. Optimizing the value of the match tracking parameter using PSO(MT) yields the lowest overall generalization error, and requires fewer internal categories than WMT, but generally more categories than MT+ and MT−. However, this strategy requires a large number of training epochs to converge. Based on these empirical results with PSO(MT), the MT process as such can provide a significant increase in fuzzy ARTMAP performance, assuming that the MT parameter is tuned for the specific application at hand.
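The PSO(MT) idea of searching over the match tracking parameter can be pictured with a generic particle swarm loop over a single scalar. The sketch below uses a stand-in fitness function; in the chapter's setting that function would be the validation error of fuzzy ARTMAP trained with the candidate MT value, which is far too heavy to reproduce here, so every numeric value shown is an assumption.

```python
# Generic sketch of tuning one scalar hyperparameter with particle swarm
# optimisation, in the spirit of PSO(MT). The fitness function is a stand-in
# for "validation error of fuzzy ARTMAP trained with this MT value".
import numpy as np

def fitness(mt):
    return (mt - 0.37) ** 2            # placeholder objective, minimum at 0.37

rng = np.random.default_rng(1)
n, iters, w, c1, c2 = 10, 50, 0.7, 1.5, 1.5
x = rng.uniform(-1.0, 1.0, n)          # particle positions (candidate MT values)
v = np.zeros(n)
pbest, pbest_f = x.copy(), np.array([fitness(p) for p in x])
gbest = pbest[np.argmin(pbest_f)]

for _ in range(iters):
    r1, r2 = rng.random(n), rng.random(n)
    v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
    x = np.clip(x + v, -1.0, 1.0)
    f = np.array([fitness(p) for p in x])
    better = f < pbest_f
    pbest[better], pbest_f[better] = x[better], f[better]
    gbest = pbest[np.argmin(pbest_f)]

print(gbest)                           # approaches the minimiser of the stand-in fitness
```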
https://doi.org/10.1142/9789814273398_0005
In an object recognition search, in which objects are examined one at a time, a bio-inspired shape feature is introduced that emulates the integration of saccadic eye movement and the horizontal layer in the retina of a vertebrate. An optimal computational model for shape extraction, which is based on Principal Component Analysis (PCA) and which reduces processing time and enables real-time adaptive capability, is also developed. A color feature of the object, obtained using color segmentation, is used to aid shape feature recognition and better solve object recognition problems in heterogeneous environments, where a single technique, using either shape or color, may not be effective. To enable an effective recognition system, an adaptive architecture with an autonomous mechanism is introduced to recognize and adapt the shape and color features of moving objects. These preliminary results set a cornerstone for further study of practical and effective discrete modeling of a 3-D object. The adaptive architecture and mechanism provide a major step toward enabling an optimal and practical bio-inspired visual system in the future. Additional work is planned to incorporate a bipolar layer, which addresses the scaling issue.
https://doi.org/10.1142/9789814273398_0006
Probabilistic graphical models (PGMs) have become increasingly popular statistical modeling tools for effectively addressing many real-world problems. Both directed graphical models, such as Bayesian Networks (BNs) and Dynamic Bayesian Networks (DBNs), and undirected graphical models, such as Markov Random Fields (MRFs) and Conditional Random Fields (CRFs), have been widely used to solve computer vision problems. In this chapter, we briefly introduce the basic concepts and theories of different types of PGMs, including BNs, DBNs, MRFs, and CRFs. We further show their applications in two important computer vision problems: facial expression recognition and image segmentation. Specifically, we first illustrate how to employ a DBN to capture the static and dynamic relationships between action units for facial expression recognition. We then introduce a hybrid PGM to model the heterogeneous relationships between different image entities (regions, edges, junctions, etc.) for effective image segmentation. These applications demonstrate the powerful capability of PGMs to flexibly model real-world problems in a rigorous statistical framework and to solve computer vision problems systematically through probabilistic inference.
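As a small, self-contained illustration of an undirected PGM in vision, the sketch below builds a pairwise MRF over binary pixel labels (a quadratic data term plus a Potts smoothness term) and runs a few sweeps of iterated conditional modes. It is a generic textbook example on toy data of our own, not the hybrid model developed in the chapter.

```python
# Minimal pairwise MRF for binary image labeling, with ICM (iterated
# conditional modes) as a simple inference routine. Toy data only.
import numpy as np

rng = np.random.default_rng(0)
truth = np.zeros((32, 32), int); truth[8:24, 8:24] = 1      # square foreground
obs = truth + 0.8 * rng.standard_normal(truth.shape)        # noisy observation

labels = (obs > 0.5).astype(int)                            # initial labeling
beta = 2.0                                                  # smoothness weight
for _ in range(5):                                          # ICM sweeps
    for i in range(32):
        for j in range(32):
            costs = []
            for l in (0, 1):
                unary = (obs[i, j] - l) ** 2                 # data term
                nbrs = [labels[a, b] for a, b in
                        ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
                        if 0 <= a < 32 and 0 <= b < 32]
                pair = beta * sum(l != n for n in nbrs)      # Potts smoothness
                costs.append(unary + pair)
            labels[i, j] = int(np.argmin(costs))

print((labels == truth).mean())                              # fraction of correct pixels
```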
https://doi.org/10.1142/9789814273398_0007
In this chapter, we present a region-tree based framework for general discrete image labeling problems. In particular, the input image is represented as a tree of over-segmented regions, and the labeling problem is solved by optimizing an energy function defined on this region tree using dynamic programming (DP). By using the over-segmentation based region tree as the image representation, our framework combines two advantages: compared to pixel-based approaches with a small support area, the labeling primitives carry more information over a larger support area; and compared to layer-based approaches, the risk of propagating segmentation errors into the labeling process is reduced. The region-tree representation further enables the use of fast DP optimization and enhances performance by avoiding smoothness constraints that cross discontinuities between adjacent labeling primitives. Variants such as the coarse-to-fine region tree and the temporal region tree are also developed for different applications such as binocular stereo matching, optical flow estimation and multi-view space-time consistent video depth recovery. The corresponding experimental evaluations show that the proposed framework is very effective and produces promising results.
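Because the labeling primitives form a tree, the energy can be minimized exactly by bottom-up dynamic programming. The following sketch shows that message-passing step on a tiny hand-made region tree with random unary costs and a Potts pairwise cost; the tree, the costs and the label set are placeholders for what the framework would derive from an over-segmentation.

```python
# Exact energy minimization on a region tree via dynamic programming
# (bottom-up message passing). Tree, unary and pairwise costs are toy values;
# backtracking the argmins would recover the optimal labeling itself.
import numpy as np

children = {0: [1, 2], 1: [3, 4], 2: [], 3: [], 4: []}     # region tree, root = 0
L = 3                                                       # number of labels
rng = np.random.default_rng(0)
unary = {r: rng.random(L) for r in children}                # data cost per region/label
pairwise = lambda a, b: 0.5 * (a != b)                      # Potts cost on tree edges

def message(r):
    """For each label of region r, best total cost of r's subtree."""
    cost = unary[r].copy()
    for c in children[r]:
        child_cost = message(c)
        # for each label of r, add the cheapest compatible child label
        cost += np.array([min(child_cost[lc] + pairwise(lr, lc) for lc in range(L))
                          for lr in range(L)])
    return cost

print("optimal energy:", message(0).min())
```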
https://doi.org/10.1142/9789814273398_0008
Background modeling is often used in the context of detecting moving objects from static cameras. Numerous methods have been developed over recent years, and the most widely used are the statistical ones. The purpose of this chapter is to provide a recent survey of these different statistical methods. To this end, we classify them in terms of generations, according to their years of publication and the statistical tools used. We then focus on the first-generation methods: Single Gaussian, Mixture of Gaussians, Kernel Density Estimation and Subspace Learning using PCA. These original methods are recalled, and their different improvements are then classified in terms of strategies. After analyzing the strategies and identifying their limitations, we conclude with several promising directions for future research.
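For orientation, here is a minimal sketch of the simplest first-generation model mentioned above: a running per-pixel single Gaussian with a Mahalanobis-style foreground test. The learning rate, initial variance and threshold are illustrative defaults of our own, not values recommended by the chapter.

```python
# Running per-pixel single Gaussian background model with a simple
# foreground test. Parameters are illustrative defaults.
import numpy as np

class SingleGaussianBackground:
    def __init__(self, first_frame, alpha=0.02, k=2.5):
        self.mu = first_frame.astype(float)                  # per-pixel mean
        self.var = np.full(first_frame.shape, 15.0 ** 2)     # per-pixel variance
        self.alpha, self.k = alpha, k

    def apply(self, frame):
        frame = frame.astype(float)
        d2 = (frame - self.mu) ** 2
        foreground = d2 > (self.k ** 2) * self.var           # detection mask
        bg = ~foreground                                     # update only background pixels
        self.mu[bg] += self.alpha * (frame[bg] - self.mu[bg])
        self.var[bg] += self.alpha * (d2[bg] - self.var[bg])
        return foreground

# usage: feed consecutive grayscale frames (random stand-ins here)
frames = np.random.default_rng(0).integers(0, 256, (10, 48, 64))
model = SingleGaussianBackground(frames[0])
for f in frames[1:]:
    mask = model.apply(f)
```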
https://doi.org/10.1142/9789814273398_0009
https://doi.org/10.1142/9789814273398_0010
Affine invariant feature extraction is an important topic, particularly in image recognition applications. This chapter investigates the area of affine invariant feature construction and introduces a recently proposed multiscale framework for systematically enhancing the performance of previous methods. The idea of the presented approach is to extend the given pattern to a set of affine covariant versions, each carrying slightly different information, and then to apply known affine invariants to each of them separately. The key part of the framework is the construction of the affine covariant set, which is done by combining scaled representations of the original pattern. The application of the multiscale framework is demonstrated by constructing several new affine invariant methods, which are compared to existing approaches in the experiments.
https://doi.org/10.1142/9789814273398_0011
The dominant plane is the planar area in the robot workspace corresponding to the largest part of an image, or at least half of an image. We present an algorithm for detecting the dominant plane from the viewpoint of computational cognitive neuroscience using Independent Component Analysis. Then, using geometrical properties of the dominant plane, we derive a spatial reasoning method for mobile robot navigation. While traditional methods for mobile robot navigation tend to detect obstacles in the environment using various sensors, our method detects the dominant plane as a feasible region for navigation.
https://doi.org/10.1142/9789814273398_0012
Perception research provides strong evidence in favor of a part-based representation of shapes in the human visual system. Despite considerable differences among theories in terms of how part boundaries are found, there is substantial agreement that the process depends on many local and global geometric factors. This poses an important challenge from the computational point of view. In the first part of the chapter, I present a novel decomposition method that takes both local and global interactions within the shape domain into account. At the top of the partitioning hierarchy, the shape gets split into two parts capturing, respectively, the gross structure and the peripheral structure. The gross structure may be conceived as the least deformable part of the shape, which remains stable under visual transformations. The peripheral structure includes limbs, protrusions, and boundary texture. Such a separation is in accord with the behavior of artists, who start with a gross shape and enrich it with details. The method is particularly interesting from the computational point of view as it does not resort to any geometric notions (e.g. curvature, convexity) explicitly. In the second part of the chapter, I relate the new method to PDE-based shape representation schemes.
https://doi.org/10.1142/9789814273398_0013
Multispectral and synthetic aperture radar (SAR) images are known to exhibit complementary properties: unlike optical sensors, SAR provides information about soil roughness and moisture and acquires useful data regardless of cloud cover and sun-illumination conditions. However, the analysis of the resulting images is more difficult than that of optical imagery due to the noise-like speckle phenomenon. In order to exploit this complementarity for classification purposes, the challenge lies in the definition of accurate joint optical-SAR statistical models, owing to the different physical natures of these data types and to the corresponding differences in the related parametric models.
In this chapter, the problem of multisensor optical-SAR data classification is addressed and a novel region-based semiparametric classification technique is proposed. The method combines the tree-structured Markov random field approach to segmentation with the dependence tree approach to probability density estimation and with case-specific bivariate models for the distributions of optical and SAR data. A Bayesian decision rule is formulated at the segment level in order to incorporate spatial-contextual information and to gain robustness against noise. Experimental results are presented on a real data set that includes optical multispectral bands and SAR multipolarization and multifrequency channels.
https://doi.org/10.1142/9789814273398_0014
This chapter presents an extensive and critical review of the use of kernel methods, and in particular of support vector machines (SVMs), in the classification of remote-sensing (RS) data. The chapter recalls the mathematical formulation and the main theoretical concepts related to SVMs, and discusses the motivations underlying the use of SVMs in remote sensing. A review of the main applications of SVMs to the classification of remote-sensing data is given, presenting a literature survey on the use of SVMs for the analysis of different kinds of RS images. In addition, the most recent methodological developments related to SVM-based classification techniques in RS are illustrated by focusing on semisupervised, domain adaptation, and context-sensitive approaches. Finally, the most promising research directions on SVMs in RS are identified and discussed.
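The basic supervised setting reviewed in the chapter amounts to classifying each pixel from its band values. The sketch below shows that setting with an RBF-kernel SVM in scikit-learn; the synthetic spectra, labels and hyperparameters are stand-ins for real labelled remote-sensing samples and a proper model selection procedure.

```python
# Pixel-wise land-cover classification with an RBF-kernel SVM.
# Synthetic per-pixel spectra stand in for real remote-sensing data.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_bands, n_pixels = 6, 600
X = rng.standard_normal((n_pixels, n_bands))            # per-pixel band values
y = (X[:, 0] + 0.5 * X[:, 1] ** 2 > 0.3).astype(int)    # two mock land-cover classes

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = SVC(kernel="rbf", C=10.0, gamma="scale").fit(X_tr, y_tr)
print("overall accuracy:", clf.score(X_te, y_te))
```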
https://doi.org/10.1142/9789814273398_0015
In this chapter, we present a comprehensive review of word spotting techniques and concepts. Word spotting has been adopted by various researchers as a complementary technique to Optical Character Recognition (OCR) for document analysis and retrieval. The applications of word spotting include document indexing, retrieval and information filtering. Word spotting techniques are based on matching the visual similarity between two images. Unlike OCR techniques, word spotting does not require converting documents into machine-readable codes or performing machine recognition. Proper estimation of bounding boxes, selection and use of proper features, and robust image matching methods are considered the most important aspects of a word spotting system. Here we cover the important aspects of word spotting techniques, such as pre-processing, features, matching algorithms and evaluation methods.
https://doi.org/10.1142/9789814273398_0016
In pattern recognition, optimizing the rejection parameters can achieve high reliability while maintaining a high recognition rate. When rejection is considered as a two-class problem of accepting the classification result or not, Linear Discriminant Analysis (LDA), an effective classification method, can be used to optimize the rejection criterion. In this work, a Linear Discriminant Analysis Measurement (LDAM) is designed to take into consideration the confidence values of the classifier outputs and the relations between them. In the process, the First Rank Measurement (FRM), the First Two Ranks Measurement (FTRM), and other traditional rejection measurements are reviewed. FRM is an absolute rejection strategy that considers only the top choice for rejection, while FTRM is a relative rejection strategy that considers the top two choices. Experiments are conducted on the CENPARMI Arabic Isolated Numerals Database with outputs that can represent distances or probabilities from different classifiers such as HeroSVM and LibSVM. The results show that LDAM is more effective than FRM and FTRM in producing reliable recognition results.
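To make the rejection measurements concrete, the sketch below computes an FRM-style score (the top confidence) and an FTRM-style score (the gap between the top two confidences) from classifier outputs, and trains an LDA on such measurements to play the role of a learned accept/reject criterion. The mock scores and labels are random placeholders, so the learned boundary is meaningless here; only the mechanics are illustrated.

```python
# FRM and FTRM style rejection measurements, with an LDA fitted on them
# as a learned accept/reject rule. Mock classifier outputs only.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def frm(scores):                        # First Rank Measurement: top confidence
    return np.max(scores)

def ftrm(scores):                       # First Two Ranks: gap between top two
    top2 = np.sort(scores)[-2:]
    return top2[1] - top2[0]

rng = np.random.default_rng(0)
scores = rng.dirichlet(np.ones(10), size=500)            # mock 10-class outputs
correct = rng.integers(0, 2, 500)                        # 1 = top choice was right (mock)

feats = np.column_stack([[frm(s) for s in scores],
                         [ftrm(s) for s in scores]])
lda = LinearDiscriminantAnalysis().fit(feats, correct)   # accept/reject as two classes
accept = lda.predict(feats)                              # 1 = accept, 0 = reject
```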
https://doi.org/10.1142/9789814273398_0017
In this chapter we describe various methods for the automatic recognition of handwritten whiteboard notes. A handwriting recognition system for Roman Script is usually divided into units which iteratively process the handwritten input data to finally obtain the desired ASCII transcription: the preprocessing, where noise in the raw data is reduced; the normalization, where various steps take place to remove writer-specific characteristics of the handwriting; the feature extraction, where the normalized data is transformed into a sequence of feature vectors; the recognition, where a classifier generates a list of word sequence candidates; and the post-processing, where language information is used to improve the results. We review different approaches for all of these stages and describe selected approaches in more detail. Furthermore, we introduce some preprocessing steps which have been developed especially for whiteboard notes. In order to assess the advantages of different methods, we present the results of a broad experimental analysis on a large database of handwritten whiteboard notes.
https://doi.org/10.1142/9789814273398_0018
In this chapter, we investigate the potential of physiological signals as reliable channels for emotion recognition. All essential stages of an automatic recognition system are discussed, from the recording of a physiological dataset to a feature-based multiclass classification. In order to collect a physiological dataset from multiple subjects, we developed a musical induction method, without any deliberate lab setting. Four-channel biosensors were used to measure electromyogram, electrocardiogram, skin conductivity, and respiration changes. A wide range of physiological features from various analysis domains is proposed to find the best emotion-relevant features and correlate them with emotional states. The best features extracted are specified in detail and their effectiveness is proven by classification results. Classification of four musical emotions (positive/high arousal, negative/high arousal, negative/low arousal, positive/low arousal) is performed by using an extended linear discriminant analysis (pLDA). Furthermore, by exploiting a dichotomic property of the 2D emotion model, we develop a novel scheme of emotion-specific multilevel dichotomous classification (EMDC) and compare its performance with direct multiclass classification using the pLDA.
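The dichotomic idea behind EMDC can be illustrated with a simple two-stage classifier in the 2D arousal/valence emotion model: first decide high vs. low arousal, then decide valence within the chosen branch. The sketch below uses plain scikit-learn LDA models on random mock features as stand-ins for the chapter's pLDA and physiological data; the label coding is ours.

```python
# Two-stage dichotomous classification in the arousal/valence plane.
# Labels: 0=pos/high, 1=neg/high, 2=neg/low, 3=pos/low (the four musical emotions).
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA

rng = np.random.default_rng(0)
X = rng.standard_normal((400, 12))                 # mock physiological features
y = rng.integers(0, 4, 400)                        # mock emotion labels

arousal = np.isin(y, (0, 1)).astype(int)           # 1 = high arousal
clf_arousal = LDA().fit(X, arousal)
clf_val_high = LDA().fit(X[arousal == 1], (y[arousal == 1] == 0).astype(int))
clf_val_low = LDA().fit(X[arousal == 0], (y[arousal == 0] == 3).astype(int))

def predict(x):
    x = x.reshape(1, -1)
    if clf_arousal.predict(x)[0] == 1:             # high-arousal branch
        return 0 if clf_val_high.predict(x)[0] == 1 else 1
    return 3 if clf_val_low.predict(x)[0] == 1 else 2

print(predict(X[0]))
```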
https://doi.org/10.1142/9789814273398_0019
Techniques for the automated detection and classification of solar image features, based on image enhancement, segmentation, pattern recognition, and mathematical morphology methods, are presented in this chapter. The solar features include flares and filaments. These methods have been applied successfully to full-disk Hα (Hydrogen-Alpha) images obtained from the Big Bear Solar Observatory in California, USA. The methods presented can lead to real-time solar feature detection and classification using advanced image processing and pattern recognition techniques.
https://doi.org/10.1142/9789814273398_0020
Biometric template security is one of the most important issues in developing a practical biometric system. In this chapter, we discuss the threats to biometric authentication systems and review existing approaches for protecting biometric templates. The schemes are classified into three categories, namely the biometric cryptosystem approach, the transform-based approach and the hybrid approach. The pros and cons of each approach are also discussed.
https://doi.org/10.1142/9789814273398_0021
We present a bottom-up approach to continuous American Sign Language (ASL) recognition that requires no wearable aids: simple low-level processes operate on images and build realistic representations that are fed into intermediate-level processes to form sign hypotheses. At the intermediate level, we construct representations for both manual and non-manual aspects, such as hand movements, facial expressions and head nods. The manual aspects are represented using Relational Distributions that capture the statistical distribution of the relationships among the low-level primitives from the body parts. These relational distributions, which can be constructed without the need for part-level tracking, are efficiently represented as points in the Space of Probability Functions (SoPF). Manual dynamics are thus represented as tracks in this space. The dynamics of facial expressions along with a sign are also represented as tracks, but in the expression subspace, constructed using principal component analysis (PCA). Head motions are represented as 2D image tracks. The integration of manual with non-manual information is sequential, with non-manual information refining the hypothesis set derived from manual information. We show that with just image-based manual information, the correct detection rate is around 88%; with the use of facial information, accuracy increases to 92%. Thus the face contributes valuable information towards ASL recognition. 'Negation' in sentences is correctly detected in 90% of the cases using just 2D head motion information.
https://doi.org/10.1142/9789814273398_0022
Intravascular Ultrasound (IVUS) has been established as a useful tool for the diagnosis of coronary heart disease (CHD). Recent developments have opened the possibility of using IVUS to create a 3D map from which preventative prediction of CHD can be attempted. Segmentation of IVUS images is an important step in this process. However, reliable automated segmentation has been elusive, in part because of the variety of image features invariably present in the image that distract from the main segmentation objectives. Active contour models (ACMs) have been used successfully for automated segmentation of IVUS images, but the accuracy of the segmentation is still not adequate for clinical use. Here we describe a new approach based on a constrained deformable template model (DTM) that improves on the standard ACM algorithm by (1) detecting other distracting image features, (2) using tracking algorithms to better estimate the positions of these features, and (3) using knowledge of these positions to eliminate distortions in the ACM due to these features. In addition, semantic constraints are built into the DTM so that computational time is not wasted on improbable segmentation results. Our results show that this is a promising approach to achieving fully automated segmentation with accuracy comparable to manual segmentation.
https://doi.org/10.1142/9789814273398_0023
Medical images are an important area of application for image segmentation methods. However, despite considerable research into automatic segmentation methods, many difficult medical images are still processed using manual or manually assisted methods. This is because automatic methods still do not yield reliable results in many cases and often require parameter tweaking to achieve satisfactory performance. This chapter describes a new technique combining manual and automatic approaches to this problem. In our method, users start the segmentation by drawing a rough sketch around the area of interest, and this sketch is then automatically adapted to the desired contour. Our technique allows users to refine the solution iteratively using simple strokes in a localized manner. In this way it is possible to correct imperfections and overcome algorithmic limitations while taking advantage of the expeditious convergence afforded by conventional image processing methods. Our experiments show that users are able to produce good-quality results quickly, using a few simple and easy interactions. By doing away with the need for parameter tweaking, users can segment images that would otherwise be difficult to process via "purely automatic" methods, in a simple and controlled manner.
https://doi.org/10.1142/9789814273398_0024
Recent advances in fluorescence microscopy provide an unprecedentedly powerful way to observe many biological objects (pathogens, vesicles, single molecules, etc.) and events (intracellular trafficking, adhesion, migration, pathogen entry) in the living cell under multiple experimental conditions. To understand the underlying biological phenomena, modern microscopy imaging confronts researchers with the challenge of extracting and analyzing biologically relevant information from huge amounts of data. In this chapter, we focus on recent advances in automated image analysis frameworks for quantifying specific cellular processes. Our aim is to demonstrate that systematic application of such frameworks to thousands of images produces statistically relevant measures of several interesting biological phenomena at the cell population level. Methodologically, we emphasize a recently published robust cell segmentation approach using coupled shape-constrained active contours, and a statistical spot extraction method based on a multiscale variance stabilizing transform. We finally present an application of these approaches to the quantification of cellular endocytosis that demonstrates their usefulness for screening applications in biology.
https://doi.org/10.1142/9789814273398_0025
cDNA microarrays, among the most fundamental and powerful biotechnology tools, are being utilized in a variety of biomedical applications, as they enable scientists to simultaneously analyze the expression levels of thousands of genes over different samples. One of the most essential processes in cDNA microarray experiments is image analysis, which is divided into three phases, namely gridding, spot segmentation and intensity extraction. Although the first two phases appear relatively straightforward, they are in fact rather challenging procedures due to the nature of microarray images. For their implementation, currently available software programs require human intervention, which significantly affects the biological conclusions reached during microarray experiments. In this chapter, the basic process of analyzing a microarray image is described, and advanced evolutionary algorithms implementing automatic gridding and segmentation are presented. Both of these algorithms formulate the tasks as optimization problems, which are solved using evolutionary genetic algorithms. Contrary to existing software systems, the proposed methods are fully automatic as they do not require any human intervention; they are also noise resistant and yield excellent results even under adverse conditions. Last but not least, they outperform other software programs as well as established techniques.
https://doi.org/10.1142/9789814273398_0026
New approaches based on support vector machines (SVMs) and flexible neural trees (FNTs) with error correcting output codes (ECOC) are presented for the recognition of multi-class protein folds. ECOC is used to reduce the multi-class classification problem to multiple binary classification problems. The SVM and the FNT, trained by a particle swarm optimization algorithm, are then employed as base classifiers. The experimental results show that the proposed method can improve prediction accuracy by 4%-15% on two datasets containing 27 SCOP folds.
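As a minimal illustration of the ECOC reduction, the sketch below wraps an SVM base learner in scikit-learn's OutputCodeClassifier, which trains one binary classifier per code bit and decodes predictions by nearest codeword. The synthetic features and random 27-fold labels are placeholders for the SCOP data, and the code size is arbitrary.

```python
# Multi-class fold recognition reduced to binary problems via error
# correcting output codes with an SVM base learner. Synthetic data only.
import numpy as np
from sklearn.multiclass import OutputCodeClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.standard_normal((270, 50))                  # mock protein-fold features
y = rng.integers(0, 27, 270)                        # 27 classes, standing in for SCOP folds

ecoc = OutputCodeClassifier(SVC(kernel="rbf", gamma="scale"),
                            code_size=2.0, random_state=0)
ecoc.fit(X, y)
print(ecoc.predict(X[:5]))
```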
https://doi.org/10.1142/9789814273398_0027
3D facial reconstruction systems create 3D facial computer models of individuals from their 2D photographic images or video sequences. Currently, published face recognition systems are mostly based on large training sets of 2D facial images, although there has been increasing interest in using 3D data input instead. An intermediate approach is to synthesize a 3D face from a single 2D image. In this chapter, we present such a method that does not require complicated optimization steps or user-defined parameters, which distinguishes it from existing 3D face reconstruction methods. The method has been used in 2D face recognition experiments to generate a 3D Morphological Model (3DMM) from a single facial training image.
Given a single 2D facial image, a small set of morphological feature points are selected. Their corresponding three-dimensional indices in the morphological model are obtained using the so-called Fast Marching Method. Shape alignment between the 2D input image and the 3D morphological model is achieved using Newton's Method to solve a single nonlinear equation to obtain a scalar parameter. This yields all of the shape parameters of the 3DMM. Texture recovery for the model involves bridging the 2D image representation of the 3DMM using the so-called UV space as an intermediary. Thus the advantage of the proposed method over others in the literature is that a computationally complex geometric problem is simplified and transformed to a 2D-to-3D warping problem. This enhances both the efficiency and accuracy of texture recovery.
We also introduce a quantitative shape measure based on the Bending Invariant Canonical Form to determine the quality of the 3D reconstruction. This is shown to produce a reconstruction error of 5% over the database of 100 stored 3D faces. In addition, the reconstructed 3D faces at various head poses have been used to create 2D images of faces that form the training set for face recognition experiments reported elsewhere.
https://doi.org/10.1142/9789814273398_0028
Annotation of biological images means labeling biological images, or parts of them, often using ontological or anatomical vocabularies. The task has become increasingly important with the availability of large data sets of multidimensional and multiscale biological images in recent years. Examples include the annotation of the anatomical regions of gene expression patterns of Drosophila melanogaster (fruit fly), or the annotation of cellular or subcellular structures for other model organisms at different developmental stages. To automate the process in order to increase efficiency and consistency, the annotation problem can be formulated as a pattern recognition task with ontological or anatomical terms viewed as the targets to be recognized in the image. The task presents various objectives: we may annotate the entire image or many Regions of Interest (ROIs) in an image, and the problem may be mutually exclusive (when the image or region corresponds to one target) or multi-objective (when several annotation targets co-exist in an image). The task also poses unique challenges such as skewed data distributions, morphological variety and large image size. In this chapter, we discuss algorithms and applications for the automated annotation of biological images. We detail the algorithms for in situ fruit fly gene expression patterns during embryogenesis, which is a multi-objective annotation problem, and we also cover mutually exclusive and ROI annotation applications. We demonstrate that extracting and selecting a concise set of good image features is essential for such applications and show the effectiveness of the proposed algorithms.
https://doi.org/10.1142/9789814273398_0029
Over the past decades, Computed Tomography (CT) theories, technologies, and applications have undergone rapid development and deployment. Although CT was developed as an imaging technique primarily for medical diagnosis, its applications in areas outside the medical field have gained increasing attention. In particular, CT combined with pattern recognition techniques has been used in Explosives Detection Systems (EDS) for Automated Threat Detection (ATD) in checked baggage screening. In this chapter, we describe dual energy x-ray CT technologies and associated ATD techniques used for checked baggage screening at airports.
https://doi.org/10.1142/9789814273398_0030
Obesity is one of the most rapidly evolving public health problems in many parts of the world. Currently, self-reporting is the main method for diet and physical activity assessment, and the lack of objective measurement technology has significantly hampered obesity research and the clinical treatment of patients. We present an image-based computational approach to monitoring both dietary intake and physical activity continuously over extended periods, with the goal of producing quantitative measurements of energy balance for overweight and obese individuals. A new wearable device has been designed to be worn below the neck. This device contains a miniature camera, which records video from a viewpoint similar to that of the wearer, and sensors to estimate food portion size and other variables. The image data, along with other sensor measurements, are compressed and stored in a multi-gigabyte flash memory. The recorded data are uploaded to a computer where advanced multimedia data processing algorithms, including food portion size and physical activity estimation, are used to compute the calories and nutrients in foods and the energy expenditure during physical activity. This multimedia approach helps people manage their nutritional and caloric intake and maintain sufficient physical activity in their particular working and living environments, providing an effective tool to help curb the progression of the obesity epidemic.
https://doi.org/10.1142/9789814273398_0031
Digital watermarking techniques allow the insertion of an imperceptible multi-bit payload into digitized multimedia data, such as video, image and audio. In contrast to other information hiding techniques, the payload carries information about the cover work and is applied to the enforcement of authenticity and integrity, media enhancement, etc. One of the key challenges to the practical application of digital watermarking is adapting these techniques to variations across different types of images. Early research in the area focused on exploring properties of each specific watermarking technique to improve performance. However, recent trends in research indicate that computational intelligence can be very beneficial for designing adaptive or intelligent watermarking techniques. Some evolutionary computation techniques have been developed to automatically set user-defined parameters of watermarking systems. In addition, machine learning techniques have recently been used to learn the processes of these systems. These techniques balance a trade-off between robustness (against attacks), image quality, and system resources. In this chapter these techniques are reviewed and a case study is presented as a demonstration of their effectiveness.
https://doi.org/10.1142/9789814273398_0032
The number of samples available on the internet to train pattern classifiers is increasing rapidly, while traditional pattern classification techniques based on a single computer system are unable to process these large-scale data sets. This chapter presents a parallel and modular pattern classification framework for coping with large-scale pattern classification problems. The proposed framework follows a divide-and-conquer strategy that easily assigns a given large-scale problem to an available parallel and distributed computing infrastructure. The framework consists of three independent parts: decomposing the training data set, training component classifiers in parallel, and combining the trained component classifiers. In order to evaluate the performance of the proposed framework, we perform experiments on a large-scale Japanese patent classification problem containing about 3,500,000 patent documents. The experimental results show that our framework has the following attractive features: (a) it is general, so any traditional pattern classification technique, such as support vector machines, can easily be embedded in the framework as a component classifier; (b) it can incorporate explicit domain or prior knowledge into learning through the process of dividing the training data set; and (c) it has good scalability and is easily implementable in hardware.
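A skeletal version of the divide-and-conquer idea, shown below, splits the training set into modules, fits one component classifier per module in parallel, and combines their outputs by majority vote. The linear SVM components, the naive random split and the voting rule are the simplest possible stand-ins for the chapter's framework, not its actual decomposition or combination rules.

```python
# Skeleton of a parallel, modular classification framework:
# (1) decompose the training set, (2) train component classifiers in
# parallel, (3) combine components by majority vote. Placeholder data.
import numpy as np
from joblib import Parallel, delayed
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X, y = rng.standard_normal((3000, 20)), rng.integers(0, 2, 3000)

chunks = np.array_split(np.arange(len(X)), 4)          # (1) decompose training data

def train_one(idx):                                    # (2) one component per chunk
    return LinearSVC(dual=False).fit(X[idx], y[idx])

components = Parallel(n_jobs=4, prefer="threads")(delayed(train_one)(c) for c in chunks)

def predict(x):                                        # (3) combine by majority vote
    votes = [clf.predict(x.reshape(1, -1))[0] for clf in components]
    return max(set(votes), key=votes.count)

print(predict(X[0]))
```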
https://doi.org/10.1142/9789814273398_0033
Intelligent Video Analysis (also called Video Analytics, VA) is the key function of new-generation video surveillance systems, supporting security operators in the management of video information from a large number of cameras deployed in the scene. The chapter addresses the problem of human detection and tracking in a variety of applications, with the objective of performing the whole scene analysis process directly in 3D world coordinates, using video-camera data mainly for hypothesis validation. This centralized supervision approach (a single brain with many eyes) is intended to manage an arbitrary number of stationary cameras, depending on the available computational power. A fairly accurate camera calibration process is required to predict the position in time and space of the targets in the scene. A simple 3D model is shown to be sufficient to describe not only rigid objects but also the articulated, flexible shape of humans under the hypothesis of low-resolution targets and a wide field of view. A few examples of possible applications to video surveillance are described, including standard optics and wide-field panoramic cameras (fish-eye lenses).
https://doi.org/10.1142/9789814273398_bmatter
Chi Hau Chen received his PhD in electrical engineering from Purdue University in 1965, his MSEE degree from the University of Tennessee, Knoxville in 1962, and his BSEE degree from National Taiwan University in 1959. He is currently Chancellor Professor and Professor Emeritus of electrical and computer engineering at the University of Massachusetts Dartmouth, where he has taught since 1968. His research areas are statistical pattern recognition and signal/image processing with applications to remote sensing, geophysical, underwater acoustics and nondestructive testing problems, as well as computer vision for video surveillance, time series analysis, and neural networks.
Dr. Chen has published (edited and authored) 30 books in his areas of research, including a number of books published by World Scientific Publishing. He was an Associate Editor of the International Journal of Pattern Recognition and Artificial Intelligence from 1986 to 2008, and since 2008 he has been an Editorial Board Member of the Pattern Recognition journal. He is also currently the Series Editor in Computer Vision for World Scientific Publishing.
Dr. Chen has been a Fellow of the Institute of Electrical and Electronics Engineers (IEEE) since 1988, a Life Fellow of the IEEE since 2003, a Fellow of the International Association for Pattern Recognition (IAPR) since 1996, a full member of the Academia NDT International since 2000, and on the Fulbright Specialist Program since 2008.