This book describes recent strategies and applications for extracting useful information from sensor data. For example, the methods presented by Roth and Levine are becoming widely accepted as the ‘best’ way to segment range images, and the neural network methods for alphanumeric character recognition, presented by K. Yamada, are believed to be the best yet presented. An applied system for analyzing images of dental imprints, presented by J. Côté et al., is one of several examples of image processing systems that have already proven practical and can serve as a model for the image processing system designer. Important aspects of process automation are presented in a practical way that can provide immediate new capabilities in fields as diverse as biomedical image processing, document processing, industrial automation, understanding human perception, and the defence industries. The book is organized into sections on Model Driven Feature Extraction, Data Driven Feature Extraction, Neural Networks, Model Building, and Applications.
https://doi.org/10.1142/9789814355841_fmatter
The following sections are included:
https://doi.org/10.1142/9789814355841_0001
Pose determination is the process of finding the pose (position and orientation) of a part with a known geometry in a scene using sensor data. No prior estimate of the pose is assumed to be available. If an estimate of the pose is available then pose determination becomes pose refinement. It is shown that both can be modelled as optimizations of a cost function, which in the worst case has many local minima. This optimization model makes it clear that the most important issue is how to efficiently find the global minimum of the cost function, since this will produce the best pose. The minimal subset principle states that a minimal subset of correspondences between model and sensor data can be used to hypothesize a pose. If enough minimal subsets are chosen randomly then the correct pose is likely to be found. This principle is used to produce a general algorithm for pose determination and refinement which is capable of dealing with many different kinds of part models. Elimination theory makes it possible to create a hypothetical pose from a subset of correspondences whenever the model primitives are described by a polynomial equation. The algorithm can be easily parallelized, and a number of possible approaches to parallelization are discussed.
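As a concrete illustration of the minimal subset principle, the sketch below hypothesizes 2-D rigid poses from randomly chosen pairs of point correspondences and keeps the pose with the lowest cost. The function names, the 2-D point-feature setting and the squared-distance cost are assumptions chosen for illustration, not the chapter's actual formulation.

```python
import numpy as np

def pose_from_pair(model_pts, scene_pts):
    """Hypothesize a 2-D rigid pose (rotation + translation) from a
    minimal subset of two point correspondences."""
    dm = model_pts[1] - model_pts[0]
    ds = scene_pts[1] - scene_pts[0]
    theta = np.arctan2(ds[1], ds[0]) - np.arctan2(dm[1], dm[0])
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])
    t = scene_pts[0] - R @ model_pts[0]
    return R, t

def pose_cost(R, t, model_pts, scene_pts):
    """Cost of a pose: sum of squared distances between transformed
    model points and their scene correspondences."""
    return np.sum((scene_pts - (model_pts @ R.T + t)) ** 2)

def random_subset_pose(model_pts, scene_pts, trials=200, seed=None):
    """Draw many random minimal subsets and keep the pose with the
    lowest cost; with enough trials the correct pose (the global
    minimum of the cost) is found with high probability."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(trials):
        idx = rng.choice(len(model_pts), size=2, replace=False)
        R, t = pose_from_pair(model_pts[idx], scene_pts[idx])
        cost = pose_cost(R, t, model_pts, scene_pts)
        if best is None or cost < best[0]:
            best = (cost, R, t)
    return best
```

Each random trial is independent of the others, which is also what makes the approach straightforward to parallelize.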
https://doi.org/10.1142/9789814355841_0002
The application of robust regression methods to computer vision problems is becoming increasingly popular. These methods exploit a priori information through an assumed model, while still allowing deviations from that model. Robust methods are thus more powerful than nonparametric techniques, which avoid erroneous assumptions by not involving models at all. We describe the least median of squares estimator and show that its straightforward use on computer vision data may not yield the desired results. Most of the artifacts can be avoided by using a somewhat different, consensus-based approach. We present the principle behind the consensus paradigm.
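A minimal sketch of the least median of squares idea for straight-line fitting, assuming 2-D point data and random pair sampling; it is meant only to show why the estimator tolerates a large fraction of outliers, not to reproduce the authors' consensus-based variant.

```python
import numpy as np

def lmeds_line(x, y, trials=500, seed=None):
    """Least-median-of-squares line fit: hypothesize a line from a random
    pair of points and keep the hypothesis whose MEDIAN squared residual
    is smallest, so up to ~50% of the points may be arbitrarily bad."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(trials):
        i, j = rng.choice(len(x), size=2, replace=False)
        if x[i] == x[j]:
            continue  # skip degenerate (vertical) hypotheses
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        med = np.median((y - (slope * x + intercept)) ** 2)
        if best is None or med < best[0]:
            best = (med, slope, intercept)
    return best
```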
https://doi.org/10.1142/9789814355841_0003
Several algorithms are developed for the verification of regular-shaped objects by breaking boundaries into fundamental primitives such as lines and circular arcs. After discussing algorithms for detecting extreme points of high curvature, an algorithm is proposed for detecting knot-points formed by basic primitive segments. For verification of object structures, flat industrial objects are classified into four categories, and for each category effective verification algorithms are developed according to the characteristics of the objects in that class.
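One standard way to flag high-curvature extreme points on a closed boundary is the k-cosine measure sketched below; the measure and the parameter values are illustrative assumptions, not necessarily the detectors developed in the chapter.

```python
import numpy as np

def k_cosine(contour, k=5):
    """k-cosine measure along a closed contour (N x 2 array): the cosine of
    the angle between the forward and backward k-vectors at each point.
    Values near -1 indicate straight runs; values near +1 indicate sharp
    corners (candidate extreme/knot points)."""
    n = len(contour)
    cos = np.empty(n)
    for i in range(n):
        a = contour[(i - k) % n] - contour[i]
        b = contour[(i + k) % n] - contour[i]
        cos[i] = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
    return cos

def corner_candidates(contour, k=5, threshold=0.3):
    """Indices of contour points whose k-cosine exceeds a threshold."""
    return np.nonzero(k_cosine(contour, k) > threshold)[0]
```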
https://doi.org/10.1142/9789814355841_0004
This paper presents a criterion for deciding whether or not a set of lattice points is the digital image of a real circle or circular arc. Digital-inscribable quadrilaterals are used for this purpose, so that no a priori information about the center and radius of the circle is needed. The set of lattice points is first divided into digital straight-line segments. It is then verified whether every three consecutive digital straight-line segments form a digital-inscribable quadrilateral (the fourth edge of the quadrilateral links the endpoints of the contour). The method also deals with the segmentation of contour lines into digital straight-line segments and digital circular arcs.
https://doi.org/10.1142/9789814355841_0005
In this paper we present a knowledge-based approach to recognizing objects by generating sequences of vision routines. The system FIGURE generates a sequence of vision operations in response to a query about a particular object; the resulting computer vision program detects the queried object in a grayscale image. We propose a three-stage procedure. First, real objects are modelled in a model object base using a 2-D representation; a query to the model object base about an object yields one or more evidence features of the goal object. In the second stage, the deduced evidence features and their values form a generic model of the real object, which serves as a specification for a vision program. Finally, a configuration module composed of low-level and medium-level vision operations, segmentation algorithms and morphological operations is used to generate the vision program. The configuration process is supported by rules that set appropriate parameter values for the vision operations and control how the operations are combined to form a program.
https://doi.org/10.1142/9789814355841_0006
There has been a tendency among individuals in a variety of technical fields to think in either the time domain (the world as seen through an oscilloscope) or the frequency domain (the world as seen through a spectrum analyzer). Many labs are equipped with modern oscilloscopes and spectrum analyzers, each telling its own story about what is actually happening. Yet physical phenomena quite often fit neither the particle ("things") nor the wave ("stuff") description. Is light "stuff" or a collection of "things"? A framework for this duality has existed since the eleventh century, when musicians began using time–frequency representations. Engineers and scientists are now realizing the full utility of time–frequency space.
In this chapter, various Time–Frequency (TF) methods are introduced and extended, within a unified general framework: the chirplet. A brief historical perspective, touching on some of the early ideas (such as the work of Fourier and Gabor) is presented in a new way, leading to a smooth extension into the chirplet domain.
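For concreteness, one widely used Gaussian chirplet atom takes the form below; the specific normalization and notation are an assumption here, not quoted from the chapter. It is a unit-energy Gaussian window centred at time t_c, modulated by a linear-FM ("chirping") complex exponential with centre frequency f_c and chirp rate c:

```latex
% Gaussian chirplet atom (illustrative form; notation assumed, not the chapter's)
g_{t_c, f_c, c, \sigma}(t) =
  (\pi \sigma^2)^{-1/4}
  \exp\!\left( -\frac{(t - t_c)^2}{2\sigma^2} \right)
  \exp\!\left( j \left[\, 2\pi f_c (t - t_c) + \pi c\, (t - t_c)^2 \,\right] \right)
```

Setting the chirp rate c = 0 recovers a Gabor atom, and additionally letting the window width grow without bound recovers a Fourier basis function, which is one way to see the chirplet as a smooth extension of the earlier ideas mentioned above.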
Applications of this new time–frequency paradigm are also discussed.
https://doi.org/10.1142/9789814355841_0007
The use of electromagnetic analogies for perceptual grouping of image primitives is presented. This approach assumes that grouping is a low-level, data-driven and global process, with all image tokens interacting in some way.
Two computational models are introduced which allow determination of the proximity and directionality of image primitives. In the second model, prominent image features are treated as electrical charges. According to Poisson's equation, they generate a scalar potential and an associated "electrical" vector field. The potential and field, determined by the combined global influence of all image features, are well defined over the entire image. The scalar potential can be used for proximity grouping, while the local direction of the field allows grouping of primitives according to their common directional tendency.
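A small sketch of the charge analogy, assuming isolated 2-D point charges and the logarithmic Green's function of the 2-D Poisson equation; the chapter's implementation may instead solve Poisson's equation on the image grid, so this serves only to illustrate how every feature contributes globally to the potential and field.

```python
import numpy as np

def potential_and_field(charges, grid_x, grid_y, eps=1e-6):
    """Scalar potential and vector field generated by point 'charges'
    (iterable of (x, y, q)) on a 2-D grid, by superposing logarithmic
    point-charge potentials.  Every charge influences the whole image,
    so both quantities are defined everywhere."""
    X, Y = np.meshgrid(grid_x, grid_y)
    phi = np.zeros_like(X, dtype=float)
    Ex = np.zeros_like(X, dtype=float)
    Ey = np.zeros_like(X, dtype=float)
    for x0, y0, q in charges:
        dx, dy = X - x0, Y - y0
        r2 = dx * dx + dy * dy + eps
        phi += -q * 0.5 * np.log(r2)   # potential: -q ln r
        Ex += q * dx / r2              # field: E = -grad(phi)
        Ey += q * dy / r2
    return phi, Ex, Ey
```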
Implementation problems and solutions are presented. Various results are shown and discussed.
https://doi.org/10.1142/9789814355841_0008
Early vision, the extraction of primitive measurements from the image, forms the first stage of computation in many vision systems. This paper presents a new approach to the feature detection problem that transforms the problem into a complex space in which features may be recognized more readily and robustly. Aiming to provide a computational way of producing feature maps, we first define mathematically the feature of interest, then study the phase, amplitude (local energy) and local frequency behavior of the feature to derive criteria for its detection.
This feature detection technique is applied to edges and bars, low-level features used as primitives for complex visual tasks. We consider the use of phase, amplitude and local frequency information for edge and bar identification as an alternative to numerical differentiation, which is known to be unstable in the presence of noise. Algorithms for the detection and localization of one-dimensional edges and bar-like targets of a particular width are developed, and their application to real images indicates the promise of this approach.
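The sketch below illustrates the general local-energy and phase idea for 1-D signals using a Gabor quadrature pair; the filters, parameter values and symmetry-based edge/bar labelling are assumptions chosen for illustration rather than the specific criteria derived in the paper.

```python
import numpy as np

def quadrature_pair(sigma, freq, width=None):
    """Even (cosine) and odd (sine) Gabor filters forming a quadrature pair."""
    if width is None:
        width = int(4 * sigma)
    t = np.arange(-width, width + 1, dtype=float)
    g = np.exp(-t**2 / (2 * sigma**2))
    even = g * np.cos(2 * np.pi * freq * t)
    odd = g * np.sin(2 * np.pi * freq * t)
    even -= even.mean()   # remove DC so flat regions give no response
    return even, odd

def local_energy_and_phase(signal, sigma=4.0, freq=0.08):
    """Local energy and phase of a 1-D signal.  High energy marks a feature;
    phase near 0 or pi suggests a bar (even symmetry), phase near +/- pi/2
    suggests an edge (odd symmetry)."""
    even, odd = quadrature_pair(sigma, freq)
    e = np.convolve(signal, even, mode="same")
    o = np.convolve(signal, odd, mode="same")
    return np.hypot(e, o), np.arctan2(o, e)
```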
https://doi.org/10.1142/9789814355841_0009
This paper presents a new method for roof edge detection. In practice, digital images are contaminated by noise. A simple approach to roof edge detection is to smooth the noisy image, take its derivative, and then apply step edge detection techniques to detect roof edges. However, reducing noise by smoothing also blurs the edges, since edges, like noise, correspond to high frequencies. Our method instead distinguishes roof peaks from noise peaks by morphological testing. Roof edges in digital images are found by looking for the leaf nodes of the skeletal tree; a leaf node represents a local maximum (peak), and the size of the subtree containing the leaf node is tested to distinguish noise from roof edges. The skeletal trees of gray scale images can be found easily by gray scale morphological skeleton operations.
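The paper's test operates on the skeletal tree; as a much simpler stand-in, the sketch below uses gray scale openings at two scales to illustrate the general idea of a morphological test that keeps broad roof peaks and rejects narrow noise spikes. Function and parameter names are hypothetical.

```python
import numpy as np
from scipy import ndimage

def roof_peak_mask(image, peak_size=9, noise_size=3, height=10.0):
    """Flag pixels that survive a small opening (so they are not narrow
    noise spikes) yet stand well above a large opening (so they sit on a
    genuinely broad roof-like peak)."""
    image = np.asarray(image, dtype=float)
    small_open = ndimage.grey_opening(image, size=noise_size)
    large_open = ndimage.grey_opening(image, size=peak_size)
    not_noise = (image - small_open) < height        # narrow spikes are removed by the small opening
    on_roof = (small_open - large_open) >= height    # broad peaks are removed only by the large opening
    return not_noise & on_roof
```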
https://doi.org/10.1142/9789814355841_0010
This paper presents a set of feature frequency matrices (FFM) that reflect four visual properties of a texture image: coarseness, contrast, directionality and line-likeness. We demonstrate the efficacy of the proposed approach on simulated as well as natural textures.
https://doi.org/10.1142/9789814355841_0011
A new method for pattern recognition using multi-layer neural networks with a rejection function is described. The system has two components: (1) STELA (STandstill Evading Learning Algorithm), a new learning algorithm that extends back propagation with a technique for avoiding learning standstill states, and (2) a rejection function which estimates the uncertainty of the classification and rejects a pattern as unclassifiable if the network response is too uncertain. The ability of the method to recognize deformed and noisy patterns is demonstrated on multi-font alphanumeric character recognition. STELA achieves perfect recognition on the training data. Without the rejection function, standard back propagation makes 12 times as many errors as STELA on a recognition task using 20 different fonts. Furthermore, a conventional rejection method is compared with the one derived here and found to require twice as many rejections to achieve the same error rate.
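The chapter derives its own rejection function; purely as a generic illustration, the sketch below rejects a pattern when the gap between the two largest network outputs falls below a margin. All names and thresholds here are assumed.

```python
import numpy as np

def classify_with_rejection(outputs, margin=0.2):
    """Return the winning class index, or None (reject) if the network
    response is too uncertain, measured here by the gap between the two
    largest output activations."""
    outputs = np.asarray(outputs, dtype=float)
    order = np.argsort(outputs)
    top, second = outputs[order[-1]], outputs[order[-2]]
    if top - second < margin:
        return None   # unclassifiable: the response is ambiguous
    return int(order[-1])
```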
https://doi.org/10.1142/9789814355841_0012
In this paper a classification network is presented for the recognition and hierarchical grouping of binary patterns based on an ART 1 neural classifier. The network is first presented with a series of primitive patterns which are memorized. After each presentation of a complex pattern the network automatically selects the classes of the previously learned primitives that match the parts of the image shown. For each primitive selected, the network spatially isolates its counterpart in the image and computes the Hamming distance between the two parts. After recognizing all the parts of the complex image, a grouping of these constituent classes is made to a new class for the complex image. This hierarchical representation is subsequently used for the recognition of partially distorted complex patterns.
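A minimal sketch of the two measures the network relies on, assuming binary patterns stored as NumPy arrays: the Hamming distance between an isolated image part and a primitive, and an ART 1-style vigilance test for accepting a stored prototype. Names and the vigilance value are illustrative.

```python
import numpy as np

def hamming_distance(a, b):
    """Number of positions at which two binary patterns differ."""
    return int(np.count_nonzero(np.asarray(a) != np.asarray(b)))

def matches_primitive(part, prototype, vigilance=0.9):
    """ART 1-style match test: accept the stored prototype for an image
    part if the overlap |part AND prototype| / |part| reaches the
    vigilance threshold."""
    part = np.asarray(part, dtype=bool)
    prototype = np.asarray(prototype, dtype=bool)
    if part.sum() == 0:
        return False
    return (part & prototype).sum() / part.sum() >= vigilance
```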
The network was trained for recognizing handwritten numerals. After the presentation of a few primitives and only one example of the numbers from 0 to 9, the network was able to identify many distorted and noisy patterns of numbers.
https://doi.org/10.1142/9789814355841_0013
Visual perception has little to do with the retinal projections of the viewed scenes. Objects are perceived as 3-dimensional whereas their retinal images are 2-dimensional. Under perspective projection an object at different distances projects to different retinal images, yet the perceived size persists independent of the viewing distance. Perceived objects likewise tend to maintain their shape independent of changes in orientation. It has been suggested that humans, as well as other primates, must learn to "see." Touch has been considered the most likely source for educating the visual system, presumably by adding meaning to the initially meaningless plethora of retinal images. In this work, we report on a series of experiments that focus on the task of visuo-tactual coarseness estimation: we examine the relation between the two senses and give evidence for the occurrence of tactual recalibration in the presence of conflicting stimuli that extend over a relatively short period of time. The experiments are particularly interesting in that they demonstrate the occurrence of visual dominance and recalibration in the case of passive tactile perception. A black-box model consistent with the observed behavior is also presented.
https://doi.org/10.1142/9789814355841_0014
One common method of 3-D computer vision is stereopsis. A single-camera system implemented using this approach is presented here. The MAP (Matching Algorithm for Points) program is able to synthesize relatively accurate models of unknown polyhedral objects (i.e. straight edges and planar faces) of simple to medium complexity. Edge data are first extracted from sequences of digitized images and then processed to find corner points and remove background noise. Corresponding points are matched between successive images and the resulting information is used to triangulate the (x, y, z) coordinates of the original object vertices. This permits the synthesis of a wire-frame model depicting the item under consideration. Sample results are presented to demonstrate the capabilities of the system, and possible directions for future development are briefly discussed.
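The abstract does not give MAP's exact triangulation formulation; the sketch below shows the standard linear (DLT) triangulation of one matched point pair, under the assumption that the 3x4 camera matrices of the two views are known.

```python
import numpy as np

def triangulate_point(P1, P2, x1, x2):
    """Linear (DLT) triangulation of a 3-D point from its projections
    x1, x2 (pixel coordinates) in two views with 3x4 camera matrices
    P1 and P2.  Returns the (x, y, z) coordinates."""
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)   # solution is the null vector of A
    X = Vt[-1]
    return X[:3] / X[3]
```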
https://doi.org/10.1142/9789814355841_0015
This paper describes a simple but powerful two-stage technique for inferring volumetric descriptions of articulated objects from parallel planar cross-sections or slices. The number of slices can vary and their density is, in general, unknown. The algorithm identifies the relevant components or parts of an object and fits volumetric primitives to them. The data are first segmented into parts, then sticks and blobs are fitted to the parts. The segmentation is completely model-independent. It exploits regularities like transversality to generate a set of perceptually plausible object subparts which, in principle, could be described by any volumetric primitives. Our volumetric primitives are convex. They are meant to be suggestive more than literal: they describe the object in terms of type, connectivity, position and orientation of its subparts. Attractive features of the approach include its simplicity, its stability with respect to variations of imaging conditions, and the model-independent nature of volumetric segmentation.
https://doi.org/10.1142/9789814355841_0016
The supervised classifier contained in an automatic PC-based mitotic index scoring system is presented. Interactive statistical training based on size and circularity is combined with fast cluster identification techniques to produce an accurate count of cells and mitotic clusters. The system is validated by comparing the results obtained automatically with those obtained visually on a set of 1000 images.
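As a sketch of the kind of size-and-circularity gating such a classifier performs (the thresholds and function names below are placeholders, not the trained values of the system):

```python
import numpy as np

def circularity(area, perimeter):
    """Shape circularity 4*pi*A / P**2: 1.0 for a perfect disc, smaller
    for elongated or irregular blobs."""
    return 4.0 * np.pi * area / (perimeter ** 2)

def is_cell_candidate(area, perimeter,
                      min_area=50, max_area=2000, min_circ=0.6):
    """Illustrative size-and-circularity gate for counting cell-like blobs."""
    return min_area <= area <= max_area and circularity(area, perimeter) >= min_circ
```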
https://doi.org/10.1142/9789814355841_0017
An automatic analysis of angiographic retinal images is discussed. A detection method based on texture differences between perfused and non-perfused regions is described. First the image is tessellated into a large number of primary regions using the positions of vessels and other structures in the image. The image is represented as a region adjacency graph. Arcs in the graph are classified as vessels using features obtained from grey level profiles in the original image. For each region the degree of non-perfusion is estimated by measuring textural properties using mathematical morphology. Adjacent regions are subsequently merged using information on the degree of perfusion.
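A minimal sketch of building a region adjacency graph from a labelled image, assuming 4-connected pixel adjacency; the paper derives its regions and arcs from vessel positions, so this illustrates only the data structure, not the segmentation itself.

```python
import numpy as np

def region_adjacency_graph(labels):
    """Region adjacency graph of a labelled image: nodes are region labels,
    and an edge joins two regions that share a pixel border (checked via
    vertical and horizontal neighbour pairs)."""
    edges = set()
    for shifted in (labels[1:, :], labels[:, 1:]):
        base = labels[: shifted.shape[0], : shifted.shape[1]]
        diff = base != shifted
        pairs = zip(base[diff], shifted[diff])
        edges.update((int(min(a, b)), int(max(a, b))) for a, b in pairs)
    nodes = set(np.unique(labels).tolist())
    return nodes, edges
```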
https://doi.org/10.1142/9789814355841_0018
The design of, and experimentation with, vision sensing methods in robotic assembly is discussed in the context of a behaviour-based approach. This approach leads to an elegant way of incorporating sensing into an assembly system. The method does not need a coordinate system common to the subcomponents of the assembly system, uses a minimal representation, and couples sensing and action strongly. A complete assembly task was tested to show the robustness of the system.
https://doi.org/10.1142/9789814355841_0019
This paper presents a computer vision system for the acquisition and processing of 3-D images of wax dental imprints recorded on human subjects. The system is dedicated to orthodontic diagnosis and epidemiological studies. The acquisition system is described in detail and experimental results are presented. The accuracy of the system is tested on a reference imprint.
A segmentation algorithm is used for the detection of the interstices between teeth on the 3-D images of the imprint. A multi-operator approach is developed to deal with teeth of different shape and size. Experiments performed on a large number of imprints show that the algorithm for interstice detection and position estimation is very reliable and accurate.
https://doi.org/10.1142/9789814355841_0020
We present a simulation-based model of sonar range sensing for robot navigation that accounts for multiple reflections of the sonar signal between transmission and reception. This gives more realistic results than previous models. The approach is based on simulation of the reflection and diffraction of sonar rays from reflecting surfaces until they are attenuated beyond detectability or return to the receiver. Parameters of the model include frequency, minimum and maximum range, and signal detection threshold (relative to emitted signal strength, after linear gain compensation). Finally, the usefulness of the model to the development of more effective algorithms for the interpretation of sonar data is discussed.
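A small sketch of the attenuation bookkeeping such a simulation might perform for a single ray, assuming dB-per-metre absorption and a fixed loss at each reflection; the parameter names mirror the model's parameters (detection threshold relative to emitted signal strength, maximum range), but the values and the simplified loss model are assumptions.

```python
def ray_detectable(path_lengths, reflection_losses,
                   threshold_db=-60.0, alpha_db_per_m=1.0, max_range=10.0):
    """Decide whether a simulated sonar ray returning after several bounces
    is still detectable.  path_lengths are the segment lengths (m) between
    successive reflections; reflection_losses the loss (dB) at each bounce.
    The signal is attenuated by absorption along the path and by each
    reflection, and is discarded once it falls below the detection
    threshold or exceeds the maximum range."""
    level_db = 0.0   # relative to emitted signal, after linear gain compensation
    travelled = 0.0
    for segment, loss in zip(path_lengths, reflection_losses):
        travelled += segment
        level_db -= alpha_db_per_m * segment + loss
        if level_db < threshold_db or travelled > max_range:
            return False
    return True
```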
“This book shows the breadth of computer vision research, from low-level processing to high-level analysis and applications.”