This book contains a selection of papers which were presented at the Vision Interface '92 Conference. It also includes several invited articles from prominent researchers in the field, suggesting future directions in Computer Vision.
https://doi.org/10.1142/9789814343312_fmatter
The following sections are included:
https://doi.org/10.1142/9789814343312_0001
Good Old Fashioned Artificial Intelligence and Robotics (GOFAIR) relies on a set of restrictive Omniscient Fortune Teller Assumptions about the agent, the world and their relationship. The emerging Situated Agent paradigm is challenging GOFAIR by grounding the agent in space and time, relaxing some of those assumptions, proposing new architectures and integrating perception, reasoning and action in behavioral modules. GOFAIR is typically forced to adopt a hybrid architecture for integrating signal-based and symbol-based approaches because of the inherent mismatch between the corresponding on-line and off-line computational models. It is argued that Situated Agents should be designed using a unitary on-line computational model. The Constraint Net model of Zhang and Mackworth satisfies that requirement. Two systems for situated perception built in our laboratory are described to illustrate the new approach: one for visual monitoring of a robot's arm, the other for real-time visual control of multiple robots competing and cooperating in a dynamic world.
https://doi.org/10.1142/9789814343312_0002
The goal of computer vision is to derive descriptive information about a scene by computer analysis of images of the scene. Vision algorithms can serve as computational models for biological visual processes, and they also have many practical uses; but this paper treats computer vision as a subject in its own right. Vision problems are often ill-defined, ill-posed, or computationally intractable; nevertheless, successes have been achieved in many specific areas. We argue that by limiting the domain of application, carefully choosing the task, using redundant data (multi-sensor, multi-frame), and applying adequate computing power, useful solutions to many vision problems can be obtained. Methods of designing such solutions are the subject of the emerging discipline of Vision Engineering. With projected advances in sensor and computing technologies, the domains of applicability and ranges of problems that can be solved will gradually expand.
https://doi.org/10.1142/9789814343312_0003
We outline the process by which curves can be inferred from images, based on variational and differential-geometric principles, and then show how certain of these curves (bounding contours) can lead to generic shape descriptions via Hamilton-Jacobi theory.
https://doi.org/10.1142/9789814343312_0004
For real-time classification applications (e.g. real-time image segmentation), the number of usable pattern classification algorithms is limited by the feasibility of high-speed hardware implementation. This paper describes a non-parametric pattern classifier together with an associated hardware architecture and training algorithms. The classifier has both a feasible hardware implementation and other desirable properties not normally found in statistical classifiers. In addition to the classification/training algorithms and hardware architecture, the paper discusses the application of the technique to the problem of image segmentation. Results from segmenting images are included.
The scheme described has two major aspects: (1) the classifier itself, which is implemented as a 2ⁿ-tree, a hierarchical data structure that corresponds to a recursive decomposition of feature space; and (2) training schemes, specific to the 2ⁿ-tree structure, by which the classification tree is constructed. The training algorithms discussed have the following important properties:
They are non-parametric and therefore independent of any particular probability model (e.g. Gaussian).
They can handle any shaped decision regions in feature space.
They are consistent in the sense that for large training data sets they produce a classifier that approaches the ideal Bayes classifier.
The training algorithms also include an interesting application of the Minimum Description Length principle (MDL). It is used in a tree pruning algorithm that produces trees that are both significantly smaller and, at the same time, have better classification performance (i.e. lower error rates) than unpruned trees.
The architecture affords a practical hardware implementation that can also be used to implement other classification schemes.
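The tree classifier described above can be sketched in a few lines. This is an illustrative reconstruction, not the authors' implementation: `build_tree`, `classify`, the midpoint splitting rule and the depth limit are all assumed names and policies. Each node splits its cell of n-dimensional feature space into 2ⁿ sub-cells; leaves store the majority class of the training points that land in them.

```python
def build_tree(points, labels, lo, hi, depth=0, max_depth=6):
    """Recursively split the box [lo, hi) into 2**n sub-cells."""
    majority = max(set(labels), key=labels.count)
    if len(set(labels)) == 1 or depth == max_depth:
        return majority                              # leaf: a class label
    n = len(lo)
    mid = [(a + b) / 2 for a, b in zip(lo, hi)]      # midpoint split per axis
    buckets = {}                                     # n-bit cell index -> data
    for p, c in zip(points, labels):
        key = sum(1 << d for d in range(n) if p[d] >= mid[d])
        buckets.setdefault(key, ([], []))[0].append(p)
        buckets[key][1].append(c)
    children = {}
    for key, (ps, cs) in buckets.items():
        sub_lo = [mid[d] if key >> d & 1 else lo[d] for d in range(n)]
        sub_hi = [hi[d] if key >> d & 1 else mid[d] for d in range(n)]
        children[key] = build_tree(ps, cs, sub_lo, sub_hi, depth + 1, max_depth)
    return (mid, children, majority)

def classify(tree, p):
    """Descend the tree; labels are assumed to be strings or ints."""
    while isinstance(tree, tuple):
        mid, children, majority = tree
        key = sum(1 << d for d in range(len(mid)) if p[d] >= mid[d])
        tree = children.get(key, majority)  # empty cell: fall back to majority
    return tree
```

Classification is a sequence of coordinate comparisons per level, which is what makes a direct hardware realization attractive.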
https://doi.org/10.1142/9789814343312_0005
The Parallel Vision Group at Simon Fraser University has built a hybrid pyramidal vision machine. The machine consists of 512 SIMD processors at the bottom and a pyramid of 63 MIMD transputers on top. Custom hardware links, called PARLinks, have been developed to enable massively parallel communication between the SIMD processors and the transputers. Two applications have been implemented on the vision machine. The first is a parallel, hierarchical (pyramidal) Hough line-detection algorithm, which has been integrated into a robotic workcell. The second is a Hierarchical Linear Generalized Hough Transform for recognition of objects of arbitrary shape. It is shown that the pyramidal vision machine and algorithms are suitable for real-time object recognition.
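At the core of the first application is the standard Hough transform for lines, which the pyramid parallelizes. A minimal sequential sketch, with accumulator resolution and function names as illustrative assumptions:

```python
import math

def hough_lines(edge_points, width, height, n_theta=180):
    """Vote in (theta, rho) space and return the strongest line."""
    diag = int(math.hypot(width, height)) + 1
    # Accumulator over theta in [0, pi) and rho (offset by diag so indices >= 0).
    acc = [[0] * (2 * diag) for _ in range(n_theta)]
    for x, y in edge_points:
        for t in range(n_theta):
            theta = math.pi * t / n_theta
            rho = int(round(x * math.cos(theta) + y * math.sin(theta)))
            acc[t][rho + diag] += 1
    votes, t, r = max((acc[t][r], t, r)
                      for t in range(n_theta) for r in range(2 * diag))
    return math.pi * t / n_theta, r - diag, votes
```

The pyramidal version distributes the voting over the SIMD layer and merges partial accumulators up the transputer pyramid; this sketch only shows the underlying transform.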
https://doi.org/10.1142/9789814343312_0006
A new approach to mixed-script recognition is explored. Words are inspected through a fixed-size window, and sequences of letters are identified by slowly moving the window along the word. From the sequences of letters found, the word length can be estimated, and letter candidates for each position within the word can be proposed. Six writers participated in an experiment; each wrote a training data set of 250 words and a test data set of 275 words. Each sequence of letters seen through a window is recognized by correlating it with patterns from the training data set kept in memory. Results are presented on the estimation of word length and on letter recognition rates.
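The matching step, correlating the windowed content against stored training patterns, can be sketched as follows. For brevity, 1-D profiles stand in for the real 2-D windows, and all names are illustrative assumptions:

```python
def correlation(a, b):
    """Normalized (Pearson) correlation of two equal-length profiles."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    da = sum((x - ma) ** 2 for x in a) ** 0.5
    db = sum((y - mb) ** 2 for y in b) ** 0.5
    return num / (da * db) if da and db else 0.0

def best_letter(window, templates):
    """templates: dict letter -> stored profile of the same length as window."""
    return max(templates, key=lambda k: correlation(window, templates[k]))
```

Because the correlation is normalized, the match is insensitive to overall stroke darkness, which matters when templates come from different writers.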
https://doi.org/10.1142/9789814343312_0007
Subpixel edge detection is concerned with determining the precise location of an edge within a pixel. In this article, parametric models are developed for subpixel step and roof edge detection. Step edges result from a sudden change in the image intensity profile; roof edges are formed by discontinuities in the image gradient. In range images, they correspond to changes in depth and in surface normal, respectively. A set of orthogonal Zernike moments is calculated in the local neighborhood to estimate the parameters associated with the edge models. Extensive theoretical analysis is performed to show the robustness of the proposed technique. Experimental results with real as well as synthetic images are presented to demonstrate the efficacy of the proposed subpixel edge detection technique.
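The chapter's method computes Zernike moments over a 2-D circular neighborhood; as a simpler stand-in that conveys the same idea (recovering a subpixel edge position from a few moments of the local intensity), here is the classic three-moment fit of an ideal 1-D step, in the style of Tabatabai and Mitchell. It assumes a dark-to-bright step and is purely illustrative, not the Zernike formulation itself:

```python
def subpixel_step_edge(samples):
    """Fit an ideal step to a dark-to-bright 1-D profile; return edge
    position in pixels (measured from the start), or None if flat."""
    n = len(samples)
    m1 = sum(samples) / n                 # first three sample moments
    m2 = sum(s * s for s in samples) / n
    m3 = sum(s ** 3 for s in samples) / n
    sigma = (m2 - m1 * m1) ** 0.5
    if sigma == 0:
        return None                       # flat profile: no edge
    skew = (m3 - 3 * m1 * m2 + 2 * m1 ** 3) / sigma ** 3
    # Fraction of samples at the lower (dark) level, from the skewness.
    p = 0.5 * (1 + skew / (4 + skew * skew) ** 0.5)
    return p * n
```

On the clean profile `[0, 0, 0, 1, 1]` the fit places the edge exactly at 3.0, and a half-intensity transition pixel shifts the estimate by half a pixel, which is the behavior subpixel methods exploit.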
https://doi.org/10.1142/9789814343312_0008
This chapter presents a gradient-based approach to the multi-constraint estimation of dense two-dimensional (2-D) motion. The formulation is based on two assumptions: that at least one feature exists that is constant along motion trajectories and that motion vectors are smooth functions of spatial coordinates. From these assumptions matching and smoothness errors are derived and combined to obtain a cost function. The cost function is minimized using a sequence of quadratic approximations of the matching error and solving the resulting linear system by deterministic relaxation. The structural model used (relating the motion vectors and data) permits the use of multiple image features as the input, for example intensity and colours, or sub-bands of a spectral decomposition. The motion model reduces ill-posedness of the problem through a smoothness constraint. The proposed algorithm is a generalization of the Horn and Schunck algorithm [5] to the case of vector data. Results of application of the proposed technique to the estimation of 2-D motion from TV images are shown. The motion fields obtained are applied to motion-compensated temporal interpolation, resulting in significant (but localized) improvements.
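For reference, the scalar Horn and Schunck iteration that the chapter generalizes alternates a local average of the flow with a correction along the spatial gradient. A minimal sketch; the periodic boundaries, fixed iteration count and `alpha` value are simplifying assumptions:

```python
import numpy as np

def horn_schunck(I1, I2, alpha=1.0, n_iter=500):
    """Estimate dense flow (u, v) between frames I1 and I2."""
    I1 = np.asarray(I1, float)
    I2 = np.asarray(I2, float)
    Iy, Ix = np.gradient(I1)              # spatial derivatives (row, column)
    It = I2 - I1                          # temporal derivative
    u = np.zeros_like(I1)
    v = np.zeros_like(I1)
    # 4-neighbour average with periodic boundaries (a simplification).
    avg = lambda f: (np.roll(f, 1, 0) + np.roll(f, -1, 0) +
                     np.roll(f, 1, 1) + np.roll(f, -1, 1)) / 4.0
    for _ in range(n_iter):
        ub, vb = avg(u), avg(v)
        # Correct the averaged flow along the gradient so the matching error
        # shrinks while the smoothness term keeps it near the local average.
        common = (Ix * ub + Iy * vb + It) / (alpha ** 2 + Ix ** 2 + Iy ** 2)
        u = ub - Ix * common
        v = vb - Iy * common
    return u, v
```

The chapter's multi-constraint version replaces the single intensity residual `Ix*u + Iy*v + It` with one matching error per feature channel, which is the step from scalar to vector data.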
https://doi.org/10.1142/9789814343312_0009
Estimating the 3-D motion parameters of rigid objects from two-frame monocular images is considered in this paper. A two-step robust algorithm is presented. Our main idea is to regularize the initial problem formulation in order to obtain a stable solution. In the first step, a linear algorithm is applied to generate an initial guess for the solution; perturbation bounds are given to ensure the stability of this step. In the second step, an iterative method based on the Tikhonov regularization approach is used to improve the guess within a closed convex set. Several theorems on the convergence, convergence rate and stability of the solution are given and proved. Experiments with noisy synthetic data give satisfactory results, and the numerical results verify the theoretical ones.
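The stabilizing idea of the second step can be illustrated generically: Tikhonov regularization replaces the normal equations AᵀAx = Aᵀb with (AᵀA + λI)x = Aᵀb, trading a small bias for stability when A is ill-conditioned. A sketch under that generic reading, not the chapter's exact iteration:

```python
import numpy as np

def tikhonov(A, b, lam=1e-3):
    """Solve min ||Ax - b||^2 + lam * ||x||^2 via damped normal equations."""
    A = np.asarray(A, float)
    b = np.asarray(b, float)
    n = A.shape[1]
    # (A^T A + lam I) is positive definite for lam > 0, so the solve is
    # stable even when A itself is nearly rank-deficient.
    return np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ b)
```

As λ → 0 the solution approaches the ordinary least-squares answer; a larger λ damps the components that noise would otherwise blow up.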
https://doi.org/10.1142/9789814343312_0010
A comprehensive representation of texture is important for effective classification of texture images and texture-based segmentation. In this paper, we present a multidimensional representational scheme which encapsulates structural and statistical information of a homogeneous texture image. In particular, we concentrate on the visual and perceptual aspects. For monochrome (grey-level) textures, four visual properties - coarseness, contrast, directionality and line-likeness - are defined based on the scheme. For colour textures, our attention focuses on the colour aspect. An algorithm is presented to select a set of characteristic colours from an input image. Experiments have demonstrated the effectiveness of the proposed representational schemes.
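As one concrete example of such a visual property, a common definition of contrast (in the spirit of Tamura's texture features) combines the grey-level spread with the kurtosis of the histogram. This particular formula is an assumption for illustration, not necessarily the chapter's definition:

```python
def contrast(pixels):
    """Tamura-style contrast: sigma / kurtosis**(1/4) of the grey levels."""
    n = len(pixels)
    mean = sum(pixels) / n
    var = sum((p - mean) ** 2 for p in pixels) / n
    if var == 0:
        return 0.0                        # uniform patch: no contrast
    mu4 = sum((p - mean) ** 4 for p in pixels) / n
    kurtosis = mu4 / var ** 2
    return var ** 0.5 / kurtosis ** 0.25
```

Dividing by a root of the kurtosis penalizes histograms whose spread comes from a few outliers rather than a genuinely bimodal (high-contrast) distribution.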
https://doi.org/10.1142/9789814343312_0011
A new computational approach to resampling large datasets is presented. Its basic features are: (i) formulation and implementation in three dimensions; (ii) capability to work in both a binary and a gray-level environment; and (iii) incorporation of smoothing to handle the discontinuities present at an object's boundary. The new interpolator is first introduced in one dimension and then extended to two and three dimensions. The amount of computation involved in each case is analyzed. Applications to the magnification of binary objects (from a 2⁷ × 2⁷ × 2⁷ raster grid to a 2⁹ × 2⁹ × 2⁹ raster grid) and of gray-level objects (from a 2⁸ × 2⁸ × 36 raster grid to a 2⁹ × 2⁹ × 2⁹ raster grid) are also presented.
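The one-dimensional starting point can be sketched as plain linear-interpolation magnification; the chapter's interpolator additionally smooths across object boundaries, which is omitted here. Function name and signature are illustrative:

```python
def magnify_1d(samples, factor):
    """Magnify a 1-D signal by an integer factor with linear interpolation."""
    out = []
    for i in range(len(samples) - 1):
        a, b = samples[i], samples[i + 1]
        for k in range(factor):
            t = k / factor                # fractional position in [0, 1)
            out.append(a * (1 - t) + b * t)
    out.append(samples[-1])               # keep the final sample
    return out
```

Applying the same operator along each axis in turn yields the separable 2-D and 3-D versions, which is the extension path the chapter follows.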
https://doi.org/10.1142/9789814343312_0012
In many robotic tasks such as collision avoidance and grasping, 3-dimensional models of polyhedral objects are needed. These models are in the form of 3-dimensional faces, edges, vertices and their geometry and topology (similar to B-Rep models in solid modelling literature). In this research, we provide a new hybrid approach to extract such a 3-dimensional polyhedral model of the visible surfaces of a polyhedral object in a range image. We present a unique approach called “combine and compare” to obtain robust analytical descriptions of faces, edges and vertices. First, straightforward edge-based methods are used to extract jump and roof edge maps. Then, region-based methods in conjunction with least-squares-fit techniques are used to extract analytical descriptions of the planes that form the faces of the object. All the plane descriptions are then intersected (“combined”) to obtain possible roof edge segments. These segments are then validated (“compared”) with the roof edge map to derive the actual edge segments. A similar “combine and compare” process is used to derive the vertices from the edge segments. These faces, edges and vertices are combined to yield a B-Rep type model of the polyhedral object. Using region-based information also facilitates simpler processing and classification of the vertices than existing purely edge-junction-based vertex classification. We view this work as a step toward the larger goal of obtaining complete B-Rep models of objects in a scene from multiple range images.
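The "combine" step can be illustrated with plane-pair intersection: two non-parallel planes n·x = d meet in a line whose direction is the cross product of their normals, and a point on the line follows from a small linear solve. A sketch with illustrative names, not the chapter's code:

```python
import numpy as np

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def plane_intersection(n1, d1, n2, d2):
    """Intersect planes n1 . x = d1 and n2 . x = d2 into (point, direction)."""
    direction = cross(n1, n2)
    if all(abs(c) < 1e-12 for c in direction):
        return None                       # parallel planes: no roof edge
    # A point on the line: satisfy both plane equations and pick the
    # solution with no component along the line (direction . x = 0).
    A = np.array([n1, n2, direction], float)
    b = np.array([d1, d2, 0.0])
    return tuple(np.linalg.solve(A, b)), direction
```

In the "compare" step, each candidate line would then be clipped against the roof edge map to keep only the segments actually supported by the range data.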
https://doi.org/10.1142/9789814343312_0013
This paper examines the construction of a detailed 3-D surface model of an object rotating in front of a stationary video camera. An algorithm is developed which integrates repeated stereo views of a rotating object into a 3-D model of the object. Starting with disparity estimates obtained using an existing stereo algorithm, the algorithm presented here obtains the true depth of the recovered points. As the object is rotated in front of the camera, these points are then used to construct an octree representation of the object. The resulting representation provides a full 3-D representation of the object's visible exterior surfaces, suitable for inclusion in CAD/CAM modeling packages.
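The accumulation of surface points into an octree can be sketched with a flat set of leaf cells standing in for the linked tree: each recovered point marks the cube it falls in, and points from successive rotated views simply add to the same set. Cell size, depth and names are illustrative assumptions:

```python
def octree_cells(points, lo=0.0, size=1.0, depth=3):
    """Return the set of occupied leaf cells at the given depth.

    Each cell is identified by its integer (i, j, k) index on the
    2**depth grid over the cube [lo, lo + size)^3; merging a new view
    is just a set union of its occupied cells."""
    n = 2 ** depth
    cells = set()
    for x, y, z in points:
        i = min(int((x - lo) / size * n), n - 1)
        j = min(int((y - lo) / size * n), n - 1)
        k = min(int((z - lo) / size * n), n - 1)
        cells.add((i, j, k))
    return cells
```

A real octree stores this hierarchically, collapsing fully empty or fully occupied octants, which is what makes the representation compact for CAD/CAM export.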
https://doi.org/10.1142/9789814343312_0014
This paper presents a three-dimensional border identification method based on the sign of the second derivative of the intensity change. For a bright object surrounded by a darker background, the condition for a voxel being on a border is that the second derivative is negative and changes sign for neighbors in the gradient direction. To compute the second derivative, a 3D edge operator is first applied to compute the gradient. The gradient directions are quantized to 26 vectors, and asymmetric Gaussian filters for the 3D convolutions are designed for each of the 26 gradient vectors; the computation of the 3D convolutions can therefore be sped up. Although the quantization of gradient directions to 26 vectors is not smooth, analysis shows that the condition for identifying border voxels is not sensitive to quantization errors. Moreover, exactly one layer of voxels satisfies the condition. This avoids a multi-layer problem, so tracking of border voxels can simply be a breadth-first search. Experimental results on 3D surface identification, tracking, and display of real medical data are given.
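In one dimension the border condition reads: mark a sample where the second derivative is negative and changes sign at the next sample along the gradient. A sketch using the standard 3-point stencil, for a dark-to-bright profile; this is an illustrative reduction of the 3-D condition, not the paper's filters:

```python
def border_samples(profile):
    """Indices of border samples on a dark-to-bright 1-D profile."""
    # Discrete second derivative at interior samples (3-point stencil).
    d2 = [profile[i - 1] - 2 * profile[i] + profile[i + 1]
          for i in range(1, len(profile) - 1)]
    border = []
    for i in range(len(d2) - 1):
        if d2[i] < 0 and d2[i + 1] >= 0:  # negative, then sign change
            border.append(i + 1)          # index back into `profile`
    return border
```

On a smooth step the condition fires at exactly one sample, which mirrors the single-layer property the paper relies on for breadth-first tracking.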