This book contains 31 selected papers (out of 136 accepted) from the 9th Scandinavian Conference on Image Analysis, held in Uppsala, Sweden, 6–9 June 1995. They represent the best of current work in image analysis worldwide, describing very recent results. The papers have been considerably expanded and updated compared to the versions in the conference proceedings, giving readers a much better understanding of the issues at hand.
The papers cover both theory and successful applications. There are chapters on Edges and Curves, Texture, Depth and Stereo, Scene Analysis, and 3D Motion, thus covering the chain from feature extraction to computer vision. Two important application areas are covered: Medical and Industrial.
https://doi.org/10.1142/9789812830579_fmatter
https://doi.org/10.1142/9789812830579_0001
This paper derives a model for the variance in the length and orientation of edge segments extracted from an image. It is assumed that edge pixel positions are subject to noise. The effect of this positional uncertainty is propagated all the way to segment length and orientation. The model predicts an approximately constant variance in segment length, whereas the variance in segment orientation decreases with increased segment length.
The theoretical model is verified by experiments on real images, which show that segment orientation is more stable (has a higher signal-to-noise ratio) than segment length for segments longer than 1/10 of the image size.
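The predicted behaviour can be illustrated with a small Monte Carlo sketch (not the paper's derivation): perturb both endpoints of a segment with isotropic Gaussian noise and measure how the orientation variance falls with segment length. The noise level and trial count are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 1.0       # assumed std of positional noise on each endpoint (pixels)
trials = 20000

def orientation_var(length):
    """Variance of segment orientation when both endpoints are
    perturbed by isotropic Gaussian noise of std sigma."""
    p0 = rng.normal([0.0, 0.0], sigma, (trials, 2))
    p1 = rng.normal([length, 0.0], sigma, (trials, 2))
    d = p1 - p0
    return np.arctan2(d[:, 1], d[:, 0]).var()

# Orientation variance falls roughly as 1/length^2, matching the
# model's prediction of increased angular stability for long segments.
v10, v100 = orientation_var(10.0), orientation_var(100.0)
```

The segment length variance (not shown) stays approximately constant under the same perturbation, which is the other half of the model's prediction.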
https://doi.org/10.1142/9789812830579_0002
A new branch of Hough Transform algorithms, called probabilistic Hough Transforms, has been actively developed in recent years. One of the first was a new and efficient probabilistic version of the Hough Transform for curve detection, the Randomized Hough Transform (RHT). The RHT selects n pixels from an edge image by random sampling to solve n parameters of a curve and then accumulates only one cell in a parameter space. In this paper, a novel extension of the RHT, called the Connective Randomized Hough Transform (CRHT), is suggested to improve the RHT for complex and noisy pictures. Tests with synthetic and real-world images demonstrate the high speed and low memory usage of the CRHT, as compared both to the Standard Hough Transform and the basic RHT.
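A minimal sketch of the basic RHT idea for a single circle (n = 3), not the CRHT's connectivity extension: sample point triples, solve the circle through them, and vote into a single accumulator cell. Accumulator resolution and iteration count are illustrative assumptions.

```python
import numpy as np
from collections import Counter

def rht_circle(edge_pts, iters=2000, seed=0):
    """Randomized Hough Transform sketch for one circle: sample 3 edge
    points, solve the circle through them, and accumulate only that one
    parameter cell (here a dict key), instead of a full accumulator array."""
    rng = np.random.default_rng(seed)
    acc = Counter()
    pts = np.asarray(edge_pts, float)
    for _ in range(iters):
        p1, p2, p3 = pts[rng.choice(len(pts), 3, replace=False)]
        # Centre c satisfies |c-p1|^2 = |c-p2|^2 = |c-p3|^2: a 2x2 linear system.
        A = 2 * np.array([p2 - p1, p3 - p1])
        b = np.array([p2 @ p2 - p1 @ p1, p3 @ p3 - p1 @ p1])
        if abs(np.linalg.det(A)) < 1e-9:      # collinear sample, skip
            continue
        c = np.linalg.solve(A, b)
        r = np.linalg.norm(p1 - c)
        acc[(round(c[0]), round(c[1]), round(r))] += 1
    return acc.most_common(1)[0][0]

# Noiseless test circle: centre (20, 30), radius 10.
t = np.linspace(0, 2 * np.pi, 100, endpoint=False)
pts = np.stack([20 + 10 * np.cos(t), 30 + 10 * np.sin(t)], axis=1)
cx, cy, r = rht_circle(pts)
```

The memory advantage over the Standard Hough Transform comes from the sparse accumulator: only cells that actually receive votes are stored.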
https://doi.org/10.1142/9789812830579_0003
In many applications more than one curve type is needed to explain the edge data reasonably well. In this paper we present a robust algorithm which is designed to work concurrently with different curve types where each curve type selects its own domain of applicability. The major components of our algorithm are: a) A data-driven exploration of the edge data which produces estimates of possible curves. b) A selection which regards the curves produced by the exploration as hypotheses and decides which of them are needed to explain the edge data. c) A robust extension of a least squares algorithm which is initialized with the selected curves. We demonstrate with examples that our algorithm is robust and produces highly accurate results on both dense and sparse edge data.
https://doi.org/10.1142/9789812830579_0004
This article presents a method for segmenting and classifying edges using minimum description length (MDL) approximation with automatically generated break points. A scheme is proposed in which junction candidates with associated regions of interest are first detected in a multi-scale pre-processing step. These junction features are matched to edges based on spatial coincidence. For each matched pair, a tentative break point is introduced at the edge point closest to the junction. Finally, these feature combinations serve as input to an MDL approximation method which tests the validity of the break point hypothesis and classifies the resulting edge segments as either “straight” or “curved”. Experiments on real-world image data demonstrate the viability of the approach.
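The straight-versus-curved decision can be sketched as a two-part code length, here approximated in a BIC-like form (an assumption; the chapter's exact coding scheme may differ): a segment is classified "straight" if a line costs no more to describe than a higher-order curve.

```python
import numpy as np

def description_length(x, y, degree):
    """Approximate two-part MDL: data cost (residual coding) plus
    model cost (parameter coding), in the spirit of BIC."""
    n = len(x)
    coeffs = np.polyfit(x, y, degree)
    rss = np.sum((np.polyval(coeffs, x) - y) ** 2)
    return 0.5 * n * np.log(rss / n + 1e-12) + 0.5 * (degree + 1) * np.log(n)

def classify_segment(x, y):
    """Label an edge segment by comparing the description length of a
    line fit against a quadratic fit."""
    dl_line = description_length(x, y, 1)
    dl_curve = description_length(x, y, 2)
    return "straight" if dl_line <= dl_curve else "curved"

x = np.linspace(0, 1, 50)
straight = classify_segment(x, 2 * x + 0.3)   # exact line
curved = classify_segment(x, x ** 2)          # parabola
```

The extra parameter cost of the quadratic model is what keeps noisy but essentially straight segments from being over-fitted as curves.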
https://doi.org/10.1142/9789812830579_0005
The use of range data has become quite popular in computer vision. One of the common methods for range data acquisition is based on coded light. However, the coded light approach suffers from the existence of shadow areas, where no range data can be measured. The borders of these shadow areas may correspond to real, geometric edges or to non-geometric edges. In this paper, we present a new approach to the interpretation of edges in range images of polyhedra. First, we integrate the edges extracted from the grey level and range image of a scene. Then, the edges are classified into one of the types jump, convex, concave, or non-geometric. While jump, convex and concave edges correspond to real edges, the non-geometric edges, which are caused by shadow, can be removed. Such a classification of physical edges together with the elimination of shadow edges potentially improves any subsequent object recognition step.
https://doi.org/10.1142/9789812830579_0006
Several texture features are introduced from a proposed higher-order statistical matrix, the gray level gap length matrix (GLGLM). The GLGLM measures the gray level variations in an image. This matrix can be seen as a complement to the gray level run length matrix (GLRLM). It offers the size distribution of texture elements for a given direction in the image. We find that the feature set of the GLGLM gives good results for texture classification. For periodicity detection, features extracted from the GLGLM work much faster than the commonly used K parameters, and provide additional quasi-periodicities. The method provides effective and efficient texture classification, periodicity detection and image granulometry. The approach opens a new and computationally efficient way of automatic texture analysis and synthesis.
https://doi.org/10.1142/9789812830579_0007
A new method for estimating fractal dimension is introduced, based on polygonization of 1D curves obtained by scanning 2D images. A strategy for automatic determination of the range of scales where the image is approximately fractal is described. Two new lacunarity measures are also introduced. The method has been compared to several different methods for estimation of fractal dimension and lacunarity. The test set contains a sequence of synthetic images with known fractal parameters, as well as digitized TEM images of mouse liver cell nuclei. The method seems well suited for segmentation purposes.
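The polygonization idea is closely related to the classical divider (ruler) method: traverse the curve with rulers of different lengths and fit the slope of log step-count against log inverse ruler length. This sketch applies it to a smooth test curve, where the estimated dimension should be close to 1; it is not the chapter's full estimator, and the ruler lengths are illustrative assumptions.

```python
import numpy as np

def divider_dimension(pts, rulers):
    """Divider (polygonization) estimate of curve dimension: walk the
    sampled curve with ruler length eps, count the steps N(eps), and
    fit the slope of log N against log(1/eps)."""
    counts = []
    for eps in rulers:
        i, n = 0, 0
        while i < len(pts) - 1:
            j = i + 1
            # advance until the chord reaches one ruler length
            while j < len(pts) - 1 and np.linalg.norm(pts[j] - pts[i]) < eps:
                j += 1
            n += 1
            i = j
        counts.append(n)
    slope, _ = np.polyfit(np.log(1.0 / np.asarray(rulers)), np.log(counts), 1)
    return slope

# Smooth (non-fractal) test curve: estimated dimension should be near 1.
t = np.linspace(0, 2 * np.pi, 4000)
curve = np.stack([t, np.sin(t)], axis=1)
D = divider_dimension(curve, rulers=[0.05, 0.1, 0.2, 0.4])
```

Restricting the fit to the range of scales where the log-log plot is actually linear is what the chapter's automatic scale-range determination addresses.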
https://doi.org/10.1142/9789812830579_0008
The main contributions of this paper are the discussions of sparse texture fields and estimation of texture gradients in such fields. Sparse texture fields are constructed using region of significance in combination with local frequency. This region of significance is a confidence measure for the analysis in a given spatial position based on local phase estimate. The segmentation is performed using Normalized Differential Convolution. Biological motivation for feature extraction is given based on preattentive perception. Examples of texture gradient estimation in sparse texture fields are given as well as for dense texture fields.
https://doi.org/10.1142/9789812830579_0009
In this chapter we present a method for texture boundary tracking based on Gabor phase. We show that phase alone can be used as a feature for texture boundary discrimination. Classic methods for texture discrimination use only Gabor magnitude and discard phase information, even though phase carries important information for texture discrimination. Since phase unwrapping in the two-dimensional case is very costly and is not robust enough, we use the phase derivative or phase gradient for texture discrimination. A boundary tracking algorithm that uses a “butterfly” shaped test region is used in order to find the best discrimination between two regions. We show that Gabor phase alone contains enough information for texture boundary tracking and that it can be used to improve and complement texture segmentation methods.
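A 1D sketch of the core idea, using an assumed complex Gabor kernel: the derivative of the filter phase (the local frequency) separates two abutting "textures" of different spatial frequency, and along a 1D profile the unwrapping that is problematic in 2D is cheap. Filter parameters and frequencies are illustrative.

```python
import numpy as np

def gabor_phase_gradient(signal, f0=0.2, sigma=8.0):
    """Convolve with a complex 1D Gabor and return the phase derivative
    (local frequency in radians per sample)."""
    t = np.arange(-4 * sigma, 4 * sigma + 1)
    gabor = np.exp(-t**2 / (2 * sigma**2)) * np.exp(2j * np.pi * f0 * t)
    resp = np.convolve(signal, gabor, mode="same")
    return np.diff(np.unwrap(np.angle(resp)))

# Two abutting regions with different local frequency.
n = 512
x = np.arange(n)
sig = np.where(x < n // 2,
               np.sin(2 * np.pi * 0.15 * x),
               np.sin(2 * np.pi * 0.25 * x))
dphi = gabor_phase_gradient(sig)
left = dphi[100:150].mean() / (2 * np.pi)    # cycles/sample in left region
right = dphi[350:400].mean() / (2 * np.pi)   # cycles/sample in right region
```

The two regions have identical Gabor magnitude envelopes away from the boundary, so here phase (frequency) is the only discriminating feature, which is the chapter's point.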
https://doi.org/10.1142/9789812830579_0010
This paper deals with the problem of reconstructing the locations of a number of points in space from three different images taken by uncalibrated cameras. It is assumed that the correspondences between the points in the different images are known.
In the case of six points this paper shows that there are in general three solutions to the problem of determining the shape of the object, but some of them may be complex and some may not be physically realisable (e.g. points behind the camera). The solutions are given by a third degree polynomial with coefficients depending on the coordinates of the points in the image. It is also shown how a priori information about the object, such as planarity of subsets of the points, can be used to make the reconstruction. In this case the reconstruction is unique and it is obtained by a linear method. Furthermore it is shown how additional points in the first two images can be used to predict the location of the corresponding point in the third image, without calculating the epipoles. Finally, a linear method for the reconstruction in the case of at least seven point matches is given.
https://doi.org/10.1142/9789812830579_0011
From two calibrated perspective views of a scene a direct metric reconstruction is made. It is in a precise sense the least committed reconstruction. From assumptions of rotational invariance of 3D space and camera projection models the priors needed for a Bayesian estimation are deduced. A model of approximate Lambertian reflection is used. The resulting Bayesian estimation scheme is capable of reconstructing depth discontinuities. The constraint induced by the assumption of isotropy is furthermore shown to be invariant under change of the extrinsic camera parameters and scale. A multi-scale GNC-like algorithm is used to find a solution from initial estimates.
https://doi.org/10.1142/9789812830579_0012
To become versatile and generally useful, a robot vision system must be able to deal with specular objects, deliver uncertainty measures of recovered data and supply information that can easily be used for CAD–to–vision matching. All this has proved to be difficult.
This paper describes stereopsis using two controllable point sources in combination with two cameras. The point sources are switched on sequentially, which results in four images, two from each camera view. By studying intensity changes between the two illumination conditions, the stereo matching is simplified, and it becomes possible to recover cylindrical shape and to calculate the covariance of the retrieved range and shape data.
When an object is turned, the cutting edge gives an inherent surface structure that gives very prominent reflection characteristics. These reflections are very likely to be observed and can, in the applicable cases, be used to retrieve object shape and pose. The paper is concluded by a number of experiments that show recovered range and shape data with estimated uncertainty. Recovered range data has been exported to a CAD system to illustrate the CAD–vision connection and to be able to compare range data with geometrical object models.
https://doi.org/10.1142/9789812830579_0013
In this paper a monocular system that builds a 3-D environment model is presented. Features, such as edges and their combinations, are extracted from image frames and tracked over sequences of images. Kalman filters are used to estimate both the motion of the camera and the structure of the environment. New features are integrated to the environment model during the operation.
The focus of attention has been on modelling the uncertainties correctly. In this way the convergence rate can be safely increased. Another main point has been a high frame rate without any special hardware, which requires very fast and efficient dynamic image processing. The measurements are performed as fast as possible with sufficient accuracy and confidence. Better measurements are given more weight in the model updating than more uncertain ones.
The operation of the system has been demonstrated in real time with a manipulator that picks randomly situated blocks from a table. More complicated scenes have also been used. In another approach, 3-D industrial pallet scenes have been modelled. The appearance of the pallets can also be utilized.
https://doi.org/10.1142/9789812830579_0014
In this article, we propose a theoretical study of Generalized Cylinders (GC) with constant circular cross-section viewed under perspective projection. It is shown that the geometrical properties of this object class and of image formation are sufficient to compute the corresponding model. The modelling algorithm uses the following property: given a matching between two image points corresponding to the limb points of the same cross-section, it is possible to compute the spatial orientation and the location of this cross-section to within a radius ratio. If the radius of the cross-section is known a priori, then the correct scale is obtained.
The method includes several steps: contour fitting using B-spline curves, the modelling algorithm itself, and finally the smoothing of the reconstructed axis, also using B-spline approximation.
Experiments on real images are finally presented, after a comparison test with the orthographic projection model.
https://doi.org/10.1142/9789812830579_0015
This article deals with the problem of how to recognize solid 3D industrial objects and how to determine their orientations and positions (poses). A flexible and transparent method is proposed for the recognition of these objects in occluded and non-occluded scenes using a stereo camera setup.
The method produces hypotheses of objects from single images and from the combination of stereo images, using object silhouettes and a derived structure containing 3D information. A multi-layer recognition approach is used; both topological and geometrical information about the silhouettes have been integrated for optimal classification. Special attention has been paid to speed for fast robot vision applications. As a result, the number of models has been kept small and computationally expensive algorithms have been avoided.
https://doi.org/10.1142/9789812830579_0016
This paper proposes a novel method to detect three-dimensional objects in arbitrary poses and sizes from a complex image and simultaneously measure their poses and sizes. We refer to this process as image spotting. In the learning stage, for a sample object to be learned, a set of images is obtained by varying pose and size. This large image set is compactly represented by a manifold in a compressed subspace spanned by eigenvectors of the image set. This representation is called the parametric eigenspace representation. In the image spotting stage, a partial region in an input image is projected to the eigenspace, and the location of the projection relative to the manifold determines whether this region belongs to the object, and what its pose is in the scene. This process is sequentially applied to the entire image at different resolutions. Experimental results show that this method accurately detects the target objects.
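The learning and spotting stages can be sketched as PCA plus nearest-neighbour search along the pose-parametrised manifold. The toy object (a rotating bar), image size and number of components below are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

def learn_eigenspace(images, k=8):
    """Learn a k-D eigenspace from vectorised pose-varying images and
    return (mean, basis, manifold coordinates of each training image)."""
    X = np.stack([im.ravel() for im in images]).astype(float)
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    basis = Vt[:k]                       # principal directions
    coords = (X - mean) @ basis.T        # samples along the pose manifold
    return mean, basis, coords

def spot(window, mean, basis, coords, poses):
    """Project an image window into the eigenspace and return the pose
    of the nearest manifold sample (nearest-neighbour approximation)."""
    p = (window.ravel() - mean) @ basis.T
    return poses[np.argmin(np.linalg.norm(coords - p, axis=1))]

# Toy object: a bright bar rotating in 16x16 images, one image per pose.
poses = np.arange(0, 180, 10)
def render(theta, n=16):
    y, x = np.mgrid[:n, :n] - (n - 1) / 2
    d = np.abs(x * np.sin(np.radians(theta)) - y * np.cos(np.radians(theta)))
    return np.exp(-d**2)

train = [render(a) for a in poses]
mean, basis, coords = learn_eigenspace(train)
est = spot(render(52), mean, basis, coords, poses)  # query between 50 and 60
```

In the full method the distance of the projection from the manifold decides object presence, and interpolating along the manifold (rather than nearest neighbour) refines the pose estimate.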
https://doi.org/10.1142/9789812830579_0017
In this paper, perceptual grouping is used to assign interest levels to complex structures in images of aerial scenes. The interest operator increases exponentially as more features belonging to the structure are detected. We apply the theory of non-accidentalness to focus our attention on events which are least likely to occur anywhere in the image except for the structure we are detecting. Region information obtained in the first sweep is fed back to the earlier levels of processing to improve the performance of feature extraction. We use the approach to detect bridges and runways although the ideas developed could be applied to any other complex structures. Only a small number of hypotheses are generated and results presented show that, in most cases, the interest levels for regions containing the structures are much higher than those for other regions. It is also shown that using the focus of attention improves the performance of feature extraction.
https://doi.org/10.1142/9789812830579_0018
Time to contact can be approximated from the divergence of the flow field. Direct differentiation of a measured flow field is not feasible in practice because of the ill-posedness of differentiation. We propose a group theoretical framework for direct determination of higher order flow fields. The basis is the invariance of the power spectrum of the Fourier transform to translations. By change of coordinate-system, invariance to other groups of actions can be constructed. Using this phase-based estimation the group element acting on the image can be recovered simply by taking inner products in the normal retinal coordinate system. The theory is applied to estimation of first order optic flow.
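The opening relation can be checked numerically: for a frontal approach at constant speed, the flow field is u = x/T, v = y/T about the focus of expansion, its divergence is 2/T, and time to contact follows directly. This is the standard divergence relation, not the chapter's phase-based group-theoretic estimator.

```python
import numpy as np

def time_to_contact(u, v, dx=1.0):
    """Estimate time to contact from the divergence of a 2D flow field:
    for a frontal approach, div(u, v) = 2 / T, so T = 2 / div."""
    div = np.gradient(u, dx, axis=1) + np.gradient(v, dx, axis=0)
    return 2.0 / div.mean()

# Synthetic expanding flow for an approach with T = 10 frames.
y, x = np.mgrid[-20:21, -20:21].astype(float)
T = time_to_contact(x / 10.0, y / 10.0)
```

On measured (noisy) flow this direct differentiation is exactly the ill-posed step the chapter avoids by estimating first-order flow parameters directly.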
https://doi.org/10.1142/9789812830579_0019
The chapter describes an improved method of obtaining the masks of moving objects by means of Markov Random Field (MRF) modelling. A typical model includes only two observations: the difference in brightness between corresponding pixels in two images, and the value assigned to the pixel by the mask of temporal changes between consecutive images. In the new model, a third observation is added, that is the brightness at a pixel. Three observations are particularly useful for detecting the motion of objects with locally constant brightness. As a result, one can significantly improve the quality of the masks of moving objects, notably in the case of rotation, where there is a significant overlapping of the object on itself from one image to the other. The chapter presents equations necessary for the calculation of the energy of the MRF. The procedure called deterministic relaxation (DR) is described, which allows one to find the realization of the MRF for which the minimum of the energy of the field is achieved. Examples of obtained masks of moving objects are given.
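A minimal ICM (deterministic relaxation) sketch for a binary motion mask, using only the brightness-difference observation and a Potts smoothness term; the chapter's model adds further observations, and the thresholds and weights here are illustrative.

```python
import numpy as np

def icm_motion_mask(diff, beta=2.0, theta=20.0, iters=5):
    """Deterministic relaxation (ICM) for a binary MRF motion mask.
    Data term favours label 1 ('moving') where the inter-frame
    brightness difference exceeds theta; beta penalises label changes
    between 4-connected neighbours."""
    labels = (np.abs(diff) > theta).astype(int)
    h, w = diff.shape
    for _ in range(iters):
        for i in range(h):
            for j in range(w):
                nb = [labels[a, b] for a, b in
                      ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
                      if 0 <= a < h and 0 <= b < w]
                best, best_e = labels[i, j], np.inf
                for l in (0, 1):
                    data = (np.abs(diff[i, j]) - theta) * (1 - 2 * l)
                    smooth = beta * sum(1 for v in nb if v != l)
                    if data + smooth < best_e:
                        best, best_e = l, data + smooth
                labels[i, j] = best
    return labels

# Difference image: a moving square plus one weak, isolated noisy pixel.
diff = np.zeros((20, 20))
diff[5:12, 5:12] = 50.0     # object region, strong difference
diff[0, 0] = 21.0           # isolated pixel just above threshold
mask = icm_motion_mask(diff)
```

The smoothness term is what removes the isolated false detection while keeping the coherent object mask, which is the role regularization plays in the full model.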
https://doi.org/10.1142/9789812830579_0020
This paper is concerned with the detection of moving objects in an image sequence acquired with a mobile camera. More precisely, we have developed a new scheme able to cope with small moving objects. This approach first relies on the use of a motion compensation step over an extended time interval.
Sliding subsets of n images are considered (typically, we take n = 8) and for each subset, images are warped to the first image using a 2D quadratic motion model. Then, our problem consists in detecting moving regions in the motion compensated image sequence; this is stated as a binary labeling problem embedded in a statistical regularization framework based on Markov Random Fields (MRF). Observations are derived from different temporal filtering operators according to the Least Mean Square (LMS) error obtained in the motion compensation step. An adequate heuristic label map is built. It serves to initialize the relaxation process corresponding to the minimization of the energy function we have designed.
Several experiments have been carried out with real image sequences involving various types of camera movements. Obtained results demonstrate that this method can successfully handle scenes with small, poorly contrasted and slow moving objects.
https://doi.org/10.1142/9789812830579_0021
We apply the 3D-orientation tensor representation to construct an object tracking algorithm. 2D-line normal velocity is estimated by computing the eigenvector associated with the largest eigenvalue of 3D (two spatial dimensions plus time) tensors with a planar structure. The object's true 2D velocity is computed by averaging tensors with consistent normal velocities, generating a 3D line representation that corresponds to a 2D point in motion. Motion induced by camera rotation is compensated for by ignoring points whose velocity is consistent with the ego-rotation. A region-of-interest growing process based on motion consistency generates estimates of object size and position.
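The normal-velocity step can be sketched with a structure tensor accumulated over a synthetic translating pattern (an illustration of the principle, not the chapter's local tensor estimation): the eigenvector of the largest eigenvalue is the spatiotemporal normal, from which the normal velocity follows.

```python
import numpy as np

# Spatiotemporal volume of a 1D pattern translating at vx = 1 pixel/frame.
t, y, x = np.mgrid[0:8, 0:32, 0:32].astype(float)
vol = np.sin(0.4 * (x - 1.0 * t))

# 3D structure tensor accumulated over the whole volume: the sum of
# outer products of the (x, y, t) gradient.
gx = np.gradient(vol, axis=2)
gy = np.gradient(vol, axis=1)
gt = np.gradient(vol, axis=0)
g = np.stack([gx.ravel(), gy.ravel(), gt.ravel()])
T = g @ g.T

# For a moving edge the dominant eigenvector e is the spatiotemporal
# normal; the normal velocity is v_n = -e_t * (e_x, e_y) / (e_x^2 + e_y^2),
# independent of the eigenvector's sign.
w, v = np.linalg.eigh(T)
e = v[:, -1]
vn = -e[2] * e[:2] / (e[0]**2 + e[1]**2)
```

Averaging such tensors over points with consistent normal velocities is what resolves the aperture problem and yields the object's true 2D velocity.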
https://doi.org/10.1142/9789812830579_0022
Principal component analysis (PCA) of image data sets is a well known technique, and it is used in many disciplines. With present possibilities for interactive explorative work, it is possible to combine preprocessing in scene space or in data space with PCA. Using PCA in this way demands ways of comparing the results. Different ways of using graphics for this purpose are discussed. Another way to compare the results is to perform measurements on the obtained PC image data sets. How this can be done is discussed, and a measure of the local signal-to-noise ratio (SNR) in the PC images is proposed. It is particularly useful since it indicates how well the noise is handled by the PCA; this is demonstrated using two application examples from medical imaging.
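A minimal PCA sketch for a multiband image set, reporting the fraction of variance carried by each PC image; the chapter's local SNR measure is not reproduced here, and the synthetic bands are illustrative.

```python
import numpy as np

def pc_images(bands):
    """PCA of a multiband image set: vectorise each band, eigendecompose
    the band covariance, and return the PC images plus the fraction of
    total variance captured by each component."""
    X = np.stack([b.ravel() for b in bands]).astype(float)   # bands x pixels
    Xc = X - X.mean(axis=1, keepdims=True)
    cov = Xc @ Xc.T / (Xc.shape[1] - 1)
    w, v = np.linalg.eigh(cov)                # ascending eigenvalues
    order = np.argsort(w)[::-1]
    pcs = (v[:, order].T @ Xc).reshape(len(bands), *bands[0].shape)
    return pcs, w[order] / w.sum()

# Three noisy bands sharing one underlying pattern: PC1 should carry
# nearly all the variance, later PCs mostly noise.
rng = np.random.default_rng(1)
base = rng.normal(size=(64, 64))
bands = [2.0 * base + 0.1 * rng.normal(size=base.shape) for _ in range(3)]
pcs, frac = pc_images(bands)
```

Concentrating the correlated signal in the leading components while pushing noise into the trailing ones is precisely why an SNR measure on the PC images is informative.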
https://doi.org/10.1142/9789812830579_0023
Numerous techniques have been developed for biomedical image segmentation, but most of these methods do not perform well on a wide variety of data. We present an approach to the segmentation of biomedical image data, based on interactive deformable contours, that has broad applicability. We demonstrate the use of deformable contours on the segmentation of a nerve cell and a human embryo heart from serial tissue sections, and the segmentation of brain structures in MR data. In addition, we demonstrate that interactive deformable contours hold great promise for model-based segmentation, when combined with a computerized brain atlas.
https://doi.org/10.1142/9789812830579_0024
We present an image analysis system developed to measure the motion of white blood cells from a temporal sequence of fluorescence microscopy images. A two pass spatio-temporal segmentation system is used. Pixels are classified as cell and background pixels by an initial segmentation in the first pass. Region labeling, correction and cell tracking are done in the second pass. After segmentation, shape features are estimated from binary discrete regions, and cell motion is then measured by using shape features. A supervised method based on shape features is used to evaluate the results of the segmentation.
https://doi.org/10.1142/9789812830579_0025
A new method for the diagnosis and treatment of exudative age-related macular degeneration based on the imaging capabilities of a Scanning Laser Ophthalmoscope (SLO) is introduced. A map of essential anatomical features and pathological changes is drawn by information fusion of features obtained from different SLO images. A new technique for image registration, the affine matching of intermediate symbolic information, is presented, so that features extracted from different sources can be fused into one representation. Based on the obtained spatial relationship between anatomical and pathological features, the classification of the patients into three groups, each receiving different treatment, can be performed. This classification concept was successfully applied to a group of more than 50 patients.
https://doi.org/10.1142/9789812830579_0026
In this paper, we suggest a one-dimensional (scalar) approach to the method of optical flows for estimating the contracting/expanding motion in echocardiographic image sequences of the left ventricle. Provided that the method's output is a “valid” estimate of the radial velocity component, it offers, at an equivalent spatial resolution, a significant computational saving compared to the traditional method. An evaluation of the validity is carried out by comparison of the results of the two methods on the basis of a normalized mean square deviation. The one-dimensional method then seems to reproduce radial velocities similar to the ones obtained by using the traditional method.
In order to analyze the scalar velocity estimates, we can summarize them as time series representing the motion of different parts of the cardiac wall. By comparing features in the radial motion from different sectors of the left ventricle and from different heart cycles, the presented system offers a useful tool for identification and characterization of ischemia in the cardiac function.
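The 1D estimate along a single radial scan line can be sketched with the brightness-constancy equation pooled by least squares; this is a generic sketch, not the paper's exact estimator, and the Gaussian profile is an illustrative stand-in for the cardiac wall.

```python
import numpy as np

def radial_velocity(line_t0, line_t1):
    """1D brightness-constancy estimate along a radial scan line:
    I_x * v + I_t = 0, so v = -I_t / I_x, pooled by least squares
    over the whole profile."""
    Ix = np.gradient((line_t0 + line_t1) / 2.0)   # spatial gradient
    It = line_t1 - line_t0                        # temporal difference
    return -(Ix @ It) / (Ix @ Ix)

# Radial intensity profile shifting outward by 0.5 sample per frame.
r = np.arange(200, dtype=float)
profile = lambda rr: np.exp(-((rr - 100) / 15.0) ** 2)
v = radial_velocity(profile(r), profile(r - 0.5))
```

Estimating only this scalar component per scan line, instead of a full 2D flow field, is the source of the computational saving the paper reports.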
https://doi.org/10.1142/9789812830579_0027
In order to test the ability to automate early skin cancer recognition, a system for the computerized analysis of images obtained from epiluminescence microscopy (ELM) has been developed. As an initial step, the boundaries of the skin lesion are determined by global thresholding and morphological operations. Then, a set of features containing information about the malignity of the lesion is extracted. This set includes both shape and radiometric features. Finally, a Karhunen-Loeve-transform and a minimum distance classification are applied in order to classify the lesion as either benign or malignant. It was found that 86% of the test images were classified correctly.
https://doi.org/10.1142/9789812830579_0028
In this paper we present a system for high speed pixelwise spectral classification. The system is based on the line imaging PGP spectrograph combined with the MAPP2200 Smart Image Sensor. The classification is implemented using a near-sensor approach where linear discriminant functions are calculated using variable exposure times and analog summation of pixel data. After A/D conversion the sums are compared and classified pixels are output from the sensor chip. The speed of the system is in practice limited only by the available amount of light, with a theoretical maximum classified-pixel rate of 1–3 MHz depending on the number of classes.
https://doi.org/10.1142/9789812830579_0029
Automated visual grading of lumber is attractive for the sawmill industry. Improved accuracy and homogeneity of quality grading in high-volume production translate directly into higher profit.
In this paper we describe a color vision based approach to automated grading of dried softwood lumber. The proposed inspection principle is to recognize the sound wood regions early, as this reduces the computational requirements at later stages. This is important because color increases the data volumes significantly over gray-scale images. The computational solutions have turned out to be simpler than with gray-scale data, compensating for the higher cost of the imaging system.
https://doi.org/10.1142/9789812830579_0030
A machine vision system for automatic detection of blemishes on fruits is presented. To increase image contrast and deal with natural variations in fruit surface colour, appropriate spectral regions are used. As blemishes usually appear as discoloured patches in fruit images, they are treated as catchment basins in grey-level landscapes. A flooding algorithm has been developed to detect the basins, i.e. the patches. Stalk and calyx areas also appear as dark patches in images when non-oriented fruits are presented, so they must be separated from blemishes. Since many defects occur on the flat or convex surfaces of a fruit while the surface around the stalk or calyx is often concave, 3D information about surface shape is helpful in the separation. Structured light is used to obtain qualitative information about fruit geometric shape. Combining this information with features extracted from the grey-level images, a neural network is trained to classify each detected patch as blemish or stalk/calyx. Experimental results with apples, peaches and apricots demonstrate that the system can detect blemishes on them.
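The patch-detection step can be sketched as thresholding against the brighter surround followed by connected-component flood fill; this is a simplification of the grey-level flooding algorithm, and the threshold and synthetic image below are illustrative assumptions.

```python
import numpy as np

def dark_patches(img, threshold):
    """Label connected dark patches (candidate 'catchment basins'):
    threshold against the brighter surround, then flood-fill each
    4-connected component with an explicit stack."""
    mask = img < threshold
    labels = np.zeros(img.shape, int)
    current = 0
    for seed in zip(*np.nonzero(mask)):
        if labels[seed]:
            continue
        current += 1
        stack = [seed]
        while stack:
            i, j = stack.pop()
            if not (0 <= i < img.shape[0] and 0 <= j < img.shape[1]):
                continue
            if not mask[i, j] or labels[i, j]:
                continue
            labels[i, j] = current
            stack += [(i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)]
    return labels, current

# Bright 'fruit surface' with two dark blemish-like patches.
img = np.full((30, 30), 200.0)
img[5:9, 5:9] = 80.0
img[20:24, 10:15] = 90.0
labels, n = dark_patches(img, threshold=120.0)
```

Each labelled patch would then be passed, together with the structured-light shape cue, to the classifier that separates blemishes from stalk/calyx regions.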
https://doi.org/10.1142/9789812830579_0031
This chapter addresses the estimation of the average size of packed particles from formulas involving edge density, in particular moment-based formulas. An application relevant to the quarry industry is discussed. For long image sequences, an average shape factor, needed for the average size estimation, is computed sparsely in time. Edge density is then used to estimate and monitor average particle size.