Scene analysis is one of the most important topics in machine vision. In this paper, we present an integrated scene analysis model, SCENOGRAM (Scene analysis using CompositE Neural Oscillatory-based elastic GRAph Model). The proposed scene analyzer integrates a composite neural oscillatory model with our elastic graph dynamic link model. The system involves: (1) a multi-frequency band feature extraction scheme using Gabor filters, (2) automatic figure-ground object segmentation using a composite neural oscillatory model, and (3) object matching using an elastic graph dynamic link model.
From the implementation point of view, we introduce an intelligent agent-based scene analysis and object identification solution using the SCENOGRAM technology. From the experimental point of view, a scene gallery of over 6000 color scene images is used for automatic scene segmentation and object identification tests. An overall correct invariant facial recognition rate of over 87% is attained. It is anticipated that SCENOGRAM can provide an invariant, higher-order intelligent object (pattern) encoding, searching, and identification solution for future intelligent e-Business.
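As a rough illustration of step (1) above, the sketch below builds a small multi-frequency Gabor filter bank and stacks the filter responses into a feature volume. The wavelengths, orientations, kernel size, and the OpenCV-based pipeline are illustrative assumptions, not the authors' settings.

```python
# Hypothetical sketch of a multi-frequency Gabor feature extractor (step 1).
# Wavelengths and orientations below are illustrative, not the paper's values.
import cv2
import numpy as np

def gabor_features(gray, wavelengths=(4, 8, 16), n_orient=4):
    """Return one filter response per (wavelength, orientation) pair."""
    responses = []
    for lambd in wavelengths:                 # spatial wavelength = 1/frequency
        for k in range(n_orient):
            theta = k * np.pi / n_orient      # evenly spaced orientations
            kern = cv2.getGaborKernel(
                ksize=(31, 31), sigma=0.56 * lambd,  # common sigma/lambda ratio
                theta=theta, lambd=lambd, gamma=0.5, psi=0)
            responses.append(cv2.filter2D(gray, cv2.CV_32F, kern))
    return np.stack(responses, axis=-1)       # H x W x (bands * orientations)

# "scene.jpg" is a placeholder path for illustration.
img = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)
feats = gabor_features(img.astype(np.float32) / 255.0)
```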
A new method for recognizing and locating partially occluded two-dimensional parts is presented. The objects are described by a set of segments derived from the polygonal coding of their contours, and by the geometrical relationships between these segments. Rewriting rules are used to improve the stability of the polygonal coding. The identification process uses a robust hypothesis generator, from which a segment search, assisted by the spatial relationships, is propagated into the scene. The originality of the method lies mainly in the use of structural relationships between segments to select robust initialization hypotheses, and in the use of structural search to propagate hypotheses. These two points, together with a hash-coding technique that improves the location of predicted segments, greatly reduce the combinatorial complexity and make the algorithm particularly fast and effective. The approach is integrated within the vision system of a Flexible Assembly Workcell to accomplish the automatic assembly of partially overlapping parts.
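For illustration, a polygonal coding of a part's contour into segments might look like the sketch below. It uses the standard Douglas-Peucker approximation (cv2.approxPolyDP) as a stand-in for the paper's coding with rewriting rules; the tolerance value is an assumption.

```python
# Minimal sketch: polygonal coding of a part's contour into segments.
# Douglas-Peucker stands in for the paper's coding with rewriting rules.
import cv2
import numpy as np

def contour_segments(binary_mask, eps_frac=0.01):
    contours, _ = cv2.findContours(binary_mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    segments = []
    for c in contours:
        eps = eps_frac * cv2.arcLength(c, True)   # tolerance ~ contour length
        poly = cv2.approxPolyDP(c, eps, True).reshape(-1, 2)
        # each consecutive vertex pair is one segment (p, q), contour closed
        for p, q in zip(poly, np.roll(poly, -1, axis=0)):
            segments.append((tuple(p), tuple(q)))
    return segments
```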
We present a system for the interpretation of camera images of scenes composed of several known objects with mutual occlusion. The scenes are analyzed by the recognition of the objects present and by the determination of their occlusion relations. Objects are internally represented by stored model graphs. These are formed in a semi-automatic way by showing objects against a varying background. Objects are recognized by dynamic link matching. Our experiments show that our system is very successful in analyzing cluttered scenes. The system architecture goes beyond classical neural networks by making extensive use of flexible links between units, as proposed in the dynamic link architecture. The present implementation is, however, rather algorithmic in style and is to be regarded as a pilot study that is preparing the way for a detailed implementation of the architecture.
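As a loose sketch of the idea behind elastic graph matching with dynamic links, the toy cost function below combines node similarity between feature "jets" with a penalty on edge-length distortion. The function names, data layout, and weighting factor lam are illustrative assumptions, not the system's actual formulation.

```python
# Toy sketch of an elastic-graph matching cost in the spirit of dynamic link
# matching: jet similarity at the nodes plus an edge-deformation penalty.
import numpy as np

def match_cost(model_jets, image_jets, model_edges, model_pos, image_pos,
               lam=0.1):
    # node term: 1 - cosine similarity between corresponding feature jets
    sims = [np.dot(m, i) / (np.linalg.norm(m) * np.linalg.norm(i))
            for m, i in zip(model_jets, image_jets)]
    node_cost = np.mean([1.0 - s for s in sims])
    # edge term: squared difference of edge vectors (elastic deformation)
    edge_cost = np.mean([
        np.sum((np.subtract(image_pos[a], image_pos[b])
                - np.subtract(model_pos[a], model_pos[b])) ** 2)
        for a, b in model_edges])
    return node_cost + lam * edge_cost
```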
Automatic image annotation is an important and challenging task for image analysis and understanding, such as content-based image retrieval (CBIR). The relationship between keywords and visual features is complicated by the semantic gap. We present an approach to automatic image annotation based on scene analysis. Under the constraint of scene semantics, the correlation between keywords and visual features becomes simpler and clearer. Our model has two stages. The first stage is a training process that groups the training image set into semantic scenes, using the extracted semantic features, and into visual scenes, constructed from the pairwise distances between the visual features of training images computed with the Earth mover's distance (EMD). Each pair of semantic and visual scenes is then combined, and a Gaussian mixture model (GMM) is fitted to every scene. The second stage annotates keywords for the test image set. Using the visual features provided by Duygulu, experimental results show that our model outperforms the probabilistic latent semantic analysis and GMM (PLSA&GMM) model on the Corel5K database.
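A minimal sketch of the two training ingredients, pairwise EMD between per-image feature distributions and a per-scene GMM, is given below. The reduction to 1-D feature distributions and all parameter values are simplifying assumptions.

```python
# Rough sketch: pairwise EMD between image feature distributions, and one GMM
# fitted per scene. 1-D distributions are a simplification for illustration.
import numpy as np
from scipy.stats import wasserstein_distance
from sklearn.mixture import GaussianMixture

def pairwise_emd(feature_sets):
    """feature_sets: list of 1-D arrays, one empirical distribution per image."""
    n = len(feature_sets)
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d[i, j] = d[j, i] = wasserstein_distance(feature_sets[i],
                                                     feature_sets[j])
    return d

def fit_scene_gmm(scene_features, n_components=8):
    """Fit one GMM to the feature vectors of all images grouped into a scene."""
    return GaussianMixture(n_components=n_components).fit(scene_features)
```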
Laser or millimeter wave radars have shown good performance in measuring relative speed and distance in a highway driving environment. However, the accuracy of these systems decreases in an urban traffic environment, where more confusion arises from factors such as parked vehicles, guardrails, poles, and motorcycles. A trinocular stereo-based sensing system provides an effective supplement to radar-based road scene analysis, with a much wider field of view and more accurate lateral information. This paper presents an efficient solution: a trinocular-stereo-based 3D representation of the driving environment that employs the "U-V-disparity" concept. It is used to classify a 3D road scene into relative surface planes and to characterize the features of road pavement surfaces, roadside structures, and obstacles. A real-time implementation of the trinocular stereo disparity calculation and the "U-V-disparity" classification algorithm is also presented.
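For illustration, the U-disparity and V-disparity images can be built by histogramming a disparity map along its columns and rows, as in the sketch below; in the V-disparity image the road plane then appears as a slanted line and upright obstacles as near-vertical lines. The array sizes and disparity range are illustrative.

```python
# Minimal sketch of the U-V-disparity representation: histogram the disparity
# map along columns (U-disparity) and rows (V-disparity).
import numpy as np

def uv_disparity(disp, max_d=64):
    h, w = disp.shape
    d = np.clip(disp.astype(int), 0, max_d - 1)
    u_disp = np.zeros((max_d, w), dtype=np.int32)   # disparity vs. column
    v_disp = np.zeros((h, max_d), dtype=np.int32)   # row vs. disparity
    for row in range(h):
        for col in range(w):
            u_disp[d[row, col], col] += 1
            v_disp[row, d[row, col]] += 1
    return u_disp, v_disp
```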
A framework is presented that allows for the automated generation of texture filters by exploiting the 2D-Lookup algorithm and optimizing it with evolutionary algorithms. To use the framework, the user supplies an original image containing the structural property-of-interest (e.g., a surface fault) and a binary goal image in which each position of the structural property-of-interest is labeled with the foreground color. The framework can then evolve the configuration of the 2D-Lookup algorithm into a texture filter for the structural property-of-interest. Genetic programming (GP) is used as the evolutionary algorithm. In this GP approach, a filter generator derives two operations, based on formal superoperators, from the tree that represents an individual of the evolving population. The 2D-Lookup matrix is specified by a relaxation technique. The approach is demonstrated on examples of texture faults.
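A minimal sketch of the 2D-Lookup step, under one reading of the abstract: two operator images, here simply a local mean and a local standard deviation standing in for the GP-evolved superoperators, jointly index a 2-D lookup matrix that yields the filtered output. The choice of operators, window size, and the placeholder matrix are assumptions; in the framework the matrix comes from the relaxation step.

```python
# Sketch of a 2D-Lookup texture filter: two derived operator images index
# into a 2-D lookup matrix that yields the per-pixel output.
import numpy as np
from scipy.ndimage import uniform_filter

def two_d_lookup(gray, lut, levels=256):
    mean = uniform_filter(gray.astype(float), size=5)
    sq_mean = uniform_filter(gray.astype(float) ** 2, size=5)
    std = np.sqrt(np.maximum(sq_mean - mean ** 2, 0.0))
    a = np.clip(mean, 0, levels - 1).astype(int)      # first operator image
    b = np.clip(std * 4, 0, levels - 1).astype(int)   # second operator image
    return lut[a, b]                                  # per-pixel 2-D lookup

# Placeholder matrix; the framework would obtain it via relaxation.
lut = np.zeros((256, 256), dtype=np.uint8)
```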
In the field of vision-based robot actuation, background separation and object selection are fundamental tasks for manipulating objects in an environment, and they should be carried out quickly and efficiently. In this paper, we propose a method to segment possible object locations in the scene and recognize them via a local-point-based representation. Exploiting the 3D structure of the scene obtained with a time-of-flight camera, background regions are eliminated under the assumption that the objects are placed on planar surfaces. Object recognition is then performed using scale-invariant features extracted from high-resolution images captured with a standard camera. Preliminary experimental results show that the proposed system gives promising results for background segmentation and object recognition, especially in service-robot environments, and it could also be utilized as a pre-processing step in path planning and 3D scene map generation.
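A toy sketch of the planar-surface assumption: fit a dominant plane to the time-of-flight point cloud with RANSAC and keep the remaining points as object candidates. The thresholds, iteration count, and function name are illustrative, not the paper's implementation.

```python
# Toy RANSAC plane fit: remove the dominant planar surface from a point
# cloud and return the non-plane points as object candidates.
import numpy as np

def remove_plane(points, n_iter=200, thresh=0.01):
    """points: (N, 3) array. Returns a mask of non-plane (object) points."""
    best_inliers = np.zeros(len(points), dtype=bool)
    rng = np.random.default_rng(0)
    for _ in range(n_iter):
        p0, p1, p2 = points[rng.choice(len(points), 3, replace=False)]
        normal = np.cross(p1 - p0, p2 - p0)
        norm = np.linalg.norm(normal)
        if norm < 1e-9:
            continue                       # degenerate (collinear) sample
        normal /= norm
        dist = np.abs((points - p0) @ normal)
        inliers = dist < thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return ~best_inliers                   # candidate object points
```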