
  Bestsellers

  • AUTOMATIC DETECTION AND TRACKING OF HUMAN HEADS USING AN ACTIVE STEREO VISION SYSTEM

    A new head tracking algorithm for automatically detecting and tracking human heads in complex backgrounds is proposed. By using an elliptical model for the human head, our Maximum Likelihood (ML) head detector can reliably locate human heads in images with complex backgrounds and is relatively insensitive to illumination and to rotation of the heads. The detector consists of two channels, horizontal and vertical, each implemented by multiscale template matching. Using a hierarchical structure in implementing the detector, the execution time for detecting the human heads in a 512×512 image is about 0.02 seconds on a Sparc 20 workstation (not including the time for image acquisition). Based on the ellipse-based ML head detector, we have developed a head tracking method that can monitor the entrance of a person, detect and track the person's head, and then control the stereo cameras to fix their gaze on this person's head. In this method, the ML head detector and the mutually-supported constraint are used to extract the corresponding ellipses in a stereo image pair, and the 3D position computed from the centers of the two corresponding ellipses is used for fixation. To implement a practical and reliable face detection and tracking system, further verification using facial features, such as the eyes, mouth, and nostrils, may be essential. An active stereo head has been used to perform the experiments and has demonstrated that the proposed approach is feasible and promising for practical use.
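    The abstract gives enough detail to sketch the multiscale template-matching idea. The following Python fragment is a minimal illustration under our own assumptions (ring-shaped template, scale set, Canny thresholds), not the authors' ML detector:

        import cv2
        import numpy as np

        def ellipse_template(h, w):
            """Binary elliptical ring, a crude stand-in for a head contour model."""
            t = np.zeros((h, w), np.uint8)
            cv2.ellipse(t, (w // 2, h // 2), (w // 2 - 2, h // 2 - 2),
                        0, 0, 360, 255, 2)
            return t

        def detect_head(gray, scales=(40, 60, 80)):
            """Return (score, (x, y), width) of the best multiscale template match."""
            edges = cv2.Canny(gray, 80, 160)
            best = (-1.0, None, None)
            for s in scales:
                tmpl = ellipse_template(int(1.3 * s), s)  # heads: taller than wide
                res = cv2.matchTemplate(edges, tmpl, cv2.TM_CCOEFF_NORMED)
                _, score, _, loc = cv2.minMaxLoc(res)
                if score > best[0]:
                    best = (score, loc, s)
            return best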

  • 3D ARTICULATED OBJECT UNDERSTANDING, LEARNING, AND RECOGNITION FROM 2D IMAGES

    This paper addresses 3D object understanding from 2D images, including articulated objects in an active vision environment, using interactive and internet virtual reality techniques. Generally speaking, an articulated object can be divided into two portions: a main rigid portion and an articulated portion. It is more complicated than a rigid object in that the relative positions, shapes, and angles between the main portion and the articulated portion have essentially infinite variations, in addition to the infinite variations of each individual rigid portion due to orientation, rotation, and topological transformation. A new method generalized from linear combination is employed to investigate such problems. It uses very few learning samples, and can describe, understand, and recognize 3D articulated objects while the object's status changes in an active vision environment.
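    Since the method is generalized from linear combination, a minimal sketch of the classical linear-combination-of-views test may make this concrete; the point correspondences, shapes, and residual test below are our illustrative assumptions:

        import numpy as np

        def lcv_residual(model_views, novel_view):
            """model_views: list of (N, 2) arrays of corresponding point positions
            in a few learning samples; novel_view: (N, 2) array. A novel view of
            the same rigid portion should be well approximated by a linear
            combination of the model views."""
            cols = [v[:, i] for v in model_views for i in (0, 1)]
            B = np.column_stack(cols + [np.ones(len(novel_view))])
            coeffs, *_ = np.linalg.lstsq(B, novel_view, rcond=None)
            return np.linalg.norm(B @ coeffs - novel_view)  # small => recognized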

  • KNOWLEDGE-BASED SCENE EXPLORATION USING COMPUTER VISION AND LEARNED ANALYSIS STRATEGIES

    In this contribution we demonstrate how the task of visual scene exploration can be solved by a knowledge-based vision system. During scene exploration, the system searches for a fixed number of a priori known objects in a static scene. If not all objects are visible using the initial camera set-up, the camera parameters have to be adjusted and the camera has to be moved by the system. This problem is reduced to the choice of optimal camera actions. The information about the objects and the camera actions is uniformly represented in a semantic network. In addition, a control algorithm is provided that finds the optimal assignment from objects to parts of a scene based on a suitable analysis strategy. This strategy is acquired by the system itself using reinforcement learning methods. The paper focuses on aspects of knowledge representation concerning the integration of camera actions and on the integration of reinforcement learning methods in a semantic network formalism and applies them in a realistic setup. Experiments are shown for images of two office rooms.
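    As a rough illustration of how such an analysis strategy can be acquired by reinforcement learning, here is a minimal tabular Q-learning fragment; the state, action, and reward definitions are placeholder assumptions, since the actual system works within a semantic-network formalism:

        import numpy as np

        N_STATES, N_ACTIONS = 16, 4        # e.g. coarse camera poses x {pan/tilt +-}
        Q = np.zeros((N_STATES, N_ACTIONS))
        alpha, gamma, eps = 0.1, 0.9, 0.2
        rng = np.random.default_rng(0)

        def choose_action(s):
            if rng.random() < eps:          # occasionally explore new camera actions
                return int(rng.integers(N_ACTIONS))
            return int(Q[s].argmax())       # otherwise exploit the learned strategy

        def q_update(s, a, reward, s_next):
            # reward could be, e.g., the number of newly found objects
            Q[s, a] += alpha * (reward + gamma * Q[s_next].max() - Q[s, a])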

  • WHAT THE DRAUGHTSMAN'S HAND TELLS THE DRAUGHTSMAN'S EYE: A SENSORIMOTOR ACCOUNT OF DRAWING

    In this paper we address the challenging problem of sensorimotor integration, with reference to eye-hand coordination of an artificial agent engaged in a natural drawing task. Under the assumption that eye-hand coupling influences observed movements, a motor continuity hypothesis is exploited to account for how gaze shifts are constrained by hand movements. A Bayesian model of such coupling is presented in the form of a novel Dynamic Bayesian Network, namely an Input–Output Coupled Hidden Markov Model. Simulation results are compared to those obtained by eye-tracked human subjects involved in drawing experiments.
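    A generic input-output HMM forward pass may clarify the flavor of the model, though it is a textbook sketch rather than the paper's Input–Output Coupled HMM:

        import numpy as np

        def io_hmm_forward(A, B, pi, inputs, outputs):
            """A: (U, K, K) transitions selected by the input symbol (e.g. a
            discretized hand state); B: (K, Y) emission probabilities for the
            observed gaze symbols; pi: (K,) initial state distribution."""
            alpha = pi * B[:, outputs[0]]
            for u, y in zip(inputs[1:], outputs[1:]):
                alpha = (alpha @ A[u]) * B[:, y]   # input-conditioned transition
                alpha /= alpha.sum()               # rescale to avoid underflow
            return alpha   # filtered distribution over hidden states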

  • A FRAMEWORK FOR ACTIVELY SELECTING VIEWPOINTS IN OBJECT RECOGNITION

    Object recognition problems in computer vision are often based on processing a single image. In various applications this processing can be extended to a complete sequence of images, usually received passively. In contrast, we propose a method for active object recognition, in which a camera is selectively moved around the object under consideration. In doing so, we aim at reliable classification results from a clearly reduced number of views by optimizing the camera movement toward informative new viewpoints (viewpoint selection). The optimization criterion is the gain of class-discriminative information when observing the next image.
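    The criterion can be made concrete with a discretized sketch: score each candidate view by the expected reduction in class entropy. The paper models continuous states and actions, so the tabulated distributions below are a simplifying assumption:

        import numpy as np

        def entropy(p):
            p = p[p > 0]
            return -(p * np.log(p)).sum()

        def expected_info_gain(posterior, view_likelihoods):
            """posterior: (C,) current class probabilities; view_likelihoods:
            (V, C, O) with p(observation o | class c, view v). Returns the
            expected entropy reduction for each candidate view."""
            gains = np.empty(len(view_likelihoods))
            for v, lik in enumerate(view_likelihoods):
                p_o = posterior @ lik                       # predictive p(o)
                post = lik * posterior[:, None] / np.clip(p_o, 1e-12, None)
                h = np.array([entropy(post[:, o]) for o in range(lik.shape[1])])
                gains[v] = entropy(posterior) - p_o @ h     # H(before) - E[H(after)]
            return gains                                    # move to argmax view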

    We show how to apply an unsupervised reinforcement learning algorithm to that problem. Specifically, we focus on the modeling of continuous states, continuous actions and supporting rewards for an optimized recognition. We also present an algorithm for the sequential fusion of gathered image information and we combine all these components into a single framework.
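    A minimal version of sequential fusion, assuming the views are conditionally independent given the class (our simplification, not necessarily the paper's fusion algorithm):

        import numpy as np

        def fuse_views(prior, likelihoods):
            """prior: (C,) class probabilities; likelihoods: iterable of (C,)
            arrays, lik[c] = p(observed image | class c) for each new view."""
            post = np.asarray(prior, dtype=float)
            for lik in likelihoods:
                post *= lik          # accumulate evidence from the new viewpoint
                post /= post.sum()   # renormalize after each observation
            return post

        # e.g. three classes, two views:
        p = fuse_views([1/3, 1/3, 1/3],
                       [np.array([0.7, 0.2, 0.1]), np.array([0.6, 0.3, 0.1])])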

    The experimental evaluations are split into results for synthetic and real objects with one- or two-dimensional camera actions, respectively. This allows the systematic evaluation of the theoretical correctness as well as the practical applicability of the proposed method. Our experiments showed that the proposed combined viewpoint selection and viewpoint fusion approach is able to significantly improve the recognition rates compared to passive object recognition with randomly chosen views.

  • HEADS, EYES AND HEAD-EYE SYSTEMS

    Active vision systems can be considered as systems that integrate visual sensing and action: sensing includes the detection of actions and events, and also results in specific actions and manipulations.

    This paper mainly addresses the basic issues in the design of a head-eye system for the study of active-purposive vision. The design complexity of such a head is determined by the activeness of the visual system. Although we do not aim to reproduce biological solutions exactly in a robot, we claim that the designer should carefully consider the solutions offered by evolution.

    The flexibility of the behavioral pattern of the system is constrained by the mechanical structure and the computational architecture used in the control system of the head. The purpose of the paper is to describe the mechanical structure as well as the computational architecture of the KTH-head from this perspective.

  • DESIGN AND PERFORMANCE OF TRISH, A BINOCULAR ROBOT HEAD WITH TORSIONAL EYE MOVEMENTS

    We present the design of a controllable stereo vision head. TRISH (The Toronto IRIS Stereo Head) is a binocular camera mount, consisting of two fixed focal length color cameras with automatic gain control forming a verging stereo pair. TRISH is capable of version (rotation of the eyes about the vertical axis so as to maintain a constant disparity), vergence (rotation of each eye about the vertical axis so as to change the disparity), pan (rotation of the entire head about the vertical axis), and tilt (rotation of each eye about the horizontal axis). One novel characteristic of the design is that each camera can rotate about its own optical axis (torsion). Torsional movement makes it possible to minimize the vertical component of the two-dimensional search which is associated with stereo processing in verging stereo systems.
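    The version/vergence decomposition has a compact geometric form; the sketch below assumes eyes at ±baseline/2 with nominally parallel axes and is only an illustration of the terminology, not TRISH's controller:

        import numpy as np

        def gaze_angles(x, z, baseline):
            """Eye rotations needed to fixate a target at lateral offset x, depth z."""
            theta_l = np.arctan2(x + baseline / 2, z)   # left-eye rotation
            theta_r = np.arctan2(x - baseline / 2, z)   # right-eye rotation
            version = (theta_l + theta_r) / 2           # shared gaze direction
            vergence = theta_l - theta_r                # angle between the two axes
            return version, vergence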

  • A LOW-COST ROBOT CAMERA HEAD

    Active vision involving the exploitation of controllable cameras and camera heads is an area which has received increased attention over the last few years. At LIA/AUC a binocular robot camera head has been constructed for use in geometric modelling and interpretation. In this manuscript the basic design of the head is outlined and a first prototype is described in some detail. Detailed specifications for the components used are provided together with a section on lessons learned from construction and initial use of this prototype.

  • THE SURREY ATTENTIVE ROBOT VISION SYSTEM

    This paper presents the design and development of a real-time eye-in-hand stereo-vision system to aid robot guidance in a manufacturing environment. The stereo vision head comprises a novel camera arrangement with servo-vergence, focus, and aperture that continuously provides high-quality images to a dedicated image processing system and parallel processing array. The stereo head has four degrees of freedom but it relies on the robot end-effector for all remaining movement. This provides the robot with exploratory sensing abilities allowing it to undertake a wider variety of less constrained tasks. Unlike other stereo vision research heads, the overriding factor in the Surrey head has been a truly integrated engineering approach in an attempt to solve an extremely complex problem. The head is low cost, low weight, employs state-of-the-art motor technology, is highly controllable and occupies a small-sized envelope. Its intended applications include high-accuracy metrology, 3-D path following, object recognition and tracking, parts manipulation and component inspection for the manufacturing industry.

  • LAYERED CONTROL OF A BINOCULAR CAMERA HEAD

    This paper describes a layered control system for a binocular stereo head. It begins with a discussion of the principles of layered control and then describes the mechanical configuration of a binocular camera head with six degrees of freedom. A device-level controller is presented which permits an active vision system to command the position of a binocular gaze point in the scene. The final section describes the design of perceptual actions which exploit this device-level controller.
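    A device-level gaze command of this kind can be illustrated by mapping a 3D gaze point to pan, tilt, and symmetric vergence; the simple head geometry assumed here is ours, not the paper's six-degree-of-freedom configuration:

        import numpy as np

        def gaze_point_to_joints(p, baseline):
            """p = (x, y, z) gaze point in the head frame."""
            x, y, z = p
            pan = np.arctan2(x, z)                       # rotate head toward target
            tilt = np.arctan2(y, np.hypot(x, z))         # common elevation
            vergence = 2 * np.arctan2(baseline / 2, np.linalg.norm(p))
            return pan, tilt, vergence                   # both eyes converge on p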

  • A FOUR DEGREE-OF-FREEDOM ROBOT HEAD FOR ACTIVE VISION

    The design of a robot head for active computer vision tasks is described. The stereo head/eye platform uses a common elevation configuration and has four degrees of freedom. The joints are driven by DC servo motors coupled with incremental optical encoders and backlash-minimizing gearboxes. The details of the mechanical design, the head controller design, the architecture of the system, and the design criteria for various specifications are presented.

  • CONTROL OF EYE AND ARM MOVEMENTS USING ACTIVE, ATTENTIONAL VISION

    Recent related approaches in the areas of vision, motor control and planning are attempting to reduce the computational requirements of each process by restricting the class of problems that can be addressed. Active vision, differential kinematics and reactive planning are all characterized by their minimal use of representations, which simplifies both the required computations and the acquisition of models. This paper describes an approach to visually-guided motor control that is based on active vision and differential kinematics, and is compatible with reactive planning. Active vision depends on an ability to choose a region of the visual environment for task-specific processing. Visual attention provides a mechanism for choosing the region to be processed in a task-specific way. In addition, this attentional mechanism provides the interface between the vision and motor systems by representing visual position information in a 3-D retinocentric coordinate frame. Coordinates in this frame are transformed into eye and arm motor coordinates using kinematic relations expressed differentially. A real-time implementation of these visuomotor mechanisms has been used to develop a number of visually-guided eye and arm movement behaviors.
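    "Kinematic relations expressed differentially" can be illustrated by the standard Jacobian-pseudoinverse step; the planar two-link arm below is a toy stand-in for the paper's eye and arm transforms:

        import numpy as np

        def arm_step(q, target_vel, links=(0.3, 0.3)):
            """Joint velocities that realize a desired end-effector velocity."""
            l1, l2 = links
            q1, q2 = q
            # Jacobian of the forward kinematics (x, y) wrt (q1, q2)
            J = np.array([[-l1*np.sin(q1) - l2*np.sin(q1+q2), -l2*np.sin(q1+q2)],
                          [ l1*np.cos(q1) + l2*np.cos(q1+q2),  l2*np.cos(q1+q2)]])
            return np.linalg.pinv(J) @ target_vel   # integrate as q += dq * dt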

  • BEHAVIOR-BASED ACTIVE VISION

    A vision system was built using a behavior-based model, the subsumption architecture. The so-called active eye moves the camera's axis through the environment, detecting areas with a high concentration of edges with the help of a kind of saccadic movement. The design and implementation process is detailed in the article, paying particular attention to the fovea-like sensor structure, which enables the active eye to use local information efficiently to control its movements. Numerical measures of the eye's behavior were developed and applied to evaluate the incremental building process and the effects of the saccadic movements on the whole system. A higher-level behavior was also implemented, with the purpose of detecting long straight edges in the image, producing pictures similar to hand drawings. Robustness and efficiency problems are addressed at the end of the paper. The results suggest that interesting behaviors can be achieved using simple vision methods and algorithms, if their results are properly interconnected and timed.
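    The edge-concentration saccade target can be sketched in a few lines; the Canny thresholds and window size are illustrative assumptions:

        import cv2

        def next_fixation(gray, window=41):
            """Return the (x, y) peak of local edge density as the saccade target."""
            edges = cv2.Canny(gray, 80, 160).astype("float32")
            density = cv2.boxFilter(edges, -1, (window, window))  # local edge mass
            _, _, _, loc = cv2.minMaxLoc(density)
            return loc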

  • AN INTELLIGENT CAMERA FOR ACTIVE VISION

    Much research is currently under way on the processing of one- or two-camera imagery, possibly combined with other sensors and actuators, with the aim of achieving attentive vision, i.e. selectively processing some parts of a scene, possibly at a different resolution. Attentive vision in turn is an element of active vision, in which the outcome of the image processing triggers changes in the image acquisition geometry and/or in the environment. Almost all of this research assumes classical imaging, scanning, and conversion geometries, such as raster-based scanning and the processing of several digitized outputs on separate image processing units.

    A consortium of industrial companies, comprising Digital Equipment Europe, Thomson CSF, and a few others, has taken a more radical view. To meet active vision requirements in industry, an intelligent camera is being designed and built, comprising three basic elements:

    – a unique Thomson CSF CCD sensor architecture with random addressing

    – the DEC Alpha 21064 275MHz processor chip, sharing the same internal data bus as the digital sensor output

    – a generic library of basic image manipulation, control, and image processing functions, executed directly in the sensor/internal-bus/processor unit, so that only higher-level results or commands are exchanged with the processing environment.

    Extensions to color imaging (with lower spatial resolution) and to stereo imaging are relatively straightforward. The basic sensor is 1024×1024 pixels with 2×10-bit addresses and a 2.5 ms (400 frames/second) image data rate compatible with the Alpha bus and 64-bit addressing. For attentive vision, several connected fields of at most 40,000 pixels and at least 5×3 pixels can be read and addressed within each 2.5 ms image frame. Readout is nondestructive, and the 64-bit image processing addressing allows for 8 full pixel readouts in a single word.
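    The quoted figures are easy to sanity-check; the 8-bit pixel depth below is our assumption:

        frame_time = 2.5e-3                     # seconds per frame
        print(1 / frame_time)                   # 400.0 frames/second, as stated

        word_bits, pixel_bits = 64, 8           # 8-bit pixels assumed
        print(word_bits // pixel_bits)          # 8 pixel readouts per 64-bit word

        field_pixels = 40_000                   # largest connected field per frame
        print(field_pixels / frame_time / 1e6)  # 16.0 Mpixel/s per field readout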

    The main difficulties have been identified as the access and reading delays, the signal levels, and the dimensioning of some buffer arrays in the processor.

    The commercial applications targeted initially will be in industrial inspection, traffic control, and document imaging. In all of these fields, selective position-dependent processing will take place, followed by feature-dependent processing.

    Very large savings are expected in terms of solution costs to end users and development time, as well as major performance gains for the ultimate processes. The reader will appreciate that at this stage no further implementation details can be given.

  • RECIPROCAL-WEDGE TRANSFORM IN ACTIVE STEREO

    The Reciprocal-Wedge Transform (RWT) facilitates space-variant image representation. In this paper a V-plane projection method is presented as a model for imaging using the RWT. It is then shown that space-variant sensing with this new RWT imaging model is suitable for fixation control in active stereo that exhibits vergence and versional eye movements and scanpath behaviors. A computational interpretation of stereo fusion in relation to disparity limit in space-variant imagery leads to the development of a computational model for binocular fixation. The vergence-version movement sequence is implemented as an effective fixation mechanism using the RWT imaging. A fixation system is presented to show the various modules of camera control, vergence and version.
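    Assuming the usual RWT definition u = 1/x, v = y/x (valid for nonzero x; in practice the image is split into half-planes so x stays bounded away from zero), the space-variant property is visible in a few lines:

        import numpy as np

        def rwt_points(x, y):
            """Reciprocal-Wedge Transform of image coordinates."""
            x = np.asarray(x, dtype=float)
            return 1.0 / x, np.asarray(y, dtype=float) / x

        # Peripheral pixels (large x) crowd toward u = 0 (coarse sampling), while
        # pixels near x = 1 keep fine resolution: the basis for fixation control.
        u, v = rwt_points([1.0, 2.0, 8.0], [0.5, 0.5, 0.5])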

  • MODELING SELECTIVE PERCEPTION OF COMPLEX, NATURAL SCENES

    Computational modeling of the human visual system is of current interest to developers of artificial vision systems, primarily because a biologically-inspired model can offer solutions to otherwise intractable image understanding problems. The purpose of this study is to present a biologically-inspired model of selective perception that augments a stimulus-driven approach with a high-level algorithm that takes into account particularly informative regions in the scene. The representation is compact and given in the form of a topographic map of relative perceptual conspicuity values. Other recent attempts at compact scene representation consider only low-level information that codes salient features such as color, edge, and luminance values. The previous attempts do not correlate well with subjects' fixation locations during viewing of complex images or natural scenes. This study uses high-level information in the form of figure/ground segmentation, potential object detection, and task-specific location bias. The results correlate well with the fixation densities of human viewers of natural scenes, and can be used as a preprocessing module for image understanding or intelligent surveillance applications.
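    A linear combination of normalized feature maps is one simple way to realize such a topographic conspicuity map; the map names and weights below are illustrative assumptions, not the study's exact model:

        import numpy as np

        def conspicuity_map(feature_maps, weights):
            """feature_maps: dict of equally sized 2D arrays (color, edges,
            luminance, figure/ground, task bias, ...); weights: dict of floats."""
            total = None
            for name, fmap in feature_maps.items():
                f = fmap.astype(float)
                span = f.max() - f.min()
                f = (f - f.min()) / span if span > 0 else np.zeros_like(f)
                term = weights[name] * f
                total = term if total is None else total + term
            return total / sum(weights.values())   # relative conspicuity values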

  • REALIZING FACE-TO-FACE INTERACTION BY VIEW-BASED TRACKING

    In this paper, we propose a method of realizing face-to-face interaction between a human and a mobile robot by view-based tracking. Although individuals can easily be recognized from frontal views of their faces, it is difficult to do so from profiles. To cope with this problem, our mobile robot first finds face-like areas based upon a skin color distribution built from real image samples. Second, a facial pattern in the image is found and its direction relative to the robot is estimated in terms of the memorized views. Finally, the mobile robot moves to the front of the face according to the estimated face direction, while tracking the face using visual feedback based upon an active vision approach.
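    A skin color distribution built from real image samples can be approximated by a hue/saturation histogram with back-projection; the bin counts and ranges here are illustrative assumptions:

        import numpy as np

        def skin_histogram(hsv_samples, bins=(30, 32)):
            """hsv_samples: (N, 3) array of labeled skin pixels in OpenCV HSV."""
            hist, _, _ = np.histogram2d(hsv_samples[:, 0], hsv_samples[:, 1],
                                        bins=bins, range=[[0, 180], [0, 256]],
                                        density=True)
            return hist

        def skin_probability(hsv_image, hist, bins=(30, 32)):
            """Back-project the histogram; threshold the result for face-like areas."""
            hi = np.clip((hsv_image[..., 0] / 180 * bins[0]).astype(int), 0, bins[0] - 1)
            si = np.clip((hsv_image[..., 1] / 256 * bins[1]).astype(int), 0, bins[1] - 1)
            return hist[hi, si]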

  • MAKING OBJECT LEARNING AND RECOGNITION AN ACTIVE PROCESS

    The exploration and learning of new objects is an essential capability of a cognitive robot. In this paper we focus on making use of the robot's manipulation abilities to learn complete object representations suitable for 3D object recognition. Taking control of the object allows the robot to focus on relevant parts of the images, thus bypassing potential pitfalls of purely bottom-up attention and segmentation. The main contribution of the paper consists in integrated visuomotor processes that allow the robot to learn object representations by manipulation without having any prior knowledge about the objects. Our experimental results show that the acquired data is of sufficient quality to train a classifier that can recognize 3D objects independently of the viewpoint.

  • ACTIVE SEGMENTATION

    The human visual system observes and understands a scene/image by making a series of fixations. Every "fixation point" lies inside a particular region of arbitrary shape and size in the scene, which can be either an object or just a part of one. We define the basic segmentation problem as the task of segmenting the region containing the "fixation point". Segmenting the region containing the fixation is equivalent to finding the enclosing contour — a connected set of boundary edge fragments in the edge map of the scene — around the fixation. This enclosing contour should be a depth boundary.

    We present here a novel algorithm that finds this bounding contour and achieves the segmentation of one object, given the fixation. The proposed segmentation framework combines monocular cues (color/intensity/texture) with stereo and/or motion, in a cue-independent manner. The semantic robots of the immediate future will be able to use this algorithm to automatically find objects in any environment, and the capability of automatically segmenting objects in their visual field can bring visual processing to the next level. Our approach differs from current approaches: while existing work attempts to segment the whole scene at once into many areas, we segment only one image region, specifically the one containing the fixation point. Experiments with real imagery collected by our active robot and from known databases demonstrate the promise of the approach.
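    As a much simplified stand-in for the enclosing-contour search (not the paper's algorithm), region growing from the fixation point, stopped by edge pixels, makes the problem statement concrete:

        import numpy as np
        from collections import deque

        def segment_from_fixation(edge_map, fixation):
            """edge_map: 2D boolean array; fixation: (row, col) inside the region."""
            h, w = edge_map.shape
            mask = np.zeros((h, w), dtype=bool)
            queue = deque([fixation])
            while queue:
                r, c = queue.popleft()
                if not (0 <= r < h and 0 <= c < w) or mask[r, c] or edge_map[r, c]:
                    continue                    # stop at image borders and edges
                mask[r, c] = True
                queue.extend([(r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)])
            return mask                         # region containing the fixation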

  • ATTENTIONAL OBJECT DETECTION WITH AN ACTIVE MULTI-FOCAL VISION SYSTEM

    A biologically inspired foveated attention system in an object detection scenario is proposed. Bottom-up attention uses wide-angle stereo camera data to select a sequence of fixation points. Successive snapshots at high foveal resolution from a telephoto camera enable highly accurate object recognition based on the SIFT algorithm. Top-down information is incrementally estimated and integrated using a Kalman filter, enabling parameter adaptation to environments that change as the robot moves. In the experimental evaluation, all the target objects were detected against different backgrounds, with significant improvements in flexibility and efficiency.
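    The incremental top-down estimation can be illustrated by a single Kalman-filter measurement update; the state and measurement models are assumptions, not the paper's specific parameterization:

        import numpy as np

        def kalman_update(x, P, z, H, R):
            """x, P: state estimate and covariance; z: new measurement;
            H: measurement matrix; R: measurement noise covariance."""
            y = z - H @ x                          # innovation
            S = H @ P @ H.T + R                    # innovation covariance
            K = P @ H.T @ np.linalg.inv(S)         # Kalman gain
            return x + K @ y, (np.eye(len(x)) - K @ H) @ P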