
  • Article (No Access)

    Counting in Visual Question Answering: Methods, Datasets, and Future Work

    Visual Question Answering (VQA) is a language-based method for analyzing images that is highly helpful in assisting people with visual impairment. A VQA system requires holistic image understanding and must perform basic reasoning about the image, in contrast to task-specific models that simply classify objects into categories. VQA systems thus contribute to the growth of Artificial Intelligence (AI) technology by answering open-ended, arbitrary questions about a given image. In addition, VQA is used to assess a system’s ability through the Visual Turing Test (VTT). However, because suitable datasets are difficult to construct and existing evaluations suffer from bias and other flaws, current benchmarks cannot assess a system’s overall efficiency. This is a significant limitation of VQA and has slowed the progress of VQA algorithms. Current research therefore addresses more specific sub-problems, including counting. Counting is one of the more sophisticated sub-problems, riddled with challenges, especially for complex counting questions that demand object identification together with detection of object attributes and positional reasoning. The pooling operation commonly used to implement attention in VQA has been found to degrade counting performance, and a number of algorithms have been developed to address this issue. In this paper, we provide a comprehensive survey of counting techniques in VQA systems developed especially for answering questions such as “How many?”. The performance achieved so far is still not satisfactory, owing to dataset bias introduced by the way questions are phrased and to weak evaluation metrics. Future work should deliver fully-fledged architectures, large datasets with complex counting questions and detailed category breakdowns, and strong evaluation metrics for assessing a system’s ability to answer complex counting questions, such as those requiring positional and comparative reasoning.

  • Article (No Access)

    A Multi-Scale Cascaded Hierarchical Model for Image Labeling

    Image labeling is an important and challenging task in the area of graphics and visual computing, where datasets with high-quality labeling are critically needed. In this paper, based on the commonly accepted observation that the same semantic object in images with different resolutions may have different representations, we propose a novel multi-scale cascaded hierarchical model (MCHM) to enhance general image labeling methods. Our approach first creates multi-resolution images from the original one to form an image pyramid and labels each image at a different scale individually. Next, it constructs a cascaded hierarchical model and a feedback circle between the image pyramid and the labeling methods. The original image labeling result is used to adjust the labeling parameters of the scaled images, and the labeling results from the scaled images are then fed back to enhance the original labeling result. These steps naturally form a global optimization problem under a scale-space condition. We further propose an iterative algorithm for running the model, and prove its global convergence through iterative approximation with latent optimization constraints. We have conducted extensive experiments with five widely used labeling methods on five popular image datasets. Experimental results indicate that MCHM significantly improves the labeling accuracy of state-of-the-art image labeling approaches.
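    As a rough illustration (not the authors' implementation), the pyramid-then-fuse loop described in the abstract might look like the sketch below; the striding downsampler and the per-pixel majority vote are simplifying assumptions standing in for the paper's labeling methods and feedback circle:

```python
from collections import Counter

def build_pyramid(image, scales=(1.0, 0.5, 0.25)):
    """Downsample a 2-D grid (list of rows) by striding to form a pyramid."""
    pyramid = []
    for s in scales:
        step = int(round(1 / s))
        pyramid.append([row[::step] for row in image[::step]])
    return pyramid

def upsample(label_map, h, w):
    """Nearest-neighbour upsampling of a coarse label map back to h x w."""
    lh, lw = len(label_map), len(label_map[0])
    return [[label_map[y * lh // h][x * lw // w] for x in range(w)]
            for y in range(h)]

def fuse_labels(label_maps, h, w):
    """Per-pixel majority vote over the per-scale labelings -- a crude
    stand-in for the feedback circle between scales."""
    resized = [upsample(m, h, w) for m in label_maps]
    return [[Counter(m[y][x] for m in resized).most_common(1)[0][0]
             for x in range(w)] for y in range(h)]
```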

  • Article (No Access)

    USING EXPERT SYSTEMS FOR IMAGE UNDERSTANDING

    Expert systems and image understanding have traditionally been considered as two separate application fields of artificial intelligence (AI). In this paper it is shown, however, that the idea of building an expert system for image understanding may be fruitful. Although this paper may serve as a framework for situating existing works on knowledge-based vision, it is not a review paper. The interested reader will therefore be referred to some recommended survey papers in the literature.

  • Article (No Access)

    CONTROL STRATEGIES IN A HIERARCHICAL KNOWLEDGE STRUCTURE

    Two control strategies are presented that operate on a hierarchical knowledge structure based on a semantic network. The control algorithms cover strict top-down control and a bidirectional control that mixes top-down (model-driven) and bottom-up (data-driven) analysis. The knowledge used by the algorithms is represented in the semantic network. In addition to the network, other knowledge sources may be generated automatically to direct the analysis and limit the search space. The approach has been used successfully in image and speech understanding.

  • Article (No Access)

    LEARNING BLACKBOARD-BASED SCHEDULING ALGORITHMS FOR COMPUTER VISION

    The goal of image understanding by computer is to identify objects in visual images and (if necessary) to determine their location and orientation. Objects are identified by comparing data extracted from images to an a priori description of the object or object class in memory. It is a generally accepted premise that, in many domains, the timely and appropriate use of knowledge can substantially reduce the complexity of matching image data to object descriptions. Because of the variety and scope of knowledge relevant to different object classes, contexts and viewing conditions, blackboard architectures are well suited to the task of selecting and applying the relevant knowledge to each situation as it is encountered.

    This paper reviews ten years of work on the UMass VISIONS system and its blackboard-based high-level component, the schema system. The schema system could interpret complex natural scenes when given carefully crafted knowledge bases describing the domain, but its application in practice was limited by the problem of model (knowledge base) acquisition. Experience with the schema system convinced us that learning techniques must be embedded in vision systems of the future to reduce or eliminate the knowledge engineering aspects of system construction.

    The Schema Learning System (SLS) is a supervised learning system for acquiring knowledge-directed object recognition (control) strategies from training images. The recognition strategies are precompiled reactive sequences of knowledge source invocations that replace the dynamic scheduler found in most blackboard systems. Each strategy is specialized to recognize instances of a specific object class within a specific context. Since the strategies are learned automatically, the knowledge base contains only general-purpose knowledge sources rather than problem-specific control heuristics or sequencing information.
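    To make the idea concrete, a precompiled strategy can be modeled as a fixed sequence of knowledge-source functions applied to a blackboard, replacing a dynamic scheduler. The knowledge sources below are hypothetical illustrations, not taken from SLS itself:

```python
def run_strategy(strategy, blackboard):
    """Apply a precompiled sequence of knowledge sources to the blackboard.
    There is no dynamic scheduling: the order was fixed at learning time."""
    for knowledge_source in strategy:
        blackboard = knowledge_source(blackboard)
    return blackboard

# Illustrative knowledge sources (hypothetical names).
def extract_edges(bb):
    return {**bb, "edges": "edge-map"}

def match_roof_model(bb):
    # A specialized step: only meaningful once edges exist on the blackboard.
    return {**bb, "roof": "edges" in bb}

# A strategy specialized for one (object class, context) pair.
house_in_aerial_context = [extract_edges, match_roof_model]
```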

  • Article (No Access)

    Cooperative Spatial Reasoning for Image Understanding

    Spatial Reasoning, reasoning about spatial information (i.e. shape and spatial relations), is a crucial function of image understanding and computer vision systems. This paper proposes a novel spatial reasoning scheme for image understanding and demonstrates its utility and effectiveness in two different systems: region segmentation and aerial image understanding systems. The scheme is designed based on a so-called Multi-Agent/Cooperative Distributed Problem Solving Paradigm, where a group of intelligent agents cooperate with each other to fulfill a complicated task. The first part of the paper describes a cooperative distributed region segmentation system, where each region in an image is regarded as an agent. Starting from seed regions given at the initial stage, region agents deform their shapes dynamically so that the image is partitioned into mutually disjoint regions. The deformation of each individual region agent is realized by the snake algorithm [14], and neighboring region agents cooperate with each other to find common region boundaries between them. In the latter part of the paper, we first give a brief description of the cooperative spatial reasoning method used in our aerial image understanding system SIGMA. In SIGMA, each recognized object such as a house or a road is regarded as an agent. Each agent generates hypotheses about its neighboring objects to establish spatial relations and to detect missing objects. Then, we compare its reasoning method with that used in the region segmentation system. We conclude the paper by showing further utilities of the Multi-Agent/Cooperative Distributed Problem Solving Paradigm for image understanding.

  • Article (No Access)

    Situated Image Understanding in a Multiagent Framework

    The paper addresses the problem of controlling situated image understanding processes. Two complementary control styles are considered and applied cooperatively, a deliberative one and a reactive one. The role of deliberative control is to account for the unpredictability of situations, by dynamically determining which strategies to pursue, based on the results obtained so far and more generally on the state of the understanding process. The role of reactive control is to account for the variability of local properties of the image by tuning operations to subimages, each one being homogeneous with respect to a given operation. A variable organization of agents is studied to face this variability. The two control modes are integrated into a unified formalism describing segmentation and interpretation activities. A feedback from high level interpretation tasks to low level segmentation tasks thus becomes possible and is exploited to recover wrong segmentations. Preliminary results in the field of liver biopsy image understanding are shown to demonstrate the potential of the approach.

  • Article (No Access)

    Overcoming the Limitations of Learning-Based VQA for Counting Questions with Zero-Shot Learning

    Visual question answering (VQA) research has garnered increasing attention in recent years. It is considered a visual Turing test because it requires a computer to respond to textual questions based on an image. Expertise in computer vision, natural language processing, knowledge understanding, and reasoning is required to solve the problem of VQA. Most techniques employed for VQA consist of models that learn the combination of image and question features along with the expected answer; the techniques chosen for extracting and combining the image and question features vary from model to model. This approach of teaching a model the question–answer pattern is ineffective for queries that involve counting and reasoning, and it requires considerable resources and large datasets for training. General VQA datasets feature a restricted number of items as answers to counting questions (<10), and the distribution of the answers is not uniform. To investigate these issues, we created synthetic datasets in which the number of objects in the image and the amount of occlusion can be adjusted. Specifically, a zero-shot learning VQA system was devised for counting-related questions; it produces answers by analyzing the output of an object detector together with the query keywords. On our synthetic datasets, the model generated 100% correct results. Testing on the benchmark datasets Task Driven Image Understanding Challenge (TDIUC) and TallyQA-Simple indicated that the proposed model matched the performance of the learning-based baseline models. This methodology can be used efficiently for counting VQA questions confined to certain domains, even when the number of items to be counted is large.
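    A minimal sketch of the detector-plus-keywords idea follows; the function names, the detection format, and the naive trailing-"s" singularization are illustrative assumptions, not the paper's actual pipeline (which uses a trained object detector):

```python
def extract_count_target(question, detector_vocabulary):
    """Find the first detector label mentioned in a 'How many ...?' question.
    A naive trailing-'s' strip lets 'dogs' match the detector label 'dog'."""
    for word in question.lower().rstrip("?").split():
        if word in detector_vocabulary:
            return word
        if word.endswith("s") and word[:-1] in detector_vocabulary:
            return word[:-1]
    return None

def answer_count_question(question, detections, detector_vocabulary,
                          score_threshold=0.5):
    """Count detector outputs whose label matches the question keyword."""
    target = extract_count_target(question, detector_vocabulary)
    if target is None:
        return 0
    return sum(1 for d in detections
               if d["label"] == target and d["score"] >= score_threshold)
```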

  • Article (No Access)

    KYDON VISION SYSTEM: THE ADAPTIVE LEARNING MODEL

    In this paper, an adaptive learning model for Kydon, an autonomous vision system with a multi-layer architecture, is presented, modeled, and analyzed. In particular, two critical points on the learning curve (deletion and saturation) are evaluated; these points represent two extreme states of the learning process. The Kydon architecture consists of k layers of array processors: the lowest layers perform lower-level processing, and the rest perform higher-level processing. The interconnectivity of the processing elements (PEs) in each array is based on a full hexagonal mesh structure. Kydon uses graph models to represent and process the knowledge extracted from the image, and its knowledge base is distributed among its PEs. A unique model for an evolving knowledge base has been developed especially for Kydon in order to provide it with some intelligence properties.

  • Article (No Access)

    IMAGE ENGINEERING AND RELATED PUBLICATIONS

    Image engineering is a discipline that includes image processing, image analysis, image understanding, and the applications of these techniques. To promote its development and evolution, this paper provides a well-regulated explanation of the definition of image engineering, as well as its intension and extension. It also introduces a new classification of the theories of image engineering and the applications of image technology. A thorough statistical survey of the publications in this discipline is carried out, and an analysis and discussion of the statistics from the classification results are presented. This work shows a general and up-to-date picture of the status, progress, trends, and application areas of image engineering.

  • Article (No Access)

    CLASSIFIER COMBINATION APPLIED FOR UNDERSTANDING OF EYES IMAGES

    The article presents the development of a classifier combination based on machine learning techniques (Artificial Neural Networks, Support Vector Machines, and the C4.5 algorithm) that increases the performance achieved by the Refractive Errors Measurement System (REMS), which analyzes Hartmann-Shack (HS) images of human eyes. The HS images are analyzed to extract data relevant to the identification of refractive errors (myopia, hypermetropia, and astigmatism). These data are extracted using the Gabor wavelet transform; afterwards, machine learning techniques are employed to carry out the image analysis.
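    One common way to combine such classifiers is majority voting over their per-sample predictions. The sketch below is a generic illustration with hypothetical prediction labels, not necessarily the fusion rule used by REMS:

```python
from collections import Counter

def majority_vote(predictions_per_model):
    """Combine predictions from several classifiers by per-sample majority
    vote; on a full tie, fall back to the first classifier's prediction."""
    n_samples = len(predictions_per_model[0])
    combined = []
    for i in range(n_samples):
        votes = [preds[i] for preds in predictions_per_model]
        winner, count = Counter(votes).most_common(1)[0]
        combined.append(winner if count > 1 else votes[0])
    return combined
```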

  • Chapter (Free Access)

    CONTEXT RELATED ISSUES IN IMAGE UNDERSTANDING

    This chapter gives a formal model for scene understanding, as well as for context information; it helps in adapting image understanding procedures and software to varying contexts when certain formal assumptions are satisfied. We have defined and formalized context separation and context adaptation, which are essential for many applications, including achieving robustness of the understanding results in changing sensing environments. This model uses constraint logic programming and specialized models for the various interactions between the objects in the scene and the context. A comparison is made with, and examples are given of, context models in more classical frameworks such as multilevel understanding structures, object-based scene design, knowledge-based approaches, and perceptual context separation.

  • Chapter (No Access)

    COOPERATIVE SPATIAL REASONING FOR IMAGE UNDERSTANDING

    Spatial Reasoning, reasoning about spatial information (i.e. shape and spatial relations), is a crucial function of image understanding and computer vision systems. This paper proposes a novel spatial reasoning scheme for image understanding and demonstrates its utility and effectiveness in two different systems: region segmentation and aerial image understanding systems. The scheme is designed based on a so-called Multi-Agent/Cooperative Distributed Problem Solving Paradigm, where a group of intelligent agents cooperate with each other to fulfill a complicated task. The first part of the paper describes a cooperative distributed region segmentation system, where each region in an image is regarded as an agent. Starting from seed regions given at the initial stage, region agents deform their shapes dynamically so that the image is partitioned into mutually disjoint regions. The deformation of each individual region agent is realized by the snake algorithm [14], and neighboring region agents cooperate with each other to find common region boundaries between them. In the latter part of the paper, we first give a brief description of the cooperative spatial reasoning method used in our aerial image understanding system SIGMA. In SIGMA, each recognized object such as a house or a road is regarded as an agent. Each agent generates hypotheses about its neighboring objects to establish spatial relations and to detect missing objects. Then, we compare its reasoning method with that used in the region segmentation system. We conclude the paper by showing further utilities of the Multi-Agent/Cooperative Distributed Problem Solving Paradigm for image understanding.
