The World Wide Web and the Internet are signs that things will be very different in the future. What is so striking about this computer-age future is how fast it arrives and how overwhelming it is. Anyone who has surfed the Web has exclaimed at one point or another that there is so much information available, so much to search and so much to keep up with.
Whereas Lycos and AltaVista are already accepted tools for textual information, image and multimedia search engines are the natural answer in the quest for pictorial information. This book provides a state-of-the-art description of that field. It contains the proceedings of a valuable workshop in Amsterdam, where people gathered to discuss progress in the field. The topics cover computational methods of searching for pictures, the powerful pictorial clues in the recognition of objects, storage and indexing of objects in a database, and ways to access the requested pictorial information.
https://doi.org/10.1142/9789812797988_fmatter
https://doi.org/10.1142/9789812797988_0001
https://doi.org/10.1142/9789812797988_0002
The availability of large image databases is emphasizing the relevance of visual filters, which make it possible to focus the search on a small subset of data with characteristic visual features. Visual specification of such filters provides a natural way to express content-oriented queries. An original visual language is proposed for the symbolic representation of the semantics induced by the quality and arrangement of color in a painting. The proposed language is based on the theory of color semantics introduced by artists in the twentieth century and is designed to support visual querying. The grammar and the visual representation of the language are presented, and its implementation is discussed with reference to a prototype system supporting retrieval of artistic paintings by color content.
https://doi.org/10.1142/9789812797988_0003
In this paper, our goal is to analyze and evaluate various color features for image retrieval by color-metric histogram matching that is independent of varying imaging conditions.
On the theoretical side, we examine the effect of a change in camera viewpoint, object surface orientation, illumination intensity, and illumination color on the various color features under a simple reflection model.
To evaluate photometric color invariance in practice, experiments have been carried out on a database of 500 images of multicolored man-made objects in real-world scenes. Images in the database show considerable noise, shadows, shading and specularities, making them a good representation of everyday views as they appear in home video and consumer photography in general.
Finally, using the color invariant models, we describe an image search system in which visual information is automatically collected, indexed and cataloged entirely on the basis of pictorial content. The system allows fast online image search by combining: (1) visual browsing through the precomputed image catalogue; (2) query by pictorial example; (3) query by image features. The system shows high retrieval accuracy and fast search.
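The invariance argument can be illustrated with a minimal sketch. It assumes normalized chromaticity r = R/(R+G+B), g = G/(R+G+B) as the color feature and Swain-Ballard histogram intersection as the matching score; both are illustrative choices, not necessarily the features or metric evaluated in the paper. Under a matte reflection model with white light, scaling every pixel by the same factor (a change in illumination intensity or shading) leaves r and g unchanged, so the histograms still match.

```python
import numpy as np

def chromaticity_histogram(image: np.ndarray, bins: int = 16) -> np.ndarray:
    """Normalized 2D histogram over (r, g) chromaticity of an H x W x 3 RGB image."""
    rgb = image.reshape(-1, 3).astype(np.float64)
    total = rgb.sum(axis=1)
    valid = total > 0  # chromaticity is undefined for pure black pixels
    r = rgb[valid, 0] / total[valid]
    g = rgb[valid, 1] / total[valid]
    hist, _, _ = np.histogram2d(r, g, bins=bins, range=[[0.0, 1.0], [0.0, 1.0]])
    return hist / hist.sum()

def histogram_intersection(h1: np.ndarray, h2: np.ndarray) -> float:
    """Swain-Ballard intersection: 1.0 for identical normalized histograms, 0.0 for disjoint ones."""
    return float(np.minimum(h1, h2).sum())
```

Halving the brightness of an image leaves its chromaticity histogram, and hence its match score, unchanged. A change in illumination color would still shift r and g, which is why the paper also studies features that are invariant to it.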
https://doi.org/10.1142/9789812797988_0004
Two significant problems in content-based retrieval methods are (1) accuracy: most current content-based image retrieval methods have been neither quantitatively compared nor benchmarked with respect to accuracy; and (2) efficiency: image database search methods must be analyzed for their computational efficiency and interrelationships. We assert that the accuracy problem is due to the generality of the applications involved: in current systems the goal of the user is not clear, which makes it difficult to create ground truth. In this paper, we quantitatively compare and evaluate four fundamentally different methods for image copy location, namely optimal keys, texture, projection, and template methods, in a large portrait database. We discuss some important theoretical interrelationships, computational efficiency, and accuracy with respect to real noise experiments.
https://doi.org/10.1142/9789812797988_0005
We introduce a very fast and reliable method for shape similarity retrieval in large image databases that is robust to noise and to changes in the scale and orientation of objects. The maxima of the curvature zero-crossing contours of the Curvature Scale Space (CSS) image are used to represent the shapes of object boundary contours. A complex boundary is represented by about five pairs of integer values, and an effective indexing method based on the aspect ratio of the CSS image, eccentricity and circularity is used to narrow down the search range. Since the matching algorithm uses global information, it is sensitive to major occlusion, but minor occlusion does not cause problems.
We have tested and evaluated our method on a prototype database of 450 images of marine animals with a wide variety of shapes, with very good results. The method can either be used in real applications or provide a reliable shape description for more complicated images where other features such as color and texture must also be considered.
Since shape similarity is a subjective issue, in order to evaluate the method, we asked a number of volunteers to perform similarity retrieval based on shape on a randomly selected small database. We then compared the results of this experiment to the outputs of our system to the same queries and on the same database. The comparison indicated a promising performance of the system.
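To make the CSS representation described above concrete, here is a minimal sketch that computes one aspect of it: how many curvature zero crossings a closed contour has after Gaussian smoothing at each scale. Plotting the crossing positions against sigma gives the CSS image; extracting its contour maxima and the matching step are not shown. The FFT-based smoothing, periodic finite differences, and function names are assumptions made for illustration.

```python
import numpy as np

def gaussian_smooth_closed(x, y, sigma):
    """Circularly convolve a closed curve with a Gaussian of std `sigma` samples (via FFT)."""
    freqs = np.fft.fftfreq(len(x))  # cycles per sample
    response = np.exp(-2.0 * (np.pi * sigma * freqs) ** 2)  # Fourier transform of the Gaussian
    xs = np.real(np.fft.ifft(np.fft.fft(x) * response))
    ys = np.real(np.fft.ifft(np.fft.fft(y) * response))
    return xs, ys

def curvature_closed(x, y):
    """Signed curvature of a closed, uniformly sampled curve using periodic central differences."""
    dx = (np.roll(x, -1) - np.roll(x, 1)) / 2.0
    dy = (np.roll(y, -1) - np.roll(y, 1)) / 2.0
    ddx = np.roll(x, -1) - 2.0 * x + np.roll(x, 1)
    ddy = np.roll(y, -1) - 2.0 * y + np.roll(y, 1)
    return (dx * ddy - dy * ddx) / (dx ** 2 + dy ** 2) ** 1.5

def css_zero_crossings(x, y, sigmas):
    """Number of curvature zero crossings at each smoothing scale (one row per sigma of a CSS image)."""
    counts = []
    for sigma in sigmas:
        k = curvature_closed(*gaussian_smooth_closed(x, y, sigma))
        convex = k >= 0
        # Sign transitions in a cyclic sequence; on a closed curve this count is always even.
        counts.append(int(np.count_nonzero(convex != np.roll(convex, 1))))
    return counts
```

A circle has no crossings at any scale. A five-lobed star has two crossings per concave region at fine scales, and they vanish once smoothing has made the contour convex.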
https://doi.org/10.1142/9789812797988_0006
The availability of large image databases and retrieval by content has created the need for indexing procedures that allow fast pruning of database items. Indexing of shapes is particularly challenging owing to the difficulty of deriving a similarity measure that supports clustering of shapes according to human perceptual similarity. In this paper, we present a technique that exploits a multi-scale analysis of shapes to derive a hierarchical shape representation in which shape details are progressively filtered out while characterizing shape elements are preserved. To provide the necessary robustness to shape variability, fuzzy sets are used to describe the visual appearance of shape parts. A graph-like index structure is derived by clustering shapes that share similar part descriptions. Results of indexing a sample database are reported, with efficiency and effectiveness measures.
https://doi.org/10.1142/9789812797988_0007
A method is presented for processing pictorial query specifications that consist of a query image and a similarity level that must hold between the query image and database images. The similarity level specifies the contextual similarity (how well does the content of one image match that of another) as well as the spatial similarity (the relative locations of the matching symbols in the two images). This method allows more than one instance of each object in the database image (while still requiring only one instance of each object in the query image). The algorithm tries to satisfy the contextual similarity first and then tries to satisfy the spatial constraints using an auxiliary graph data structure. The running time of this method is exponential in the number of objects in the query image.
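The contextual-then-spatial strategy can be illustrated with a brute-force sketch. It is not the paper's algorithm: it replaces the auxiliary graph with a plain enumeration of instance assignments and uses a hypothetical `check_spatial` flag as a stand-in for the similarity level. The enumeration makes the exponential dependence on the number of query objects explicit. Spatial similarity is modeled here, as an assumption, by preserving the sign of the x and y offsets between every pair of query objects.

```python
from itertools import combinations, product
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]

def sign(v: float) -> int:
    return (v > 0) - (v < 0)

def match_query(query: Dict[str, Point],
                database: List[Tuple[str, float, float]],
                check_spatial: bool = True) -> Optional[Dict[str, Point]]:
    """Return one assignment of database instances to query objects, or None if none matches."""
    # Contextual step: each query object needs at least one instance with the same label.
    candidates = {label: [(x, y) for (name, x, y) in database if name == label] for label in query}
    if any(not c for c in candidates.values()):
        return None
    labels = list(query)
    if not check_spatial:
        return {label: candidates[label][0] for label in labels}
    # Spatial step: try every combination of instances (exponential in len(query)).
    for choice in product(*(candidates[label] for label in labels)):
        assignment = dict(zip(labels, choice))
        if all(
            sign(query[a][0] - query[b][0]) == sign(assignment[a][0] - assignment[b][0])
            and sign(query[a][1] - query[b][1]) == sign(assignment[a][1] - assignment[b][1])
            for a, b in combinations(labels, 2)
        ):
            return assignment
    return None
```

With a query "tree left of house" and a database image that contains two houses, only the house to the right of the tree satisfies the spatial constraint.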
https://doi.org/10.1142/9789812797988_0008
We consider the construction of hypertext for documents containing a mixture of text and figures to provide convenient access to the content of both the text and the figures. The focus in this paper is on detecting alphanumeric labels in the figures to be used as anchors in the hypertext construction.
https://doi.org/10.1142/9789812797988_0009
Compact video representation is an important step in developing tools for search through large video databases. Many publications have shown that representative video frames (key frames) are an appropriate way of preserving the entire temporal information flow of a sequence in a considerably smaller amount of data, provided the key frame set is obtained properly. In this paper we describe a novel method for key-frame-based video representation. Its main advantage over recent methods in the literature is that the resulting set of key frames does not depend on subjective thresholds or any other manually given parameters; it is derived from an “objective” model of the video information flow. Another advantage is that the key frame set contains no more than a maximum number of frames, set beforehand for the entire sequence.
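The abstract does not specify the “objective” model, so the sketch below substitutes a well-known threshold-free alternative to show how a key frame budget can be enforced without tuning parameters. It uses optimal piecewise-constant segmentation of per-frame feature vectors (for example, color histograms), computed by dynamic programming, and picks the frame nearest each segment mean. The feature choice, squared-error criterion, and function names are all assumptions. Because allowing more segments never increases the error, using exactly `max_frames` segments (or one per frame for short sequences) is optimal for the "at most" budget.

```python
import numpy as np

def segment_cost(prefix, prefix_sq, i, j):
    """Squared error of frames i..j-1 around their mean, computed from prefix sums."""
    s = prefix[j] - prefix[i]
    return float(prefix_sq[j] - prefix_sq[i] - np.dot(s, s) / (j - i))

def select_key_frames(features, max_frames):
    """Split the sequence into at most `max_frames` contiguous segments with minimum
    total squared error, then return the frame closest to each segment mean."""
    features = np.asarray(features, dtype=float)
    n = len(features)
    k = min(max_frames, n)
    prefix = np.vstack([np.zeros(features.shape[1]), np.cumsum(features, axis=0)])
    prefix_sq = np.concatenate([[0.0], np.cumsum(np.sum(features ** 2, axis=1))])
    best = np.full((k + 1, n + 1), np.inf)  # best[m, j]: min cost of frames 0..j-1 in m segments
    back = np.zeros((k + 1, n + 1), dtype=int)
    best[0, 0] = 0.0
    for m in range(1, k + 1):
        for j in range(m, n + 1):
            for i in range(m - 1, j):  # last segment covers frames i..j-1 and is never empty
                c = best[m - 1, i] + segment_cost(prefix, prefix_sq, i, j)
                if c < best[m, j]:
                    best[m, j], back[m, j] = c, i
    bounds, j = [], n
    for m in range(k, 0, -1):  # follow back-pointers from the end of the sequence
        i = int(back[m, j])
        bounds.append((i, j))
        j = i
    keys = []
    for i, j in reversed(bounds):
        seg = features[i:j]
        keys.append(i + int(np.argmin(np.sum((seg - seg.mean(axis=0)) ** 2, axis=1))))
    return keys
```

The running time is O(K·N²) for N frames, which is acceptable for a sketch but would need pruning for long sequences.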
https://doi.org/10.1142/9789812797988_0010
This paper describes an open hypermedia system which allows content based navigation and retrieval from both text and non-text media. The system provides a variety of link types including the generic link which, once authored from a particular selection, may be followed from any instance matching that selection. The paper also describes the concept of a multimedia thesaurus and illustrates how it may be used to enhance the navigational capabilities of the system, allowing links to be authored on one representation of an object and followed from a different representation.
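A generic link differs from an ordinary link because it is attached to a selection rather than to a location, so it can be followed wherever that selection appears. The text-only sketch below illustrates that idea with a hypothetical `GenericLinkbase` class and case-insensitive string matching. The paper's system also handles non-text media and a multimedia thesaurus, and neither is modeled here.

```python
from collections import defaultdict

class GenericLinkbase:
    """Hypothetical linkbase in which a generic link is keyed on the selected text, not its position."""

    def __init__(self):
        self._links = defaultdict(set)

    def author(self, selection: str, destination: str) -> None:
        """Author a generic link once, from a particular selection."""
        self._links[selection.lower()].add(destination)

    def follow(self, document: str, offset: int) -> set:
        """Destinations of every generic link whose selection occurs in `document` and covers `offset`."""
        text = document.lower()
        found = set()
        for selection, destinations in self._links.items():
            start = text.find(selection)
            while start != -1:
                if start <= offset < start + len(selection):
                    found |= destinations
                start = text.find(selection, start + 1)
        return found
```

A link authored on "Curvature Scale Space" in one document can then be followed from any other document that mentions the phrase, with no per-document authoring.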
https://doi.org/10.1142/9789812797988_0011
We describe FRANK, an application that we have developed for remote browsing of film archives. FRANK uses alternate representations of video in the form of text, image, or structured data that have a time-based mapping. We describe the motivation for the project, the work processes of the users, and the operation of the resulting prototype. Finally we suggest some techniques for deriving timecodes in the absence of time-stamped alternate representations.
https://doi.org/10.1142/9789812797988_0012
An experimental result in image indexing using GEP-2D strings is reported. Each 2D image is modelled as a generalized extended pseudo-symbolic picture, which has a GEP-2D string representation. In [10] it is proven that there is an efficient algorithm to generate the GEP-2D string representation of a 2D image. In this paper we describe the implementation of the GEP-2D string indexing algorithm in C on a Sun SPARC workstation, together with timing results that empirically verify the theoretical upper bound of O(N³) on the time complexity of this algorithm, where N is the total number of objects in a picture.
https://doi.org/10.1142/9789812797988_0013
An efficient shape-based indexing framework for content-based image retrieval is described. Rather than directly comparing a candidate shape with all entries in a database, shapes are ordered in terms of the nonrigid deformations that relate them to a small subset of representative prototypes. The framework employs modal matching, a deformable shape decomposition that allows shape correspondences to be computed automatically. In the modal representation, shape is decomposed into an ordered basis of orthogonal principal deformations. This allows selective invariance to in-plane rotation, translation, and scaling, and quasi-invariance to affine deformations. Retrieval accuracy and stability have been evaluated in experiments with 2D image databases. Experiments indicate that the method performs significantly better than moment-based invariants.
https://doi.org/10.1142/9789812797988_0014
We propose a Modified Fourier Descriptor and a new distance measure for describing and comparing closed planar curves. Our method accounts for spatial discretization of shapes, an issue seldom mentioned, much less addressed in the literature.
The motivating application is shape matching in the Multimedia Analysis and Retrieval System (MARS), our content-based image retrieval system. The application requires a compact and reliable representation of object boundaries in the image database, and a similarity measure that can be computed in real time. We test our shape matching method on a set of Roman characters. Results indicate that our method is a feasible solution for real time shape comparison.
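For reference, here is a minimal sketch of the classic (unmodified) Fourier descriptor that the paper's Modified Fourier Descriptor builds on. It does not reproduce the authors' modification for spatial discretization or their new distance measure. The contour is treated as a complex sequence z = x + iy, and invariances follow from simple properties of the DFT: translation only changes F[0], rotation and a shift of the starting point only change phases, and scaling multiplies every coefficient by the same factor. The sketch assumes a counterclockwise contour, so that |F[1]| is the dominant coefficient used for normalization; the coefficient count and Euclidean distance are illustrative choices.

```python
import numpy as np

def fourier_descriptor(x, y, n_coeffs=16):
    """Classic Fourier descriptor of a closed, uniformly sampled, counterclockwise contour."""
    z = np.asarray(x, dtype=float) + 1j * np.asarray(y, dtype=float)
    mags = np.abs(np.fft.fft(z))
    # Dropping F[0] removes translation; magnitudes discard rotation and start-point phase;
    # dividing by |F[1]| removes scale.
    return mags[1:n_coeffs + 1] / mags[1]

def fd_distance(d1, d2):
    """Euclidean distance between two descriptors (an illustrative choice of metric)."""
    return float(np.linalg.norm(d1 - d2))
```

A three-lobed contour that is rotated, scaled, translated, and re-sampled from a different starting vertex gives the same descriptor, while a five-lobed contour does not.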
https://doi.org/10.1142/9789812797988_0015
Video retrieval is an important topic in video database systems. Annotation-based video retrieval is a natural and effective means of accessing video data when higher-level semantics are involved, and it complements content-based access. In MOOVIS, an experimental video database system that we have implemented, both context-oriented and feature-based metadata descriptions are organized in a structured yet flexible manner, which helps accommodate user queries that are either context-free or context-dependent. An associated set of query language facilities is currently being developed and will be incorporated into MOOVIS on the PC Pentium platform.
https://doi.org/10.1142/9789812797988_0016
In this paper we describe methods that use 2D pixel motion fields to detect shot breaks and classify camera motion in digital video. These flow fields can be computed with optical flow or correlation matching. The Karhunen-Loève transform (KLT), based on statistical properties, ensures the extraction of optimal linear features from flow fields representing a particular motion class. Two classification methods are discussed, and experimental results for each are presented.
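To show how the KLT turns flattened flow fields into compact features, here is a minimal sketch: principal axes are computed via SVD of the mean-centred training fields, and a new field is classified by its nearest class centroid in the reduced space. The nearest-centroid rule, the single shared KLT basis, and the function names are assumptions. The paper's two classification methods are not described in the abstract.

```python
import numpy as np

def fit_klt(training: np.ndarray, n_components: int):
    """Karhunen-Loeve transform: mean and leading principal axes of flattened flow fields."""
    mean = training.mean(axis=0)
    # Rows of vt are covariance eigenvectors, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(training - mean, full_matrices=False)
    return mean, vt[:n_components]

def project(fields: np.ndarray, mean: np.ndarray, basis: np.ndarray) -> np.ndarray:
    return (fields - mean) @ basis.T

def train(fields, labels, n_components=2):
    """Fit the KLT on all training fields and store one centroid per motion class."""
    mean, basis = fit_klt(fields, n_components)
    feats = project(fields, mean, basis)
    labels = np.asarray(labels)
    centroids = {c: feats[labels == c].mean(axis=0) for c in sorted(set(labels.tolist()))}
    return mean, basis, centroids

def classify(field, mean, basis, centroids):
    """Assign a flattened (u..., v...) flow field to the nearest class centroid in KLT space."""
    feat = project(field[None, :], mean, basis)[0]
    names = list(centroids)
    dists = [np.linalg.norm(feat - centroids[name]) for name in names]
    return names[int(np.argmin(dists))]
```

With synthetic pan fields (constant horizontal flow) and zoom fields (flow proportional to the offset from the image centre), two KLT components are enough to separate the classes.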
https://doi.org/10.1142/9789812797988_0017
The problem of finding similar images, and images with a similar sub-image, given a search image or part of one, is addressed in this article using 3×3 local pattern statistics in thresholded gradient images of scanned portraits normalized for scale, orientation, position and lighting. Pattern frequencies sorted by magnitude are closely linked to a spatial frequency ordering. The 2D trigram feature vector is formed by weighting the pattern frequencies. Several weighting functions were applied to a database of 3014 image files; the results indicate that a band-pass approach, suppressing both ends of the sorted pattern frequency distribution, works best. The method works well for full-image comparison and will be applied to sub-image search in the near future.
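The 3×3 pattern statistics can be sketched as follows. The gradient image is thresholded, each interior pixel's 3×3 binary neighbourhood is encoded as a 9-bit code (512 possible patterns), and the codes are counted. For the band-pass weighting, the patterns are reordered by a given frequency ranking (for example, computed over the whole database), and only ranks `lo` to `hi - 1` are kept. The threshold, the rank limits, and the box-shaped weighting are hypothetical parameters; the paper's actual weighting functions are not given in the abstract.

```python
import numpy as np

def gradient_binary(image: np.ndarray, threshold: float) -> np.ndarray:
    """Threshold the gradient magnitude of a grayscale image."""
    gy, gx = np.gradient(image.astype(float))
    return np.hypot(gx, gy) > threshold

def pattern_histogram(binary: np.ndarray) -> np.ndarray:
    """Counts of each 3x3 binary pattern (codes 0..511) over all interior pixels."""
    h, w = binary.shape
    codes = np.zeros((h - 2, w - 2), dtype=np.int64)
    bit = 0
    for dy in range(3):
        for dx in range(3):
            # Bit index dy * 3 + dx records whether that neighbour is set.
            codes |= binary[dy:dy + h - 2, dx:dx + w - 2].astype(np.int64) << bit
            bit += 1
    return np.bincount(codes.ravel(), minlength=512)

def band_pass_feature(hist: np.ndarray, order: np.ndarray, lo: int, hi: int) -> np.ndarray:
    """Reorder normalized pattern frequencies by `order` (most to least frequent) and keep
    only ranks lo..hi-1, suppressing both ends of the sorted distribution."""
    normalized = hist[order] / hist.sum()
    weights = np.zeros(len(order))
    weights[lo:hi] = 1.0
    return normalized * weights
```

On a flat image every neighbourhood is the all-zero pattern. That pattern dominates the ranking, which is one reason why suppressing the top end of the distribution can help.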
https://doi.org/10.1142/9789812797988_0018
We propose a two-stage polygon representation for polygon shape matching in image databases. The first stage performs a qualitative measure of shape, using the Binary String Descriptor to quickly find equivalence classes of polygons. The second stage performs a quantitative measure of shape using Multi-Resolution Area Matching, which operates on the subset of shapes belonging to the same equivalence class with a coarse-to-fine area matching strategy. We describe these techniques and demonstrate how this two-stage representation works for a simple shape image database.
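The abstract does not define the Binary String Descriptor, so the sketch below illustrates the first-stage idea with a hypothetical descriptor. Each vertex of a counterclockwise simple polygon is marked "1" if it is convex and "0" if it is reflex, and the string is canonicalized as its lexicographically smallest rotation so that it does not depend on the starting vertex. Polygons with equal canonical strings form one equivalence class, and only those would be passed to the quantitative second stage.

```python
def convexity_string(polygon):
    """'1' for a convex vertex, '0' for a reflex vertex of a counterclockwise simple polygon."""
    n = len(polygon)
    bits = []
    for i in range(n):
        (x0, y0), (x1, y1), (x2, y2) = polygon[i - 1], polygon[i], polygon[(i + 1) % n]
        cross = (x1 - x0) * (y2 - y1) - (y1 - y0) * (x2 - x1)  # z-component of the turn
        bits.append("1" if cross > 0 else "0")
    return "".join(bits)

def canonical(s):
    """Lexicographically smallest rotation, independent of the starting vertex."""
    return min(s[i:] + s[:i] for i in range(len(s)))

def equivalence_classes(polygons):
    """Group named polygons by their canonical convexity string."""
    classes = {}
    for name, poly in polygons.items():
        classes.setdefault(canonical(convexity_string(poly)), []).append(name)
    return classes
```

A square and a rectangle fall into the same class ("1111"), while an L-shape with one reflex corner forms its own class.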
https://doi.org/10.1142/9789812797988_0019
In this work we present the application of modal matching to visual search in an image database by means of a user-defined sketch. Similarity evaluation for sparsely sampled shapes is outlined, as well as the problems related to modal matching between differently sampled objects. Four different definitions of possible similarity indexes are presented and discussed.
https://doi.org/10.1142/9789812797988_0020
We present a new image indexing method and an implemented system for content-based retrieval in heterogeneous image databases. The method is based on the theory of Iterated Function Systems (IFS), already exploited for image compression, and uses a training set of images to build the set of indexing features. The index is a reduced feature vector, on which an R-tree is used to access images in the database efficiently.
https://doi.org/10.1142/9789812797988_0021
The large amount of available multimedia information (e.g. videos, audio, images) requires efficient and effective annotation and retrieval methods.
The ImageMiner system, developed by the AI group at the University of Bremen, is designed for content-based retrieval of single images.
As video plays an increasingly important role in multimedia, we want to make videos available to ImageMiner. The first step is to detect and extract shots from a video using a histogram-based method. The second step is to combine the images in a shot into a single image. This image describes the shot (mosaicing technique) and can be analyzed with ImageMiner.
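The first step, histogram-based shot detection, can be sketched as a comparison of gray-level histograms between consecutive frames, with a cut declared when the difference exceeds a threshold. The bin count, the half-L1 distance, and the threshold value are illustrative assumptions, not ImageMiner's actual settings. The mosaicing step is not shown.

```python
import numpy as np

def gray_histogram(frame: np.ndarray, bins: int = 64) -> np.ndarray:
    """Normalized gray-level histogram of an 8-bit frame."""
    hist, _ = np.histogram(frame, bins=bins, range=(0, 256))
    return hist / hist.sum()

def detect_shot_boundaries(frames, threshold: float = 0.5, bins: int = 64):
    """Indices i at which a cut is declared between frames[i - 1] and frames[i]."""
    cuts = []
    prev = gray_histogram(frames[0], bins)
    for i in range(1, len(frames)):
        cur = gray_histogram(frames[i], bins)
        # Half the L1 distance is 0 for identical histograms and 1 for disjoint ones.
        if 0.5 * np.abs(cur - prev).sum() > threshold:
            cuts.append(i)
        prev = cur
    return cuts

def split_shots(n_frames: int, cuts):
    """Convert cut indices into (start, end) frame ranges, end exclusive."""
    return list(zip([0] + cuts, cuts + [n_frames]))
```

Each resulting frame range would then be handed to the mosaicing step to produce one image per shot.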
https://doi.org/10.1142/9789812797988_0022
The problem of identifying images by content in an image database is viewed as a search space reduction problem, in which different paradigms and approaches are employed to progressively reduce the search space. Unlike information recovery, such reduction cannot in general pinpoint exactly all and only those images that a query needs. In this paper, three main search space reduction strategies are evaluated: a high-level descriptive approach, low-level signature comparison, and user-level picture keys. The effectiveness of these approaches is studied and quantified. Based on these strategies, highly effective query models can potentially be built. In combination, they offer the potential to drastically reduce the search space for image content identification through ranking and relevance feedback techniques.
https://doi.org/10.1142/9789812797988_0023
In this paper, we present a content-based image retrieval system. The content of images includes both low-level features, such as color and texture, and high-level features, such as spatial constraints and the shapes of relevant regions. Using object technology, the image features and behaviors are modeled and stored in a database. Images can be retrieved by example (show me images similar to this image) or by selecting properties from pickers such as a sketched shape, a color histogram, a spatial constraint interface, a list of keywords, or a combination of these. The integration of high- and low-level features in the object-oriented database is an important property of our work.
https://doi.org/10.1142/9789812797988_0024
We propose an image retrieval system with which an ambiguous target image can be retrieved. The advantages of this system were examined using a facial database. An ambiguous image, such as a favorite face, was chosen as the target image. In the system, faces similar to the prototype, which was built up by repeated retrieval over some period of time, were used to estimate the goal image. One of the main aims was to evaluate the method's ability to lead the retriever to the goal image without directly measuring the retriever's psychological preference. It was found that ambiguous facial image retrieval is possible without subjective labels in the data, by presenting faces similar to the prototype as the next candidates.
https://doi.org/10.1142/9789812797988_0025
This paper describes the use of viewpoint-invariant representations for shape-based retrieval in image databases. In particular, we focus on a class of man-made 3D objects, whose planar surfaces contain some distinctive patterns. For this class of objects we introduce a representation scheme for parametric curves, whose reference frame is constructed using consecutive sets of points that are invariant to affine and projective transformations. In particular, we employ intersections between line segments, bitangents, and cusp tangents. In order to reduce the computational complexity of this representation, we propose an ordering algorithm for the invariant points, which reduces the number of reference frames for each curve to the number of invariant points. We then propose a two-step retrieval method. First, an indexing procedure compares the query shape to all shapes of the database using only the first-order moments of the invariant points. Once a small set of candidate shapes has been selected, the Euclidean distance between the two curves in the invariant reference frame is used to compute a detailed similarity measure. Experimental results are reported for a database of trademark patterns subject to affine transformations.
https://doi.org/10.1142/9789812797988_0026
Quality, efficiency, and scalability are the key issues in the design of image retrieval systems for large image databases. Although the quality of image retrieval methods still depends strongly on the application domain, color-based retrieval techniques have been shown to be competitive and generally applicable. In this paper we describe several algorithms that improve the retrieval process by combining spatial information with color features. The algorithms are embedded in an extensible database system, which provides the efficiency and scalability, up to tens of thousands of images, required in practice.
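One simple way to combine spatial information with color, shown here as an illustrative stand-in because the paper's own algorithms are not described in the abstract, is to split each image into a regular grid and compare quantized color histograms cell by cell. The grid size, the quantization into `levels**3` bins, and the averaged histogram intersection are assumptions.

```python
import numpy as np

def quantize(image: np.ndarray, levels: int = 4) -> np.ndarray:
    """Map each 8-bit RGB pixel to one of levels**3 color bins."""
    q = (image.astype(np.int64) * levels) // 256
    return (q[..., 0] * levels + q[..., 1]) * levels + q[..., 2]

def grid_histograms(image: np.ndarray, grid: int = 2, levels: int = 4) -> np.ndarray:
    """One normalized color histogram per cell of a grid x grid partition (row-major)."""
    h, w = image.shape[:2]
    bins = quantize(image, levels)
    feats = []
    for gy in range(grid):
        for gx in range(grid):
            cell = bins[gy * h // grid:(gy + 1) * h // grid, gx * w // grid:(gx + 1) * w // grid]
            hist = np.bincount(cell.ravel(), minlength=levels ** 3).astype(float)
            feats.append(hist / hist.sum())
    return np.array(feats)

def spatial_color_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Histogram intersection averaged over corresponding cells (1.0 means identical layout)."""
    return float(np.minimum(a, b).sum(axis=1).mean())
```

An image that is red on top and blue below, and its vertical mirror image, have identical global histograms (`grid=1`, similarity 1.0), but a 2×2 grid assigns them similarity 0.0. This is the kind of distinction that purely global color features miss.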
https://doi.org/10.1142/9789812797988_0027
The most intuitive way of retrieving images is on the basis of domain concepts. However, this requires a mapping between the concepts and the content of the image, and such a mapping should be based on a proper visual guideline. We illustrate this with plant variety testing, an application for which such guidelines are available. The methods appear to be applicable to any domain for which such guidelines can be formulated.
https://doi.org/10.1142/9789812797988_0028
This article describes a software architecture for building image retrieval applications. Its primary objective is to facilitate application development through detector-based image processing and the integration of image processing and database technology.
An application is built by combining detector objects from a standard set of low-level detectors, which use the database for efficient storage and retrieval of the images and the spatial and photometric properties of elements detected in the image.
Furthermore, the use of a database facilitates data sharing and data reuse by different applications.