How to obtain discriminative features has proved to be a core problem for image retrieval. Many recent works use convolutional neural networks (CNNs) to extract features. However, clutter and occlusion interfere with the distinguishability of features when a CNN is used for feature extraction. To address this problem, we aim to obtain high-response activations in the feature map by means of an attention mechanism. We propose two attention modules: a spatial attention module and a channel attention module. For the spatial attention module, we first capture global information and model the relations between channels as a region evaluator, which evaluates and assigns new weights to local features. For the channel attention module, we use a vector with trainable parameters to weight the importance of each feature map. The two attention modules are cascaded to adjust the weight distribution of the feature map, which makes the extracted features more discriminative. Furthermore, we present a scale-and-mask scheme to scale the major components and filter out meaningless local features. This scheme mitigates the disadvantage of the varying scales of the major components in images by applying multiple scale filters, and filters out redundant features with the MAX-Mask. Extensive experiments demonstrate that the two attention modules are complementary in improving performance, and that our network with the three modules outperforms state-of-the-art methods on four well-known image retrieval datasets.
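As a rough illustration (not the paper's implementation), the cascade of channel and spatial attention described above can be sketched in plain NumPy; the sigmoid gating and the mean-based spatial scoring are simplifying assumptions:

```python
import numpy as np

def channel_attention(feature_map, channel_weights):
    """Reweight each channel of a C x H x W feature map by a trainable
    per-channel weight vector (passed through a sigmoid gate)."""
    gates = 1.0 / (1.0 + np.exp(-channel_weights))   # sigmoid gating
    return feature_map * gates[:, None, None]        # broadcast over H, W

def spatial_attention(feature_map):
    """Score each spatial location by aggregating channel responses,
    then rescale local features by the normalized score map."""
    scores = feature_map.mean(axis=0)                # H x W response map
    scores = scores / (scores.sum() + 1e-8)          # normalize to a distribution
    return feature_map * scores[None, :, :]

# Cascade: channel attention followed by spatial attention
fmap = np.random.rand(64, 7, 7)
w = np.zeros(64)                                     # zero weights -> gate of 0.5
out = spatial_attention(channel_attention(fmap, w))
```

In a real network the channel weights would be learnt end-to-end rather than fixed as here.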
Systems for the archival and retrieval of images are used in many areas, for example medical applications or news agencies. Dynamic image analysis and comparison enables a detailed search for important image elements such as persons and objects, but it also requires large computational resources. Therefore, we developed a cluster-based architecture for the efficient storage and comparison of archived images. The initially even, size-based distribution of images over the cluster nodes becomes distorted when the user excludes images from further consideration by applying combined a-priori and dynamically extracted features. This makes workload-balancing strategies necessary that consider the specific requirements of the image retrieval problem. To solve this problem efficiently, we developed a formal problem specification and three workload-balancing strategies: LPTR, RBS and Simulated Annealing. The reduction of the system response time due to successful load balancing is evaluated by a series of experimental measurements.
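The LPTR strategy presumably builds on the classical LPT (Longest Processing Time) greedy rule; a minimal sketch of plain LPT assignment, with hypothetical partition sizes:

```python
import heapq

def lpt_assign(task_sizes, n_nodes):
    """Greedy LPT scheduling: sort tasks by size descending and always
    assign the next task to the currently least-loaded node."""
    loads = [(0, node) for node in range(n_nodes)]   # (load, node) min-heap
    heapq.heapify(loads)
    assignment = {node: [] for node in range(n_nodes)}
    for size in sorted(task_sizes, reverse=True):
        load, node = heapq.heappop(loads)
        assignment[node].append(size)
        heapq.heappush(loads, (load + size, node))
    return assignment

# Image partitions of uneven size spread over 3 cluster nodes
parts = [7, 5, 4, 3, 3, 2]
plan = lpt_assign(parts, 3)
makespan = max(sum(sizes) for sizes in plan.values())
```

LPT gives a provable approximation of the optimal makespan; the RBS and Simulated Annealing strategies from the abstract are not reproduced here.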
Content-based image retrieval (CBIR) relies heavily not only on the type of descriptors used but also on the subsequent processing steps. It is an extensively used methodology for finding and fetching images from large image databases, and many methods have been developed to improve CBIR performance; CBIR retrieves images based on their visual content. In the proposed method, a Neuro-Fuzzy classifier and a Deep Neural Network classifier are used to classify the images of a given dataset. The proposed approach obtained the highest accuracy in terms of Precision, Recall, and F-measure. To show the efficiency and effectiveness of the proposed approach, statistical testing is performed in terms of standard deviation, skewness, and kurtosis. The results reveal that the proposed algorithm outperforms other approaches with low computational effort.
Orthogonal Fourier–Mellin (OFM) moments have better feature-representation capabilities and are more robust to image noise than the conventional Zernike and pseudo-Zernike moments. However, OFM moments have not been used extensively as feature descriptors because they do not possess scale invariance. This paper discusses the drawbacks of existing methods for extracting OFM moments and proposes improved OFM moments. A theoretical proof that the improved OFM moments possess rotation and scale invariance is given. The performance of the improved OFM moments is examined experimentally on trademark images, and their invariance is shown to be greatly improved over current methods.
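For reference, the OFM moment of order n and repetition m of an image f(r, θ) over the unit disk is commonly defined as follows (this is the standard definition from the literature, not the paper's improved variant):

```latex
\Phi_{nm} = \frac{n+1}{\pi}
  \int_{0}^{2\pi}\!\!\int_{0}^{1}
  f(r,\theta)\, Q_n(r)\, e^{-jm\theta}\, r \,\mathrm{d}r \,\mathrm{d}\theta ,
```

where the Q_n(r) are the orthogonal radial polynomials. Rotating the image by an angle α multiplies Φ_nm by e^{-jmα}, so the magnitudes |Φ_nm| are rotation invariant; scale invariance is the remaining obstacle the paper addresses.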
In this paper, an image feature based on color differences on edges in spiral scan order (CDESSO) is presented. The proposed CDESSO feature characterizes the principal pixel colors, the color complexity and the color differences among adjacent objects in an image. In addition, this paper employs the CDESSO feature to develop an image retrieval system. The CDESSO-based image retrieval system provides a high accuracy rate in finding database images that satisfy the users' requirements. Moreover, it is robust to scale changes of images as well as to shifts and rotations of objects within images.
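The abstract does not spell out the traversal, but a standard clockwise spiral scan of an image grid can be generated as follows (the top-left starting point and clockwise direction are assumptions, not necessarily CDESSO's exact order):

```python
def spiral_order(h, w):
    """Return (row, col) coordinates of an h x w grid in an outward-in
    clockwise spiral, starting at the top-left corner."""
    top, bottom, left, right = 0, h - 1, 0, w - 1
    coords = []
    while top <= bottom and left <= right:
        coords += [(top, c) for c in range(left, right + 1)]          # top row
        coords += [(r, right) for r in range(top + 1, bottom + 1)]    # right column
        if top < bottom:
            coords += [(bottom, c) for c in range(right - 1, left - 1, -1)]  # bottom row
        if left < right:
            coords += [(r, left) for r in range(bottom - 1, top, -1)]        # left column
        top, bottom, left, right = top + 1, bottom - 1, left + 1, right - 1
    return coords

order = spiral_order(3, 3)   # visits all 9 cells, ending at the centre
```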
In this paper, a coarse-to-fine hierarchical classification method based on features derived from adaptive cellular color decomposition is proposed. The proposed method is general and can be applied to any color image database as long as a sample set of images has been classified. In addition, the number of classes can be as versatile as required. To achieve this goal, our method consists of two phases: color quantization and classification. In the color quantization step, cellular decomposition is used to adaptively quantize color images in the HSV color space, since the H and S components form a hexagonal structure that matches the cellular pattern. In the classification step, a coarse-to-fine strategy is employed. In the coarse stage, five image-based features extracted directly from the quantization results of the query images are used to prune irrelevant database images. In the fine stage, two cluster-based features are extracted from a small set of candidate images using closest-cluster matching. According to feature evaluation, one image-based and two cluster-based features are selected to derive an individual-based similarity measure, which in turn is used to measure image-to-image similarity. In addition, a class-based similarity measure using class characteristics is proposed to evaluate image-to-class similarity. Candidate images are then sorted according to a combination of the individual-based and class-based similarity measures. Finally, the k-NN rule is used to assign the query image to a single class according to the sorting results. The effectiveness and practicability of the proposed method have been demonstrated by various experimental results.
Image retrieval plays an important role in a broad spectrum of applications. Content-based retrieval (CBR) is a popular choice in many biomedical and industrial applications. Discrete image transforms have been widely studied and suggested for many image retrieval applications. The Discrete Wavelet Transform (DWT) is one of the most popular transforms recently applied to image processing. The Daubechies wavelet can be used to form the basis for extracting features when retrieving images based on the description of a particular object within the scene. This wavelet is also widely used for image compression, and in this paper we highlight the features common to compression and retrieval. Several examples are used to test the DWT retrieval system, and a comparison between the DWT and the Discrete Cosine Transform (DCT) is made. The retrieval system using the DWT requires preprocessing and normalization of images, which might slow down the retrieval process. The accuracy of DWT-based retrieval has been significantly improved by incorporating an efficient K-Nearest-Neighbor Distance (KNND) measure in our system.
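As a concrete illustration, one level of the 2-D Haar transform (the simplest Daubechies wavelet, Db1) can be written in a few lines of NumPy; using the band energies as a retrieval signature is a common simplification, not necessarily the paper's exact scheme:

```python
import numpy as np

def haar_dwt2(img):
    """One level of the 2-D Haar transform: returns the approximation
    band (LL) and the three detail bands (LH, HL, HH)."""
    a = (img[0::2, :] + img[1::2, :]) / 2.0   # low-pass over rows
    d = (img[0::2, :] - img[1::2, :]) / 2.0   # high-pass over rows
    ll = (a[:, 0::2] + a[:, 1::2]) / 2.0
    lh = (a[:, 0::2] - a[:, 1::2]) / 2.0
    hl = (d[:, 0::2] + d[:, 1::2]) / 2.0
    hh = (d[:, 0::2] - d[:, 1::2]) / 2.0
    return ll, lh, hl, hh

img = np.arange(16, dtype=float).reshape(4, 4)
ll, lh, hl, hh = haar_dwt2(img)
# Simple band-energy signature that could feed a KNN-style comparison
feature = np.array([np.abs(b).mean() for b in (ll, lh, hl, hh)])
```

Applying the transform recursively to the LL band yields the usual multi-level decomposition.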
A flexible description of images is offered by a cloud of points in a feature space. In the context of image retrieval, such clouds can be represented in a number of ways; two approaches are considered here. The first approach is based on the assumption of a normal distribution, hence homogeneous clouds, while the second focuses on the boundary description, which is more suitable for multimodal clouds. The images are then compared using the Mahalanobis distance or the support vector data description (SVDD), respectively.
The paper investigates some possibilities of combining the image clouds, based on the idea that the responses of several cloud descriptions may convey a pattern specific to semantically similar images. A ranking of image dissimilarities is used as a comparison on two image databases, targeting image classification and retrieval problems. We show that combining the SVDD descriptions improves the retrieval performance with respect to ranking, in contrast to the Mahalanobis case. Surprisingly, it turns out that the ranking of the Mahalanobis distances also works well for inhomogeneous images.
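The Mahalanobis-distance comparison for the homogeneous (single-Gaussian) cloud model can be sketched as follows; the use of a pseudo-inverse for numerical stability is an implementation choice, not from the paper:

```python
import numpy as np

def mahalanobis(x, cloud):
    """Distance from point x to a feature cloud modeled as a single
    Gaussian (sample mean + covariance), as in the homogeneous case."""
    mu = cloud.mean(axis=0)
    cov = np.cov(cloud, rowvar=False)
    inv = np.linalg.pinv(cov)          # pseudo-inverse for stability
    diff = x - mu
    return float(np.sqrt(diff @ inv @ diff))

rng = np.random.default_rng(0)
cloud = rng.normal(size=(200, 3))                    # one image's feature cloud
d_center = mahalanobis(cloud.mean(axis=0), cloud)    # zero at the mean
d_far = mahalanobis(np.array([5.0, 5.0, 5.0]), cloud)
```

For multimodal clouds this single-Gaussian assumption breaks down, which is exactly where the SVDD boundary description is preferred.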
In this paper a new set of descriptors appropriate for image indexing and retrieval is proposed. The proposed descriptors address the tremendously increased need for efficient content-based image retrieval (CBIR) in many application areas such as the Internet, biomedicine, commerce and education. These applications commonly store image information in large image databases, where the image information cannot be accessed or used unless the database is organized to allow efficient storage, browsing and retrieval. To be applicable in the design of large image databases, the proposed descriptors are compact, with the smallest requiring only 23 bytes per image. The proposed descriptors combine color and texture information, which are extracted using fuzzy approaches. To evaluate the performance of the proposed descriptors, the objective Average Normalized Modified Retrieval Rank (ANMRR) is used. Experiments conducted on five benchmark image databases demonstrate that the proposed descriptors outperform other state-of-the-art descriptors. Also, an Auto Relevance Feedback (ARF) technique based on the proposed descriptors is introduced. This technique readjusts the initial retrieval results based on user preferences, improving the retrieval score significantly. An online demo of the image retrieval system img(Anaktisi) that implements the proposed descriptors can be found at .
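For reference, the per-query NMRR underlying ANMRR can be computed as in the following sketch of the standard MPEG-7 definition; the window constant K = min(4·NG, 2·GTM) and the 1.25K penalty for misses follow the usual convention, which should be checked against the paper's exact variant:

```python
def nmrr(ranks, ng_max):
    """Normalized Modified Retrieval Rank for one query.
    ranks: 1-based ranks at which the NG relevant images were retrieved;
    ng_max: largest ground-truth set size over all queries (GTM)."""
    ng = len(ranks)
    k = min(4 * ng, 2 * ng_max)                        # relevance window
    penalized = [r if r <= k else 1.25 * k for r in ranks]
    avr = sum(penalized) / ng                          # average (penalized) rank
    mrr = avr - 0.5 - ng / 2.0                         # modified retrieval rank
    return mrr / (1.25 * k + 0.5 - 0.5 * ng)           # normalize to [0, 1]

# ANMRR is the mean NMRR over all queries; 0 is perfect, 1 is worst.
perfect = nmrr([1, 2, 3], ng_max=3)   # all relevant images ranked first
worse = nmrr([5, 9, 40], ng_max=3)    # one relevant image outside the window
```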
We consider the problem of enriching a travelogue associated with only a small number of images (even one) with additional web images. The images associated with a travelogue reflect both the content and the style of its textual information. Relying on this assumption, we present a framework for travelogue enriching that exploits both textual and visual information generated by different users. The framework aims to select the most relevant images from an automatically collected candidate image set to enrich the given travelogue and form a comprehensive overview of the scenic spot. To do this, we propose to build two-layer probabilistic models, i.e. a text-layer model and image-layer models, on offline-collected travelogues and images. Each topic in the text-layer model (e.g. Sea, Mountain, Historical Sites) is followed by an image-layer model with learnt sub-topics (e.g. the topic Sea has sub-topics such as beach, tree, sunrise and sunset). Based on the model, we develop strategies to enrich travelogues in the following steps: (1) remove noisy names of scenic spots from travelogues; (2) generate queries to automatically gather the candidate image set; (3) select images to enrich the travelogue; and (4) choose images to portray the visual content of a scenic spot. Experimental results on Chinese travelogues demonstrate the potential of the proposed approach for travelogue enrichment and the corresponding scenic-spot illustration.
This paper introduces a family of rectangularity measures. The measures depend on two parameters, which enables their flexibility, i.e. the possibility to adapt to a concrete application. Several rectangularity measures exist in the literature, designed to evaluate numerically how much a given shape differs from a perfect rectangle. None of these measures distinguishes rectangles whose edge ratios differ, i.e. they assume that all rectangles (including squares) have the same shape. Such a property can be a disadvantage in applications. In this paper, we consider differently elongated rectangles to have different shapes, and propose a family of new rectangularity measures that assign different values to rectangles whose edge ratios differ. The new rectangularity measures are invariant with respect to translation, rotation and scaling transformations. They range over the interval ]0, 1] and attain the value 1 only for perfect rectangles with a desired edge ratio.
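The classical rectangularity measure the authors contrast with (shape area divided by bounding-rectangle area) is easy to state; the axis-aligned bounding box below is a simplification of the usual minimum-area rectangle:

```python
def rectangularity(shape_area, bbox_w, bbox_h):
    """Classical rectangularity: shape area over the area of its
    (here axis-aligned) bounding rectangle; 1.0 for a perfect rectangle."""
    return shape_area / (bbox_w * bbox_h)

# A 4x2 rectangle fills its bounding box completely...
full = rectangularity(8.0, 4.0, 2.0)
# ...while a triangle spanning the same box fills only half of it.
tri = rectangularity(4.0, 4.0, 2.0)
```

Both a 4x2 rectangle and a square score exactly 1.0 under this classical measure, which is precisely the edge-ratio blindness the proposed parameterized family removes.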
This article introduces a partial matching framework, based on set-theoretic criteria, for the measurement of shape similarity. The matching framework is described abstractly because the proposed scheme is independent of the choice of segmentation method and feature space. This paradigm ensures the high adaptability of the algorithm and gives the implementer wide control over robustness, the ability to balance between selectivity and sensitivity, and the freedom to deal with the more general and arbitrary image transformations required by a particular problem. A strategy to establish a descriptor set obtained from components segmented from the main shape is expounded, and two exclusion measure functions are formulated. Proofs are given to show that it is not necessary to match the entire descriptor sets to determine that two shapes are similar. The methodology provides a dissimilarity score that may be used for shape-based retrieval and object recognition; this is demonstrated by applying the proposed approach in a cattle brand identification system.
The usefulness of an image database depends on whether an image of interest can be easily located, and feature extraction is a crucial step of image retrieval. The well-known SIFT descriptor is a keypoint-based image feature. It can be used to robustly find the same objects in different images, i.e. to achieve the object recognition task. However, it is less effective for the image retrieval task, i.e. finding images with similar content. To improve on the SIFT algorithm, we propose a robust image retrieval algorithm based on the integration of keypoint and edge information. Our approach is robust to translation, rotation and partial occlusion of the object. Experimental results indicate that the proposed algorithm is effective and outperforms the SIFT algorithm.
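SIFT-style keypoint matching is typically done with nearest-neighbour search plus Lowe's ratio test, which the baseline being improved on presumably uses; a minimal NumPy sketch (the 2-D descriptors and the 0.8 threshold are illustrative only, real SIFT descriptors are 128-dimensional):

```python
import numpy as np

def match_descriptors(desc_a, desc_b, ratio=0.8):
    """Nearest-neighbour matching with Lowe's ratio test: accept a match
    only when the best distance is clearly smaller than the second best."""
    matches = []
    for i, d in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - d, axis=1)
        j, k = np.argsort(dists)[:2]        # best and second-best candidates
        if dists[j] < ratio * dists[k]:
            matches.append((i, int(j)))
    return matches

a = np.array([[1.0, 0.0], [0.0, 1.0]])                  # query descriptors
b = np.array([[0.9, 0.1], [0.5, 0.5], [0.1, 0.9]])      # database descriptors
pairs = match_descriptors(a, b)
```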
Given the limited transmission capacity and the prevalence of compressed images in network environments, a compressed-domain image filtering and re-ranking approach for multi-agent image retrieval is proposed in this paper. Firstly, a distributed multi-agent image retrieval platform is constructed using the Aglet development system, and the lifecycle and migration mechanism of the agents are designed and planned for multi-agent image retrieval by exploiting the characteristics of mobile agents. Then, to handle the redundant images introduced by distributed multi-agent retrieval, duplicate images in the distributed retrieval results are filtered based on a perceptual hashing feature extracted in the compressed domain. Finally, a weight-based Hamming distance is used to re-rank the retrieval results. The experimental results show that the proposed approach can effectively filter duplicate images in distributed image retrieval results as well as improve the accuracy and speed of compressed-domain image retrieval.
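The perceptual-hash filtering and weighted Hamming re-ranking steps can be illustrated with a simple average hash; note the paper extracts its hash in the compressed domain, which this pixel-domain sketch does not reproduce:

```python
import numpy as np

def average_hash(gray8x8):
    """Perceptual average hash: each bit marks whether a pixel lies
    above the image mean (input: an 8x8 grayscale block)."""
    return (gray8x8 > gray8x8.mean()).astype(np.uint8).ravel()

def weighted_hamming(h1, h2, weights=None):
    """Hamming distance between two bit strings, with optional
    per-bit weights for the re-ranking step."""
    diff = h1 != h2
    if weights is None:
        return int(diff.sum())
    return float((diff * weights).sum())

rng = np.random.default_rng(1)
img = rng.random((8, 8))
near_dup = img + 0.001                    # near-duplicate: tiny brightness shift
d = weighted_hamming(average_hash(img), average_hash(near_dup))
```

Images whose hash distance falls below a threshold would be treated as duplicates and filtered before re-ranking.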
Textual information embedded in multimedia can provide a vital tool for indexing and retrieval. The text extraction process has many inherent problems due to variation in font size, color, background and resolution. Text detection and localization are the most challenging phases of the text extraction process, and the extraction results depend heavily on these phases. This paper focuses on text localization because of its fundamental importance. Two effective feature vectors are introduced for the classification of text and non-text objects. The first feature vector is the Radon transform of the text candidate objects. The second feature vector is derived from a detailed geometrical analysis of the text contents. The union of the two feature vectors is used to classify text and non-text objects with a support vector machine (SVM). Text detection and localization results are evaluated on two publicly available datasets, namely ICDAR 2013 and IPC-Artificial text. Moreover, the results are compared with state-of-the-art techniques, and the comparison demonstrates the superiority of the presented research.
Keyword-based image retrieval is more comfortable for users than content-based image retrieval. Because images lack semantic descriptions, image annotation is often performed a priori by learning the association between semantic concepts (keywords) and images (or image regions). This association problem is particularly difficult but interesting, because it can be used not only for annotating images but also for multimodal image retrieval. However, most association models are unidirectional, from image to keywords, and existing models rely on a fixed image database and prior knowledge. In this paper, we propose an original association model that provides a bidirectional image-keyword transformation. Built on the state-of-the-art Bag-of-Words image representation and incorporating a strategy of interactive incremental learning, our model works well with an image database having zero or weak prior knowledge and evolves with it. Objective quantitative and qualitative evaluations of the model are proposed in order to highlight the relevance of the method.
Image retrieval methods have developed dramatically in the last decade. In this paper, we propose a novel method for image retrieval based on manifold structures of canonical images. Firstly, we present an image normalization process to find a set of canonical images that anchors the probabilistic distributions around the real data manifolds, in order to learn representations that better encode the manifold structures in the general high-dimensional image space. In addition, we employ the canonical images as the centers of conditional multivariate Gaussian distributions. This approach allows learning more detailed structures of the partial manifolds, resulting in an improved representation of the high-level properties of scene images. Furthermore, we use the probabilistic framework of the extended model to retrieve images based on a similarity measure combining the reciprocal likelihood of pairs of images and the likelihood of each image under the other's best distributions. We evaluate our method on the SUN database. In experiments on scene image retrieval, the proposed method is efficient and exhibits superior capabilities compared to other methods, such as GIST.
This paper proposes a high-performance image retrieval framework that combines an improved SIFT (Scale Invariant Feature Transform) feature extraction algorithm, improved feature matching, improved Fisher feature coding and an improved Gaussian Mixture Model (GMM). To address the slow convergence of the traditional GMM algorithm, an improved GMM is proposed: it initializes the GMM with an on-line K-means clustering method, which improves the convergence speed. At the same time, when the model is updated, storage space is saved by improving the criteria for matching rules and for generating new Gaussian distributions. To address the high dimensionality of the SIFT descriptor, its slow matching speed and its low matching rate, an improved SIFT algorithm is proposed that preserves the advantages of SIFT under blur, compression, rotation and scaling while improving the matching speed; the correct match rate is increased by an average of 40% to 55%. Experiments on the recently released VOC 2012 database and on a database of 20 object categories containing 230,800 images showed that the framework achieves high precision and recall with short query times. Compared with the standard image retrieval framework, the improved framework can detect moving targets quickly and effectively and is more robust.
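The on-line K-means initialization mentioned for the improved GMM can be sketched generically; the learning rate and the seeding of centres from the first k samples are assumptions here, not the paper's exact procedure:

```python
import numpy as np

def online_kmeans(stream, k, lr=0.1):
    """Online (sequential) k-means: seed centres from the first k samples,
    then nudge the nearest centre toward each subsequent sample."""
    centers = np.array(stream[:k], dtype=float)
    for x in stream[k:]:
        j = np.argmin(np.linalg.norm(centers - x, axis=1))  # nearest centre
        centers[j] += lr * (x - centers[j])                  # move it toward x
    return centers

data = [np.array(v, dtype=float) for v in [[0, 0], [10, 10], [0.5, 0], [9.5, 10]]]
centers = online_kmeans(data, k=2)
```

The resulting centres would seed the Gaussian means, giving the EM-style GMM updates a much better starting point than random initialization.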
Human visual perception has a close relationship with the HSV color space, which can be represented as a cylinder. How to extract visual features using such an attribute is an important question. In this paper, a new feature descriptor, namely a color volume histogram, is proposed for image representation and content-based image retrieval. It converts a color image from the RGB color space to the HSV color space and then uniformly quantizes it into 72 bins of color cues and 32 bins of edge cues. Finally, color volumes are used to represent the image content. The proposed algorithm is extensively tested on two Corel datasets containing 15,000 natural images. These image retrieval experiments show that the color volume histogram has the power to describe color, texture, shape and spatial features, and that it performs significantly better than the local binary pattern histogram and multi-texton histogram approaches.
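A common 72-bin HSV quantization (8 hue x 3 saturation x 3 value levels) can be sketched as follows; whether the paper splits its 72 color bins exactly this way is an assumption:

```python
import colorsys

def hsv_bin(r, g, b):
    """Quantize an RGB pixel (components as 0..1 floats) into one of
    72 HSV bins: 8 hue x 3 saturation x 3 value levels."""
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    hq = min(int(h * 8), 7)        # 8 hue bins
    sq = min(int(s * 3), 2)        # 3 saturation bins
    vq = min(int(v * 3), 2)        # 3 value bins
    return hq * 9 + sq * 3 + vq    # bin index in [0, 71]

def color_histogram(pixels):
    """72-bin color histogram over an iterable of RGB pixels."""
    hist = [0] * 72
    for r, g, b in pixels:
        hist[hsv_bin(r, g, b)] += 1
    return hist

hist = color_histogram([(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)])
```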
Owing to their complex data structure, images can convey rich information and are widely used in many fields. Although images offer great convenience, handling such data consumes considerable time and multi-dimensional storage space, and the disadvantage becomes even more obvious when users need to retrieve images from large-scale image datasets. Therefore, in order to retrieve large-scale image data effectively, a scene-image retrieval strategy based on the MapReduce parallel programming model is proposed. The proposed strategy first investigates how to effectively store large-scale scene images under a Hadoop cluster parallel-processing architecture. Second, a distributed MeanShift feature-clustering algorithm is introduced to cluster the emotional features of scene images. Finally, several experiments are conducted to verify the effectiveness and efficiency of the proposed strategy in terms of retrieval accuracy, speedup ratio, efficiency and data scalability.
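A plain (non-distributed) mean-shift pass with a flat kernel illustrates the clustering step; the MapReduce parallelization itself, where each mapper would shift a partition of the points, is not shown:

```python
import numpy as np

def mean_shift(points, bandwidth=1.0, iters=30):
    """Plain mean shift: repeatedly move every point toward the mean of
    its neighbourhood (flat kernel of the given bandwidth) until the
    points collapse onto the density modes."""
    shifted = points.copy()
    for _ in range(iters):
        for i, p in enumerate(shifted):
            mask = np.linalg.norm(points - p, axis=1) <= bandwidth
            shifted[i] = points[mask].mean(axis=0)
    return shifted

# Two well-separated feature clusters collapse onto two modes
pts = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
modes = mean_shift(pts, bandwidth=1.0)
```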