Many distance functions exist for measuring the similarity between feature vectors and can therefore be used for ranking purposes. When multiple representations of the same object are available, distances in each representation space may be combined to produce a single similarity score. In this paper, we present a method to build such a similarity ranking out of a family of distance functions. Unlike approaches that aim to select the best distance function for a particular context, we use several distances and combine them in a convenient way. To this end, we adopt a classical similarity learning approach and treat the problem as a standard supervised machine learning task. As in most similarity learning settings, the training data consist of pairs of objects labeled as similar or dissimilar. These pairs are first fed to a transformation function that computes a new feature vector for each pair by applying a family of distance functions in each of the available representation spaces. This information is then used to learn a classifier. The approach has been tested on three different repositories. Results show that the proposed method outperforms alternative approaches in high-dimensional spaces and highlight the benefits of using multiple distances in each representation space.
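To make the pair-transformation step concrete, here is a minimal sketch (not the authors' code; the distance family, view sizes, and classifier choice are illustrative) of mapping labeled pairs to multi-distance feature vectors and training a standard classifier on them:

```python
import numpy as np
from scipy.spatial.distance import euclidean, cityblock, cosine
from sklearn.linear_model import LogisticRegression

DISTANCES = [euclidean, cityblock, cosine]  # an illustrative family of distances

def pair_features(views_a, views_b):
    """Map a pair of objects, each given in several representation spaces,
    to a vector with one distance per (view, distance function) combination."""
    return np.array([d(xa, xb)
                     for xa, xb in zip(views_a, views_b)
                     for d in DISTANCES])

# toy data: two views per object; labels 1 = similar, 0 = dissimilar
rng = np.random.default_rng(0)
pairs = [([rng.normal(size=8), rng.normal(size=16)],
          [rng.normal(size=8), rng.normal(size=16)]) for _ in range(100)]
y = rng.integers(0, 2, size=100)

X = np.stack([pair_features(a, b) for a, b in pairs])
clf = LogisticRegression().fit(X, y)   # similarity scores via clf.predict_proba
```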
Person re-identification (person re-ID) is a challenging task that aims at spotting the same person across disjoint camera views. It has attracted considerable attention in the field of computer vision, but it remains difficult due to the variability of person appearance across camera views. Many excellent methods have been proposed to solve this problem, especially metric learning-based algorithms, but most of them suffer from data imbalance. To address this, we propose a data-balancing method named Enhanced Metric Learning (EML), based on adaptive asymmetric and diversity regularization, for person re-ID. Metric learning is important for person re-ID because it can, to a certain extent, eliminate the negative effects caused by camera differences. However, most metric learning approaches neglect the data imbalance caused by the abundance of negative samples relative to the few positive ones, and they treat all negative samples the same as positive ones, which can lead to the loss of important information. Our approach pays different attention to positive and negative samples. First, we adaptively classify negative samples into three groups and then weight them differently using an adaptive asymmetric strategy. By treating samples differently, the proposed method can better exploit the discriminative information between positive and negative samples. Furthermore, we impose a diversity regularizer to avoid overfitting when the training set is small or medium-sized. Finally, we design a series of experiments on four challenging databases (VIPeR, PRID450S, CUHK01 and GRID) to compare with several strong metric learning methods. Experimental results show that the rank-1 matching rate of the proposed method outperforms the state of the art by 3.64%, 4.2%, 3.13% and 2.83% on the four databases, respectively.
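The abstract does not specify how the three negative groups are formed; the following sketch is one plausible reading (the grouping thresholds and per-group weights are assumptions), splitting negatives into hard/moderate/easy by their distance relative to the positive pair and weighting each group asymmetrically:

```python
import numpy as np

def asymmetric_negative_loss(d_pos, d_negs, margin=1.0,
                             weights=(1.0, 0.5, 0.1)):
    """d_pos: distance of the positive pair; d_negs: distances of negatives."""
    hard = d_negs < d_pos                              # closer than the positive
    moderate = (~hard) & (d_negs < d_pos + margin)     # inside the margin band
    easy = ~hard & ~moderate                           # far beyond the margin
    per_neg = np.maximum(0.0, d_pos + margin - d_negs) # hinge per negative
    w = np.where(hard, weights[0],
                 np.where(moderate, weights[1], weights[2]))
    return np.mean(w * per_neg)                        # hard negatives dominate

print(asymmetric_negative_loss(1.0, np.array([0.5, 1.4, 3.0])))
```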
Similarity/distance measures play a key role in many machine learning, pattern recognition, and data mining algorithms, which has led to the emergence of the metric learning field. Many metric learning algorithms learn a global distance function from data that satisfies the constraints of the problem. However, in many real-world datasets, where the discriminative power of features varies across different regions of the input space, a global metric is often unable to capture the complexity of the task. To address this challenge, local metric learning methods have been proposed that learn multiple metrics across different regions of the input space. These methods offer high flexibility and can learn a nonlinear mapping, but typically at the expense of higher time requirements and a greater risk of overfitting. To overcome these challenges, this research presents an online multiple metric learning framework. Each metric in the proposed framework is composed of a global and a local component learned simultaneously. Adding a global component to a local metric efficiently reduces overfitting. The proposed framework is also scalable in both the sample size and the dimension of the input data. To the best of our knowledge, this is the first local online similarity/distance learning framework based on the Passive/Aggressive (PA) approach. In addition, for scalability with the dimension of the input data, Dual Random Projection (DRP) is extended to local online learning in the present work, enabling our methods to run efficiently on high-dimensional datasets while maintaining their predictive performance. The proposed framework provides a straightforward local extension of any global online similarity/distance learning algorithm based on PA. Experimental results on several challenging datasets from the machine vision community confirm that the extended methods considerably enhance the performance of the related global ones without increasing the time complexity.
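As background for the PA family the framework builds on, here is a hedged sketch of a Passive/Aggressive similarity update in the style of OASIS, a well-known global PA-based method (the bilinear form and names are illustrative, not this paper's exact formulation): on a triplet, if the margin is violated, the similarity matrix W moves just enough to satisfy it, with the step size capped by an aggressiveness parameter C.

```python
import numpy as np

def pa_similarity_update(W, q, p_pos, p_neg, C=0.1):
    """One PA step on a triplet (query, positive, negative)."""
    loss = max(0.0, 1.0 - q @ W @ p_pos + q @ W @ p_neg)  # hinge on the margin
    if loss == 0.0:
        return W                                           # passive: no violation
    V = np.outer(q, p_pos - p_neg)                         # update direction
    tau = min(C, loss / np.linalg.norm(V) ** 2)            # aggressiveness cap
    return W + tau * V

rng = np.random.default_rng(0)
W = np.eye(4)                                              # bilinear similarity matrix
q, pp, pn = (rng.normal(size=4) for _ in range(3))
W = pa_similarity_update(W, q, pp, pn)
```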
Obtaining discriminative features is one of the key problems in three-dimensional (3D) shape retrieval. Recently, deep metric learning-based 3D shape retrieval methods have attracted researchers' attention and achieved strong performance. The triplet-center loss can learn more discriminative features than a traditional classification loss, and it has been successfully used in deep metric learning-based 3D shape retrieval. However, it has a hard margin parameter that leverages only part of the training data in each mini-batch. Moreover, the margin parameter is often set by experience and remains unchanged during training. To overcome these limitations, we propose the soft margin triplet-center loss, which replaces the hard margin with a nonparametric soft margin. Furthermore, we combine the proposed soft margin triplet-center loss with the softmax loss to improve training efficiency and retrieval performance. Extensive experimental results on two popular 3D shape retrieval datasets validate the effectiveness of the soft margin triplet-center loss, and our proposed 3D shape retrieval method achieves better performance than other state-of-the-art methods.
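A minimal sketch of the soft-margin idea, assuming the common construction (shapes and names are illustrative, not the paper's implementation): the hard-margin hinge max(0, m + d_pos - d_neg) of the triplet-center loss is replaced by the nonparametric softplus log(1 + exp(d_pos - d_neg)), where d_pos is the distance to the sample's own class center and d_neg the distance to the nearest other center.

```python
import numpy as np

def soft_margin_triplet_center_loss(x, centers, y):
    """x: (d,) embedding; centers: (C, d) class centers; y: true class index."""
    dists = np.linalg.norm(centers - x, axis=1)   # distance to every center
    d_pos = dists[y]                              # distance to own class center
    d_neg = np.min(np.delete(dists, y))           # nearest wrong center
    return np.log1p(np.exp(d_pos - d_neg))        # softplus: no margin parameter

centers = np.array([[0.0, 0.0], [3.0, 3.0], [0.0, 4.0]])
print(soft_margin_triplet_center_loss(np.array([0.5, 0.2]), centers, y=0))
```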
Recommender systems predict user preferences by mining users' historical behavior data. This paper proposes a social recommendation method combining trust relationships and distance metric factorization. On the one hand, recommender systems suffer from a cold start problem, which can be effectively alleviated by adding social relations. At the same time, to address the sparsity of the trust matrix, we use the Jaccard similarity coefficient and Dijkstra's algorithm to reconstruct the trust matrix and explore potential user trust relationships. On the other hand, traditional matrix factorization models preferences as the dot product of user and item latent factors; however, the dot product does not satisfy the triangle inequality, which degrades the final recommendation quality. The primary motivation behind our approach is to combine the best of both worlds and mitigate the inherent weaknesses of each paradigm. By combining the advantages of the two ideas, our algorithm is shown to enhance recommendation performance and alleviate the cold start problem in recommender systems.
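A hedged sketch of the trust-matrix densification step as we read it (the graph construction and the trust-decay rule are assumptions): direct links are scored with the Jaccard coefficient of the users' trusted sets, and Dijkstra path lengths propagate trust to indirectly connected users.

```python
import networkx as nx

# toy directed trust network: user -> set of users they trust
trust = {"a": {"b", "c"}, "b": {"c", "d"}, "c": {"d"}, "d": set()}

def jaccard(u, v):
    """Jaccard coefficient of the two users' trusted sets."""
    tu, tv = trust[u], trust[v]
    return len(tu & tv) / len(tu | tv) if tu | tv else 0.0

G = nx.DiGraph([(u, v, {"weight": 1.0})
                for u, vs in trust.items() for v in vs])
# propagated trust decays with Dijkstra shortest-path length between users
lengths = nx.single_source_dijkstra_path_length(G, "a")
propagated = {v: 1.0 / (1.0 + l) for v, l in lengths.items() if v != "a"}
print(jaccard("a", "b"), propagated)
```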
In response to the continuous sophistication of cyber threat actors, it is imperative to make the best use of cyber threat intelligence converted from structured or semi-structured data, together with Named Entity Recognition (NER) techniques that help extract critical cyber threat intelligence. To promote NER research in the Cyber Threat Intelligence (CTI) domain, we provide a Large Dataset for NER in Cyber Threat Intelligence (LDNCTI). On the LDNCTI corpus, we investigate the feasibility of mainstream transformer-based models in the CTI domain. To address the problem of imbalanced label distribution, we introduce a transformer-based model with a Triplet Loss based on metric learning and a Sorted Gradient harmonizing mechanism (TSGL). Our experimental results show that the LDNCTI corpus represents critical threat intelligence well and that our transformer-based model with the new loss function outperforms previous schemes on the Dataset for NER in Threat Intelligence (DNRTI) and the dataset for NER in Advanced Persistent Threats (APTNER).
A Discriminative Deep Forest (DisDF), a metric learning algorithm, is proposed in this paper. It is based on the Deep Forest, or gcForest, proposed by Zhou and Feng, and can be viewed as a gcForest modification. The fully supervised case is studied, in which the class labels of individual training examples are known. The main idea underlying the algorithm is to assign weights to the decision trees in a random forest in order to reduce distances between objects from the same class and to increase them between objects from different classes. The weights are the training parameters. A specific objective function is proposed which combines Euclidean and Manhattan distances and simplifies the optimization problem for training the DisDF. Numerical experiments illustrate the proposed distance metric learning algorithm.
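One plausible reading of the weighted-tree distance (not the authors' code): each tree yields a class-probability vector per object, per-tree distances combine Euclidean and Manhattan terms, and the trainable nonnegative tree weights are optimized so that same-class distances shrink while different-class distances grow.

```python
import numpy as np

def weighted_forest_distance(P_x, P_z, w):
    """P_x, P_z: (T, C) per-tree class-probability vectors; w: (T,) tree weights."""
    per_tree = np.linalg.norm(P_x - P_z, axis=1)   # Euclidean term per tree
    per_tree += np.abs(P_x - P_z).sum(axis=1)      # plus Manhattan term per tree
    return np.dot(w, per_tree)                     # weighted combination over trees

T, C = 5, 3
rng = np.random.default_rng(1)
P_x, P_z = rng.dirichlet(np.ones(C), T), rng.dirichlet(np.ones(C), T)
w = np.full(T, 1.0 / T)                            # the weights are the training parameters
print(weighted_forest_distance(P_x, P_z, w))
```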
Image recognition technology is an important branch of artificial intelligence research that uses computers to process, analyze, and understand images in order to identify objects of different patterns. It is currently used in a wide range of applications, such as face recognition, fingerprint recognition, terrain survey, and license plate recognition. However, because images may contain multiple categories, blurred object boundaries can degrade recognition results. In this paper, we propose a multi-category multi-task image recognition method based on deep metric learning (MMDML). Specifically, we combine the triplet loss and the softmax loss to construct the overall loss function, and train a ResNet-50 network to minimize this loss for image recognition. To demonstrate the effectiveness of the proposed method, we compare it with two other methods on three common image recognition datasets, namely ImageNet, PASCAL VOC, and Caltech. The experimental results show that our algorithm achieves the highest Rank-1 accuracy and mAP on all three datasets.
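A minimal sketch of the combined objective, assuming the usual additive form (the weighting factor lam is an assumption): a triplet loss on the embeddings plus softmax cross-entropy on the classifier logits, as would be minimized when training the ResNet-50 backbone.

```python
import numpy as np

def triplet_loss(anchor, pos, neg, margin=0.2):
    """Hinge on the gap between positive and negative embedding distances."""
    d_ap = np.linalg.norm(anchor - pos)
    d_an = np.linalg.norm(anchor - neg)
    return max(0.0, margin + d_ap - d_an)

def softmax_ce(logits, label):
    """Softmax cross-entropy, computed stably."""
    z = logits - logits.max()
    return -(z[label] - np.log(np.exp(z).sum()))

def combined_loss(anchor, pos, neg, logits, label, lam=1.0):
    return triplet_loss(anchor, pos, neg) + lam * softmax_ce(logits, label)

rng = np.random.default_rng(0)
a, p, n = (rng.normal(size=128) for _ in range(3))
print(combined_loss(a, p, n, logits=rng.normal(size=10), label=3))
```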
Distance metric learning and nonlinear dimensionality reduction are intrinsically related, since they are different perspectives on the same fundamental problem: learning compact and meaningful data representations for classification and visualization. In this paper, we propose a graph-based generalization of the Semi-Supervised Dimensionality Reduction (SSDR) algorithm that uses stochastic distances (Kullback-Leibler, Bhattacharyya and Cauchy-Schwarz divergences) to compute the similarity between local multivariate Gaussian distributions along the K Nearest Neighbors (KNN) graph built from the samples in the input high-dimensional space. There are two variants of the proposed method: one that uses only a fraction of the labeled samples (10%), and another that also uses a clustering method (Gaussian Mixture Models) to estimate the labels along the minimum spanning tree of the KNN graph, incorporating more information into the process. Experimental results with several real datasets show that the proposed method improves the classification accuracy of several supervised classifiers as well as the quality of the obtained clusters (Silhouette Coefficients) in comparison to the regular SSDR algorithm, making it a viable alternative for pattern classification problems.
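Of the three stochastic distances, the Kullback-Leibler divergence between multivariate Gaussians has a well-known closed form; the sketch below computes it for two local Gaussians (as would be fitted to KNN patches), with symmetrization noted for when a proper distance is needed.

```python
import numpy as np

def kl_gauss(mu0, S0, mu1, S1):
    """KL( N(mu0, S0) || N(mu1, S1) ), closed form for multivariate Gaussians."""
    k = mu0.shape[0]
    S1_inv = np.linalg.inv(S1)
    diff = mu1 - mu0
    return 0.5 * (np.trace(S1_inv @ S0) + diff @ S1_inv @ diff - k
                  + np.log(np.linalg.det(S1) / np.linalg.det(S0)))

mu0, mu1 = np.zeros(2), np.ones(2)
S0, S1 = np.eye(2), 2 * np.eye(2)
# symmetrize, 0.5 * (KL(p||q) + KL(q||p)), to obtain a symmetric similarity
print(kl_gauss(mu0, S0, mu1, S1))
```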
Multi-modal metric learning has recently received considerable attention, since many real-world applications involve multi-modal data. However, there has been relatively little study of the generalization properties of the associated learning algorithms. In this paper, we bridge this theoretical gap by deriving generalization bounds using Rademacher complexities. In particular, we establish a general Rademacher complexity result by systematically analyzing the behavior of the resulting models under various regularizers, e.g., an ℓp-regularizer at the modality level combined with either a mixed (q,s)-norm or a Schatten norm on each modality. Our results, and the discussion that follows, help to explain how prior knowledge can be exploited by selecting an appropriate regularizer.
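For concreteness, a hedged illustration of the regularizer family discussed (notation assumed; the paper's exact definitions may differ): an ℓp aggregation across the per-modality matrices M_1, ..., M_K, where each per-modality norm can be, for instance, a mixed (q,s)-norm.

```latex
\[
\Omega(M_1,\dots,M_K) = \Big(\sum_{k=1}^{K} \|M_k\|^{p}\Big)^{1/p},
\qquad
\|M_k\|_{q,s} = \Big(\sum_{j}\Big(\sum_{i} \big|(M_k)_{ij}\big|^{q}\Big)^{s/q}\Big)^{1/s}.
\]
```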
A new adaptive weighted deep forest algorithm, which can be viewed as a modification of the confidence screening mechanism, is proposed. The main idea underlying the algorithm is the adaptive weighting of every training instance at each cascade level of the deep forest. The confidence screening mechanism for the deep forest proposed by Pang et al. strictly removes instances from the training and testing processes, in accordance with the obtained random forest class probability distributions, to simplify the whole algorithm. This strict removal may leave very few training instances at the later levels of the deep forest cascade. The presented modification is more flexible: it assigns weights to instances in order to differentiate their use in building decision trees at every level of the cascade, thereby overcoming the main disadvantage of the confidence screening mechanism. The proposed modification is, to some extent, similar to the AdaBoost algorithm. Numerical experiments show that the proposed modification outperforms the original deep forest. It is also illustrated how the proposed algorithm can be extended to transfer learning and distance metric learning problems.
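An illustrative contrast between the two mechanisms (the exact weighting rule is an assumption): confidence screening hard-drops instances whose top class probability exceeds a threshold, while the adaptive variant keeps every instance and merely down-weights confident ones, AdaBoost-style.

```python
import numpy as np

def screening_mask(proba, threshold=0.9):
    """Hard screening: keep only instances the current level is unsure about."""
    return proba.max(axis=1) < threshold          # True = pass to next level

def adaptive_weights(proba, labels):
    """Soft alternative: weight grows as the true-class probability shrinks."""
    margin = proba[np.arange(len(labels)), labels]
    w = 1.0 - margin
    return w / w.sum()                            # normalized instance weights

proba = np.array([[0.95, 0.05], [0.6, 0.4], [0.2, 0.8]])
labels = np.array([0, 0, 1])
print(screening_mask(proba), adaptive_weights(proba, labels))
```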
We present a distance metric learning algorithm for regression problems that incorporates label information to form a biased distance metric during learning. We use Newton's optimization method to solve the optimization problem that yields this biased metric. Experiments show that the method can find the intrinsic variation trend of the data in a regression model from a relatively small number of samples, without any prior assumption on the structure or distribution of the data. In addition, test samples can be projected into this metric by a simple linear transformation, and the method is easily combined with manifold learning algorithms to improve performance. Experiments are conducted on the FG-NET aging database, the UIUC-IFP-Y aging database, and the CHIL head pose database using Gaussian process regression based on the learned metric, showing that our method is competitive with the state of the art.
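Since the paper's objective is not reproduced here, the following generic sketch only illustrates the Newton update it relies on, using a stand-in quadratic objective in place of the metric learning problem.

```python
import numpy as np

def newton_step(theta, grad, hess):
    """One Newton update: theta - H(theta)^{-1} g(theta)."""
    return theta - np.linalg.solve(hess(theta), grad(theta))

# stand-in objective f(theta) = 0.5 * theta^T A theta - b^T theta
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
theta = np.zeros(2)
theta = newton_step(theta, lambda t: A @ t - b, lambda t: A)
print(theta)   # for a quadratic, one Newton step reaches the minimizer A^{-1} b
```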
Metric learning is a critical problem in classification. Most classifiers are based on a metric; the simplest is the KNN classifier, whose outcome is directly determined by the given metric. This paper discusses semi-supervised metric learning. Most traditional semi-supervised metric learning algorithms preserve the local structure of all the samples (labeled and unlabeled) in the input space while pulling samples with the same label together and pushing samples with different labels apart. In most existing methods, the local structure is computed from the Euclidean distance over all features. However, high-dimensional data often lies on a low-dimensional manifold, and not all features are discriminative. Thus, in this paper, we explore the latent structure of the samples and use the more discriminative features to compute the local structure. The latent structure is learned by a clustering random forest and cast into a similarity between samples. Based on the hierarchical structure of the trees and their split functions, the similarity is obtained from discriminative features. Experimental results on public datasets show that our algorithm outperforms related traditional algorithms.
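A simplified sketch of the forest-derived similarity (our reading, using scikit-learn's unsupervised RandomTreesEmbedding as a stand-in for the clustering random forest): the proximity of two samples is the fraction of trees in which they land in the same leaf, so only features actually chosen by split functions shape the local structure.

```python
import numpy as np
from sklearn.ensemble import RandomTreesEmbedding

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 20))                 # high-dimensional samples

forest = RandomTreesEmbedding(n_estimators=100, random_state=0).fit(X)
leaves = forest.apply(X)                      # (n_samples, n_trees) leaf indices

def proximity(i, j):
    """Fraction of trees in which samples i and j share a leaf."""
    return np.mean(leaves[i] == leaves[j])

print(proximity(0, 1))
```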
Few-shot learning can solve new learning tasks with only a few samples. However, current few-shot learning algorithms mostly use ResNet as a backbone, which leads to a large number of model parameters. To deal with this problem, a lightweight backbone named DenseAttentionNet, based on the Convolutional Block Attention Module, is proposed and compared with ResNet-12 in terms of parameter count and few-shot classification accuracy. Then, based on DenseAttentionNet, a few-shot learning algorithm called Meta-DenseAttention is presented to balance model size and classification performance. Dense connections and the attention mechanism are combined, for the first time, to meet the requirement of fewer parameters while achieving good classification performance. The experimental results show that DenseAttentionNet not only reduces the number of parameters by 55% compared with the ResNet-12 baseline but also outperforms other classic backbones in classification performance. In addition, Meta-DenseAttention achieves an accuracy of 56.57% (5-way 1-shot) and 72.73% (5-way 5-shot) on miniImageNet with only 3.6M parameters. These results show that the proposed few-shot learning algorithm not only preserves classification accuracy but is also lightweight.
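A compact PyTorch sketch of the Convolutional Block Attention Module (CBAM) that DenseAttentionNet builds on: channel attention from average- and max-pooled descriptors, followed by spatial attention. The reduction ratio and kernel size below are the common defaults, not necessarily the paper's.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    def __init__(self, channels, reduction=16, kernel_size=7):
        super().__init__()
        self.mlp = nn.Sequential(                  # shared channel-attention MLP
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels))
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))         # channel attention branch
        mx = self.mlp(x.amax(dim=(2, 3)))
        x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)
        s = torch.cat([x.mean(dim=1, keepdim=True),  # spatial attention branch
                       x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.conv(s))

print(CBAM(64)(torch.randn(2, 64, 8, 8)).shape)    # torch.Size([2, 64, 8, 8])
```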
Fusion techniques that aim to leverage the discriminative power of different appearance features for person representation have been widely applied in person re-identification. They are performed either by concatenating all feature vectors (early fusion) or by combining the matching scores of different classifiers (late fusion). Previous studies have shown that late fusion techniques achieve better results than early fusion ones. However, the majority of studies focus on determining suitable weighting schemes that reflect the role of each feature, and the determined weights are then integrated into conventional similarity functions, such as the Cosine similarity [L. Zheng, S. Wang, L. Tian, F. He, Z. Liu and Q. Tian, Query-adaptive late fusion for image search and person re-identification, in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2015, pp. 1741–1750]. The contribution of this paper is two-fold. First, a robust person re-identification method combining metric learning with late fusion techniques is proposed. The metric learning method Cross-view Quadratic Discriminant Analysis (XQDA) is employed to learn a discriminative low-dimensional subspace that minimizes the intra-person distance while maximizing the inter-person distance; product rule-based and sum rule-based late fusion techniques are then applied to these distances. Second, concerning feature engineering, the ResNet extraction process is modified to extract local features from different stripes of person images. To show the effectiveness of the proposed method, both single-shot and multi-shot scenarios are considered. Three state-of-the-art features, namely Gaussians of Gaussians (GOG), Local Maximal Occurrence (LOMO), and deep-learned features extracted through a Residual network (ResNet), are extracted from person images. The experimental results on three benchmark datasets, iLIDS-VID, PRID-2011 and VIPeR, show that the proposed method obtains +11.86%, +3.48% and +2.22% improvement over the best results obtained with a single feature. The proposed method, which achieves rank-1 accuracies of 85.73%, 93.82% and 50.85% on iLIDS-VID, PRID-2011 and VIPeR, respectively, outperforms various SOTA methods, including deep learning ones. Source code is publicly available to facilitate the development of person re-ID systems.
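A minimal sketch of the late-fusion step under an assumed normalization (the paper's exact score normalization may differ): per-feature XQDA distances to the gallery are mapped to similarities in [0, 1], then fused by the sum rule or the product rule.

```python
import numpy as np

def fuse(dist_per_feature, rule="sum"):
    """dist_per_feature: (F, N) distances of one query to N gallery persons,
    one row per feature (e.g. GOG, LOMO, ResNet)."""
    d = dist_per_feature
    # min-max map each feature's distances to similarities before fusing
    s = (d.max(axis=1, keepdims=True) - d) / np.ptp(d, axis=1, keepdims=True)
    return s.sum(axis=0) if rule == "sum" else s.prod(axis=0)

dists = np.array([[0.2, 0.9, 0.5],   # GOG-based XQDA distances
                  [0.1, 0.8, 0.7],   # LOMO-based XQDA distances
                  [0.3, 0.6, 0.4]])  # ResNet-based XQDA distances
print(fuse(dists, "sum"), fuse(dists, "product"))
```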
The task of referring relationships in images aims to locate the entities (subject and object) described by a relationship triple <subject, relationship, object>, which can be viewed as a retrieval problem between structured texts and images. However, existing works extract features of the input text and image separately, and thus capture the correlations between the two modalities insufficiently. Moreover, the attention mechanisms used in cross-modal retrieval tasks do not consider local correlation in images. To address these issues, a cross-modal similarity attention network is proposed in this work, comprising a cross-modal metric learning module and a cross-modal local attention module. The cross-modal metric learning module adaptively models the similarity between the query text and the input image and refines the image features to obtain cross-modal features. The cross-modal local attention module then concentrates on the query entity within the cross-modal features, both across image channels and over spatial local regions. Experiments demonstrate the superiority of the proposed approach compared with current powerful frameworks on two challenging benchmark datasets, Visual Genome and VRD.
Deep semi-supervised learning has been widely applied in the real world due to the rapid development of deep learning. Recently, attention has shifted to approaches such as Mean Teacher, which penalize the inconsistency between two perturbed input sets. Although these methods may achieve positive results, they ignore the relationship information between data instances. To solve this problem, we propose a novel method named Metric Learning by Similarity Network (MLSN), which aims to learn a distance metric adaptively on different domains. By co-training with the classification network, the similarity network can learn more information about pairwise relationships and performs better on several empirical tasks than state-of-the-art methods.