Person re-identification (person Re-ID), an intelligent video surveillance technology for retrieving the same person across different cameras, faces challenges arising from changes in a person's pose, different camera views, and occlusion. Recently, person Re-ID equipped with attention mechanisms has emerged as one of the most active areas of study in computer vision and fractal feature modeling. Despite the upsurge in related research, existing fractal-attention-based methods still face two major challenges when recognizing pedestrians in unpredictable real-world environments: (1) the adaptability of a single local attention feature to hostile scenes cannot be guaranteed, and (2) existing attention-feature methods usually rely on linear mappings or simple variants, which makes it difficult to excavate the association relationships among pedestrians with similar appearance attributes. To address these issues, this paper proposes a simple yet effective fractal feature modeling method, named the multi-dimensional attention and spatial adaptive relationship learning framework (MASARF), to explore the correlation between pedestrian bodies for person Re-ID. The framework comprises a multi-dimensional fractal-attention feature learning model (MDAM) and a dual-branch graph convolutional model (DGCM). In particular, the MDAM comprises local and global attention modules, which capture multi-dimensional attention features for each person. The DGCM then constructs nonlinear mapping relationships among the various body regions of each person via a dual-branch graph convolutional optimization strategy. Extensive experiments were conducted on public person Re-ID datasets (Market-1501, DukeMTMC-reID, and CUHK03). The results demonstrate that the proposed approach outperforms state-of-the-art methods by 2% to 10% in Rank-1 accuracy and mAP. Essential differences exist between our method and existing methods in terms of feature extraction and relationship transformation, which validates its novelty in the person Re-ID domain.
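To make the dual-branch graph convolutional idea concrete, the following is a minimal PyTorch sketch of one graph-convolution step over body-part features; the part count, feature size, and learnable adjacency are illustrative assumptions, not the authors' exact DGCM design.

```python
import torch
import torch.nn as nn

class PartGraphConv(nn.Module):
    """One graph-convolution step: X' = ReLU(A_norm @ X @ W)."""
    def __init__(self, in_dim, out_dim, num_parts):
        super().__init__()
        self.weight = nn.Linear(in_dim, out_dim, bias=False)
        # Learnable adjacency (illustrative assumption): lets the model
        # discover nonlinear relations among body regions rather than
        # relying on a fixed linear mapping.
        self.adj = nn.Parameter(torch.eye(num_parts) + 0.1 * torch.rand(num_parts, num_parts))

    def forward(self, x):                       # x: (batch, parts, in_dim)
        a = torch.softmax(self.adj, dim=-1)     # row-normalize the adjacency
        return torch.relu(a @ self.weight(x))   # propagate part features

gcn = PartGraphConv(in_dim=256, out_dim=256, num_parts=6)
out = gcn(torch.randn(8, 6, 256))               # -> (8, 6, 256)
```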
In this work, a new method for human re-identification (Re-ID) from multiple surveillance cameras is proposed. Unlike traditional methods, in which only body features are used for matching, our method uses both body and facial features for the Re-ID process. This combination allows us to re-identify people under challenging conditions, such as uniforms or similar-looking outfits, partial or occluded bodies, appearance changes, or illumination variation. The face and body feature extraction models were developed using state-of-the-art deep neural backbones and a synthesis of existing datasets in the literature. The performance of the method was evaluated on a self-generated dataset containing images under challenging conditions. The evaluation results show that our method outperforms traditional methods: its Rank-1 accuracy reaches 91.30%, while traditional methods achieve only 86.96%. The method can be used in many practical applications in security surveillance of buildings and offices, where challenging conditions often appear.
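As an illustration of the body-plus-face late fusion idea, a minimal PyTorch sketch follows; the fusion weight w_face and the embedding dimensions are hypothetical, and the feature extractors themselves are outside the sketch.

```python
import torch
import torch.nn.functional as F

def fused_similarity(body_q, face_q, body_g, face_g, w_face=0.5):
    # L2-normalize each embedding so dot products are cosine similarities.
    body_q, body_g = F.normalize(body_q, dim=-1), F.normalize(body_g, dim=-1)
    face_q, face_g = F.normalize(face_q, dim=-1), F.normalize(face_g, dim=-1)
    sim_body = body_q @ body_g.t()             # (queries, gallery)
    sim_face = face_q @ face_g.t()
    # Weighted combination of the two cues (w_face is an assumed value).
    return (1 - w_face) * sim_body + w_face * sim_face

# e.g. 4 queries vs 10 gallery images, 512-d body and 128-d face embeddings
scores = fused_similarity(torch.randn(4, 512), torch.randn(4, 128),
                          torch.randn(10, 512), torch.randn(10, 128))
```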
Person re-identification (Re-ID) is a research hotspot in the field of intelligent video analysis, and it is also a challenging task. As the number of samples grows, traditional metric and feature learning methods hit a bottleneck, whereas deep learning algorithms, which thrive on large amounts of data, perform very well in person re-identification. Although they have achieved good results under supervised learning, their application in real-world scenarios is less satisfactory. This is mainly because, in the real world, large numbers of labeled images are hard to obtain, and even when they are obtainable, annotation is expensive. Meanwhile, the performance of deep learning under unsupervised settings is not ideal. To solve this problem, we propose a new method based on unsupervised domain adaptation (UDA) and re-ranking, named UDA+. We first train a camera-aware style transfer model to generate camera-style (CamStyle) images. Then we further reduce the difference between the target and source domains by exploiting invariant features, further improving their commonality. In addition, re-ranking is introduced to optimize the matching results. This method not only reduces the cost of obtaining labeled data but also improves accuracy. Experimental results show that our method outperforms the most advanced method by 4% on Rank-1 and 14% on mAP. The results also confirm the effectiveness of the re-ranking module and suggest a new direction for unsupervised domain adaptation in the future.
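A toy sketch of the re-ranking step follows: the original distance is blended with a Jaccard distance computed over k-nearest-neighbor sets, a simplified stand-in for full k-reciprocal re-ranking; k and the blending weight lam are assumed values.

```python
import numpy as np

def rerank(dist, k=5, lam=0.3):
    """dist: (n, n) symmetric all-vs-all distance matrix (toy setting)."""
    n = dist.shape[0]
    knn = [set(np.argsort(dist[i])[:k]) for i in range(n)]
    jaccard = np.zeros_like(dist)
    for i in range(n):
        for j in range(n):
            inter = len(knn[i] & knn[j])
            union = len(knn[i] | knn[j])
            # Samples sharing neighbors are likely the same identity.
            jaccard[i, j] = 1.0 - inter / union
    return lam * dist + (1 - lam) * jaccard

d = np.random.rand(6, 6); d = (d + d.T) / 2; np.fill_diagonal(d, 0)
print(rerank(d))
```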
Person re-identification is one of the indispensable elements of visual surveillance. It assigns consistent labels to the same person within the field of view of a single camera or even across multiple cameras. While handcrafted feature extraction is certainly one way of approaching this problem, in many cases these features are becoming more and more complex. Moreover, training a deep convolutional neural network (CNN) from scratch is difficult because it requires a large amount of labeled training data and a great deal of expertise to ensure proper convergence. This paper explores three main strategies for solving the person re-identification problem: (i) using handcrafted features, (ii) using transfer learning based on a deep CNN pre-trained for object categorization, and (iii) training a deep CNN from scratch. Our experiments consistently demonstrated that: (1) handcrafted features may still have favorable characteristics and benefits, especially when the learning database is not sufficient to train a deep network; (2) a fully trained Siamese CNN outperforms handcrafted approaches and the combination of a pre-trained CNN with different re-identification processes; and (3) pre-trained features and handcrafted features perform equally well. These experiments also revealed the most discriminative parts of the human body.
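Strategy (ii) can be sketched in a few lines, assuming a recent torchvision: a ResNet-50 pre-trained for object categorization is reused as a fixed feature extractor, and its descriptors are compared downstream.

```python
import torch
import torchvision.models as models

# Load a ResNet-50 pre-trained on ImageNet and drop its classification head.
backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
backbone.fc = torch.nn.Identity()
backbone.eval()

with torch.no_grad():
    images = torch.randn(4, 3, 224, 224)   # a batch of person crops
    features = backbone(images)            # (4, 2048) descriptors

# The descriptors can then be compared with Euclidean/cosine distance
# or fed into a separate metric-learning stage.
```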
Person re-identification (person re-ID) is a challenging task that aims at spotting the same person across disjoint camera views. It has generated a lot of attention in the field of computer vision, but it remains challenging due to the complexity of person appearances from different camera views. Many excellent methods have been proposed to solve this problem, especially metric learning-based algorithms. However, most of them suffer from data imbalance. To solve this problem, we propose a new data-balanced method named Enhanced Metric Learning (EML), based on adaptive asymmetric and diversity regularization, for person re-ID. Metric learning is important for person re-ID because it can eliminate, to a certain extent, the negative effects caused by camera differences. But most metric learning approaches neglect the data imbalance caused by having many negative samples but few positive ones, and they often treat all negative samples the same as positive ones, which can lead to the loss of important information. Our approach pays different attention to positive and negative samples. First, we classify negative samples into three groups adaptively, and then pay different attention to them using an adaptive asymmetric strategy. By treating samples differently, the proposed method can better exploit the discriminative information between positive and negative samples. Furthermore, we impose a diversity regularizer to avoid over-fitting when the training set is small or medium-sized. Finally, we designed a series of experiments on four challenging databases (VIPeR, PRID450S, CUHK01 and GRID) to compare with several excellent metric learning methods. Experimental results show that the rank-1 matching rate of the proposed method outperforms the state of the art by 3.64%, 4.2%, 3.13% and 2.83% on the four databases, respectively.
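A hedged PyTorch sketch of the core loss design follows: negative pairs are split into three groups by how strongly they violate the margin and weighted asymmetrically, and a diversity regularizer pushes the rows of the metric's projection apart. The thresholds, weights, and margin are illustrative assumptions, not the paper's values.

```python
import torch

def eml_style_loss(dist_pos, dist_neg, W, margin=1.0, beta=0.01):
    pos_loss = dist_pos.pow(2).mean()
    viol = (margin - dist_neg).clamp(min=0)          # hinge on negative pairs
    # Three adaptive groups (assumed thresholds): ignored, normal, hard;
    # harder negatives receive more attention.
    weights = (viol > 0).float() + (viol > 0.5).float()
    neg_loss = (weights * viol.pow(2)).sum() / weights.sum().clamp(min=1)
    # Diversity regularizer: keep rows of the projection W near-orthogonal.
    gram = W @ W.t()
    div = (gram - torch.eye(W.size(0))).pow(2).sum()
    return pos_loss + neg_loss + beta * div

loss = eml_style_loss(torch.rand(32), torch.rand(64), torch.randn(16, 128))
```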
With recent technological advances, surveillance cameras have become accessible to the general public, and a huge amount of unstructured data is being gathered. However, extracting value from this data is challenging, especially for tasks that involve human images, such as face recognition and person re-identification. Annotating this kind of data is a challenging and expensive task. In this work, we propose a domain adaptation workflow that allows CNNs trained in one domain to be applied to another domain without the need for annotated target data. Our method uses AlignedReID++ as the baseline, trained using a triplet loss with batch hard mining. Domain adaptation is done in an unsupervised manner by clustering unlabeled data to generate pseudo-labels in the target domain. Our results show that domain adaptation substantially improves the performance of the CNN when applied in the target domain.
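The batch-hard triplet loss the abstract mentions can be sketched compactly in PyTorch: for each anchor, take its hardest positive and hardest negative within the batch.

```python
import torch

def batch_hard_triplet(features, labels, margin=0.3):
    dist = torch.cdist(features, features)             # (n, n) pairwise L2
    same = labels.unsqueeze(0) == labels.unsqueeze(1)  # same-identity mask
    pos = dist.clone(); pos[~same] = 0.0               # keep positives only
    hardest_pos = pos.max(dim=1).values                # farthest positive
    neg = dist.clone(); neg[same] = float('inf')       # keep negatives only
    hardest_neg = neg.min(dim=1).values                # closest negative
    return (hardest_pos - hardest_neg + margin).clamp(min=0).mean()

feats = torch.randn(16, 128)
ids = torch.randint(0, 4, (16,))    # e.g. pseudo-labels from clustering
print(batch_hard_triplet(feats, ids))
```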
Visible–infrared person re-identification (VI-ReID) is a current focus of the re-identification field. To reduce the gap between the two modalities in VI-ReID and improve recognition accuracy, this paper proposes a four-stream network with nonsignificant feature learning (FS-NSF) for VI-ReID. First, dual intermediate-modality images are generated from the visible and infrared modalities by two lightweight networks, with labels inherited from the visible and infrared images. Second, the ResNet50 backbone is split and reconstructed to support shared feature learning across the four modalities. Finally, a multi-branch, multi-scale and multi-granularity feature extraction strategy is used to extract both significant and nonsignificant features. Comparison experiments are conducted on the SYSU-MM01 and RegDB datasets. The results show that, compared with state-of-the-art methods, our method performs excellently on both datasets, especially SYSU-MM01, with improvements of 1.9%–6.28% on each metric.
The number of existing person re-identification datasets is limited, and each dataset exhibits variations in illumination, background occlusion and pose, which makes it difficult for existing methods to learn robust feature representations and leads to a decline in recognition performance. To solve these problems, a person re-identification method combining style and pose generation is proposed in this paper. First, to mitigate the camera style differences among images collected from different cameras, a style transformation method based on generative adversarial networks is introduced into the person re-identification model, and cycle-consistent generative adversarial networks (CycleGAN) are used to realize style transfer and reduce the influence of camera differences. Second, because large pose changes make it easy to lose identity-sensitive information, AlphaPose is introduced for pose estimation. Combining style and pose for the first time, an improved deep convolutional generative adversarial network (DCGAN) enriches the input samples and generates images of unified style and pose; training the person re-identification network on this new synthetic data improves the model's recognition performance. Finally, random erasing is introduced during data augmentation to reduce over-fitting, improve the generalization ability of the network, and handle partial occlusion. The experimental results show that the proposed method outperforms typical style-based or pose-based methods. The rank-1 accuracy and mAP on the Market-1501 dataset are 90.4% and 74.5%, improvements of 2.28% and 5.78%, respectively. To a certain extent, the performance of person re-identification is improved.
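Random erasing itself is easy to sketch; the PyTorch version below uses common default probability and area ranges, which need not match the paper's settings.

```python
import random
import torch

def random_erase(img, p=0.5, area=(0.02, 0.2)):
    """img: (C, H, W) tensor; erase a random rectangle with noise."""
    if random.random() > p:
        return img
    _, h, w = img.shape
    erase_area = random.uniform(*area) * h * w
    eh = max(1, min(int(erase_area ** 0.5), h))   # rectangle height
    ew = max(1, min(int(erase_area / eh), w))     # rectangle width
    top = random.randint(0, h - eh)
    left = random.randint(0, w - ew)
    img = img.clone()
    # Fill the region with random noise to simulate occlusion.
    img[:, top:top + eh, left:left + ew] = torch.rand(img.size(0), eh, ew)
    return img

augmented = random_erase(torch.rand(3, 256, 128))
```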
To fully exploit identity-sensitive feature information and the correlations among attributes and between attributes and body parts, this paper proposes a person re-identification (re-ID) method based on a graph convolutional network (GCN) built from crucial attribute features and body parts. First, it establishes a multiscale context-aware network (MSCAN) using dilated convolutions with different expansion ratios, which can learn multiscale context information and obtain diversified global features. Subsequently, a human parsing model is utilized to extract body part features. According to the attribute importance degree, the paper constructs a low-dimensional GCN integrating the vital attributes and body parts of person descriptions to obtain discriminative local features. Finally, based on attribute prediction, it narrows the set of images to be matched by discriminating possible objects from query images, thereby simplifying the retrieval process. The experimental results demonstrate that the designed method effectively improves person re-ID performance and achieves competitive evaluation results on typical public testing datasets.
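The multiscale context idea behind MSCAN can be sketched with parallel dilated convolutions whose outputs are concatenated; the channel counts and dilation rates below are illustrative assumptions, not the paper's exact values.

```python
import torch
import torch.nn as nn

class MultiScaleContext(nn.Module):
    def __init__(self, in_ch, out_ch, rates=(1, 2, 4)):
        super().__init__()
        # One 3x3 branch per dilation rate (expansion ratio).
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=r, dilation=r)
            for r in rates
        ])

    def forward(self, x):
        # padding == dilation keeps the spatial size identical per branch,
        # so the outputs can be concatenated along the channel axis.
        return torch.cat([torch.relu(b(x)) for b in self.branches], dim=1)

feat = MultiScaleContext(256, 128)(torch.randn(2, 256, 24, 8))  # (2, 384, 24, 8)
```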
This paper proposes a few-shot pedestrian re-identification (Re-ID) model based on an improved ResNet50 with a compression and stimulation module, named CS-ResNet50. It combines the meta-learning framework with metric learning. The method first compresses residual network channels and then stimulates them to achieve feature weighting, ultimately making feature extraction more accurate. The model learns how to complete new tasks efficiently from the experience it gains while training on earlier subtasks. In each subtask, the dataset is divided into a gallery set and a query set, on which the model parameters are trained. In this way, the model can be trained efficiently and adapted to new tasks rapidly, which helps solve few-shot Re-ID problems. Compared with the baseline, the proposed model efficiently improves two evaluation metrics on two Re-ID datasets and achieves a better Re-ID effect in the few-shot setting.
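The compression-and-stimulation module reads as a squeeze-and-excitation-style block: channels are compressed to a global summary and then used to re-weight ("stimulate") the feature maps. A minimal PyTorch sketch follows, with an assumed reduction ratio.

```python
import torch
import torch.nn as nn

class CSBlock(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):                       # x: (n, c, h, w)
        squeezed = x.mean(dim=(2, 3))           # compress: global average pool
        weights = self.fc(squeezed)             # stimulate: per-channel gates
        return x * weights.unsqueeze(-1).unsqueeze(-1)

out = CSBlock(64)(torch.randn(2, 64, 32, 16))   # channel-reweighted maps
```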
Person re-identification methods currently encounter challenges in feature learning, primarily due to difficulties in expressing the correlation between local features and integrating global and local features effectively. To address these issues, a pose-guided person re-identification method with Two-Level Channel–Spatial Feature Integration (TLCSFI) is proposed. In TLCSFI, a two-level integration mechanism is implemented. At the first level, TLCSFI integrates the spatial information from local features to generate fine-grained spatial features. At the second level, the fine-grained spatial feature and the coarse-grained channel feature are integrated together to complete channel–spatial feature integration. In the method, a Pose-based Spatial Feature Integration (PSFI) module is introduced to generate the pose union feature, which calculates intra-body affinity to guide the integration of spatial information among local pose feature maps. Then, a Channel and Spatial Union Feature Integration (CSUFI) module is proposed to efficiently integrate the channel information of the global feature and the spatial information of the pose union feature. Two individual networks are designed to extract channel and spatial information, respectively, in CSUFI, which are then weighted and integrated. Experiments are conducted on three publicly available datasets to evaluate TLCSFI, and the experimental results demonstrate its competitive performance.
To address the limitations of visible light cameras that cannot function effectively at night, infrared cameras have become the optimal supplement. However, current methods for visible–infrared cross-modality person re-identification focus solely on feature combination and fusion, neglecting the importance of feature alignment. To address this issue, we introduce a novel Hierarchical Feature Fusion (HFF) network, which comprehensively integrates features across various levels through sequential feature extraction. Specifically, we design a pixel-level contrastive loss function that makes pixels in the same region of cross-modality images more similar and distinguishes pixel features at different locations, thereby extracting similar low-frequency information in the shallow network. Furthermore, in the deep network, we extract high-frequency information of different modalities through the Bi-Transformer Layer and propose Node-level Coupling Attention and Modality-level Decoupling Attention. Coupling attention is used for high-frequency information coupling within the same modality while decoupling attention is used for high-frequency information decoupling between different modalities to obtain more texture and detail information. Through a series of experimental results, we validate the superiority of the proposed HFF network in cross-modality person re-identification. Our proposed method achieved 87.16% and 95.23% Rank-1 on the SYSU-MM01 and RegDB datasets, respectively, and extensive experiments have validated its effectiveness in feature alignment.
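A hedged sketch of a pixel-level contrastive loss in the spirit described: features of pixels at the same location across aligned visible/infrared images are pulled together, and features from different locations are pushed apart. This InfoNCE-style form and its temperature are assumptions, not the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def pixel_contrastive(feat_vis, feat_ir, tau=0.1):
    """feat_vis, feat_ir: (n_pixels, dim) features from aligned image pairs."""
    v = F.normalize(feat_vis, dim=-1)
    r = F.normalize(feat_ir, dim=-1)
    logits = v @ r.t() / tau                 # (n, n) cross-modality similarities
    targets = torch.arange(v.size(0))        # positives lie on the diagonal
    return F.cross_entropy(logits, targets)  # same-location pairs pulled close

loss = pixel_contrastive(torch.randn(64, 128), torch.randn(64, 128))
```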
Person Re-ID is challenged by background clutter, body misalignment and missing parts. In this paper, we propose a reliable part-based, multiple-level attention deep network to learn multi-scale salience representations. In particular, person alignment and key point detection are carried out sequentially to locate three relatively stable body components; a fused attention (FA) module is then designed to capture fine-grained salient features from the effective spatial regions of the valuable channels of each part, and a regional attention module follows to weight the importance of different parts, highlighting representative parts while suppressing valueless ones. A late fusion-based multi-task loss is finally adopted to further optimize the feature representation. Experimental results demonstrate that the proposed method achieves state-of-the-art performance on three challenging benchmarks: Market-1501, DukeMTMC-reID and CUHK03.
Taking videos as nodes in a graph, graph matching is an effective technique for unsupervised video-based person re-identification (re-ID). However, most existing methods are sensitive to noisy training data and focus mainly on the visual content relations between query and gallery videos, which may introduce a large number of false positives. To enhance robustness to training data and alleviate visual ambiguity, a Dynamic Hybrid Graph Matching (DHGM) method is proposed, which jointly considers both content and context information for person re-ID in an iterative manner. The content relations between video nodes are obtained by metric learning, based on which the context relation is acquired by encoding the bidirectional feature of each probe node relative to its graph neighbors. The model is iteratively updated during graph construction to refine the distance measurement and further improve matching performance. Experimental results on the PRID 2011 and iLIDS-VID datasets demonstrate the superiority of DHGM.
In recent years, jointly utilizing local and global features to improve model performance has become an important approach in person re-identification. If the relationships between body parts are not considered, it is easy to confuse the identities of different persons who have similar attributes in corresponding parts. To solve this problem, we propose a feature fusion-based method for person re-identification, which contains three core parts: an adjacency module, a counterfactual attention module and a global difference pooling module. First, the adjacency module is designed to consider the relationships between adjacent body parts and make the features more discriminative. Next, the counterfactual attention module conducts counterfactual intervention analysis and encourages the network to learn more useful attention, yielding more fine-grained features. Then, the global difference pooling module learns the global features of a person's image and pays more attention to the important features of the human body. Through the fusion of local and global features, our model can effectively distinguish the identities of different people with similar attributes in the corresponding parts. Finally, we conduct a large number of experiments and achieve outstanding results on Market-1501, CUHK03 and MSMT17.
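The counterfactual attention idea can be illustrated by comparing the prediction under learned attention with the prediction under random (counterfactual) attention, so that only the attention's real contribution is credited; the PyTorch sketch below is a generic rendering of that intervention, not the authors' exact module.

```python
import torch
import torch.nn as nn

class CounterfactualAttention(nn.Module):
    def __init__(self, channels, num_classes):
        super().__init__()
        self.attn = nn.Conv2d(channels, 1, kernel_size=1)
        self.classifier = nn.Linear(channels, num_classes)

    def forward(self, x):                        # x: (n, c, h, w)
        a = torch.sigmoid(self.attn(x))          # learned attention map
        y_fact = self.classifier((x * a).mean(dim=(2, 3)))
        a_rand = torch.rand_like(a)              # counterfactual attention
        y_counter = self.classifier((x * a_rand).mean(dim=(2, 3)))
        return y_fact - y_counter                # net effect of attention

logits = CounterfactualAttention(256, 751)(torch.randn(2, 256, 24, 8))
```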
Fusion techniques that aim to leverage the discriminative power of different appearance features for person representation have been widely applied in person re-identification. They are performed by concatenating all feature vectors (early fusion) or by combining the matching scores of different classifiers (late fusion). Previous studies have shown that late fusion techniques achieve better results than early fusion ones. However, the majority of studies focus on determining suitable weighting schemes that reflect the role of each feature; the determined weights are then integrated into conventional similarity functions, such as cosine similarity [L. Zheng, S. Wang, L. Tian, F. He, Z. Liu and Q. Tian, Query-adaptive late fusion for image search and person re-identification, in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2015, pp. 1741–1750]. The contribution of this paper is two-fold. First, a robust person re-identification method combining metric learning with late fusion techniques is proposed. The metric learning method Cross-view Quadratic Discriminant Analysis (XQDA) is employed to learn a discriminant low-dimensional subspace that minimizes the intra-person distance while maximizing the inter-person distance. Product rule-based and sum rule-based late fusion techniques are then applied to these distances. Second, concerning feature engineering, the ResNet extraction process has been modified to extract local features from different stripes in person images. To show the effectiveness of the proposed method, both single-shot and multi-shot scenarios are considered. Three state-of-the-art features, namely Gaussians of Gaussians (GOG), Local Maximal Occurrence (LOMO) and deep-learned features extracted through a residual network (ResNet), are extracted from person images. The experimental results on three benchmark datasets, iLIDS-VID, PRID-2011 and VIPeR, show that the proposed method obtains +11.86%, +3.48% and +2.22% improvement over the best results obtained with a single feature. The proposed method, which achieves rank-1 accuracy of 85.73%, 93.82% and 50.85% on iLIDS-VID, PRID-2011 and VIPeR, respectively, outperforms various SOTA methods, including deep learning ones. Source code is publicly available to facilitate the development of person re-ID systems.
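Sum rule-based and product rule-based late fusion over per-feature distance matrices (e.g. from GOG, LOMO and ResNet after XQDA) can be sketched in a few lines of NumPy; the min-max normalization step is an assumption added to make distances from different features comparable.

```python
import numpy as np

def late_fusion(dist_mats, rule="sum", weights=None):
    # Normalize each (queries, gallery) distance matrix to [0, 1].
    norm = [(d - d.min()) / (d.max() - d.min() + 1e-12) for d in dist_mats]
    if weights is None:
        weights = [1.0 / len(norm)] * len(norm)
    if rule == "sum":
        # Sum rule: weighted average of normalized distances.
        return sum(w * d for w, d in zip(weights, norm))
    # Product rule (toy stand-in): weighted geometric combination.
    fused = np.ones_like(norm[0])
    for w, d in zip(weights, norm):
        fused *= d ** w
    return fused

fused = late_fusion([np.random.rand(5, 10), np.random.rand(5, 10)])
```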