The aim of this paper is to propose an artificial intelligence based approach to moving object detection and tracking. Specifically, we adopt an approach to moving object detection based on self-organization through artificial neural networks. This approach handles scenes containing moving backgrounds and gradual illumination variations, and achieves robust detection for different types of videos taken with stationary cameras. Moreover, for object tracking we propose a suitable combination of Kalman filtering, properly instantiated for the problem at hand, and a matching model belonging to the class of Multiple Hypothesis Testing. To assess the validity of our approach, we evaluated both the proposed moving object detection and the object tracking on different color video sequences representing situations typically critical for video surveillance systems.
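The Kalman filtering component mentioned above can be sketched with a generic constant-velocity instantiation for 2D image tracking. This is a minimal illustration of the predict/update cycle, not the authors' exact design; the state layout and noise parameters are assumptions.

```python
import numpy as np

# Constant-velocity Kalman filter for 2D object tracking: a generic sketch,
# not the paper's exact instantiation. State = [x, y, vx, vy].
class KalmanTracker2D:
    def __init__(self, x0, y0, dt=1.0, q=1e-2, r=1.0):
        self.x = np.array([x0, y0, 0.0, 0.0])         # state estimate
        self.P = np.eye(4)                            # state covariance
        self.F = np.array([[1, 0, dt, 0],
                           [0, 1, 0, dt],
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], float)      # constant-velocity model
        self.H = np.array([[1, 0, 0, 0],
                           [0, 1, 0, 0]], float)      # we observe position only
        self.Q = q * np.eye(4)                        # process noise (assumed)
        self.R = r * np.eye(2)                        # measurement noise (assumed)

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]

    def update(self, z):
        y = np.asarray(z, float) - self.H @ self.x    # innovation
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)      # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x[:2]
```

In a full tracker, the predicted position would feed the data-association stage (here, the Multiple Hypothesis Testing matcher) before the update step.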
A novel object tracking algorithm is presented in this paper that uses a joint color-texture histogram to represent the target and then applies it within the mean shift framework. In addition to the conventional color histogram features, texture features of the object are extracted using the local binary pattern (LBP) technique. The major uniform LBP patterns are exploited to form a mask for joint color-texture feature selection. Compared with traditional color-histogram-based algorithms that use the whole target region for tracking, the proposed algorithm effectively extracts the edge and corner features in the target region, which characterize the target better and represent it more robustly. The experimental results validate that the proposed method greatly improves tracking accuracy and efficiency, with fewer mean shift iterations than standard mean shift tracking. It can robustly track the target in complex scenes, such as those with similar target and background appearance, where traditional color-based schemes may fail.
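The texture operator underlying this approach can be sketched as follows: a basic 8-neighbour LBP code per pixel. This is a minimal illustration of LBP itself, not the paper's uniform-pattern masking or the joint color-texture histogram.

```python
import numpy as np

# Basic 8-neighbour local binary pattern (LBP) codes for a grayscale image:
# each pixel is compared with its 8 neighbours, and each comparison
# contributes one bit of an 8-bit code.
def lbp_image(gray):
    g = np.asarray(gray, dtype=float)
    h, w = g.shape
    center = g[1:-1, 1:-1]
    codes = np.zeros((h - 2, w - 2), dtype=np.int32)
    # Neighbour offsets, clockwise from the top-left.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dy, dx) in enumerate(offsets):
        neigh = g[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes += (neigh >= center).astype(np.int32) << bit
    return codes
```

A joint color-texture histogram would then bin each pixel by its color value together with its LBP code, rather than by color alone.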
This paper proposes an efficient method for detecting ghost and left objects in surveillance video, which, if not identified, may lead to errors or wasted computational power in background modeling and object tracking in video surveillance systems. The method contains two main steps: the first is to detect stationary objects, which narrows the evaluation targets down to a very small number of regions in the input image; the second is to discriminate the candidates between ghost and left objects. For the first step, we introduce a novel stationary object detection method based on continuous object tracking and shape matching. For the second step, we propose a fast and robust inpainting method to differentiate between ghost and left objects by reconstructing the real background using the candidate's corresponding regions in the current input and background images. The effectiveness of our method has been validated by experiments over a variety of video sequences and comparisons with existing state-of-the-art methods.
In this paper we present an improved color-based planar fiducial marker system. Our framework provides precise and robust full 3D pose estimation of markers with superior accuracy when compared with many fiducial systems in the literature, while color information encoding enables the use of over 65,000 distinct markers. Unlike most color-based fiducial frameworks, which require prior classification training and color calibration, ours performs reliably under illumination changes, requiring only a rough white balance adjustment. Our methodology provides good detection performance even under poor illumination conditions, which typically compromise other marker identification techniques, thus avoiding the evaluation of otherwise falsely identified markers. Several experiments are presented and carefully analyzed in order to validate our system and demonstrate the significant improvement in estimation accuracy of both position and orientation over traditional techniques.
In multi-camera video tracking, the tracking scene and the appearance of the tracked target can become complex, and current tracking methods use entirely different databases and evaluation criteria. Herein, for the first time to our knowledge, we present a universally applicable template library updating approach for multi-camera human tracking, called multi-state self-learning template library updating (RS-TLU), which can be applied in different multi-camera tracking algorithms. In RS-TLU, self-learning divides tracking results into three states, namely the steady state, the gradually changing state, and the suddenly changing state, using the similarity of objects to historical and instantaneous templates, because each state requires a different decision strategy. Subsequently, the tracking results for each state are judged and learned using motion and occlusion information. Finally, the correct template is chosen from the robust template library. We investigate the effectiveness of the proposed method using three databases and 42 test videos, and calculate the number of false positives, false matches, and missed tracking targets. Experimental results demonstrate that, in comparison with state-of-the-art algorithms on 15 complex scenes, our RS-TLU approach effectively increases the number of correct target templates and reduces the number of similar and erroneous templates in the template library.
Due to declining trading volume growth on e-commerce platforms, physical channels have attracted considerable investment from various large international companies (e.g., Alibaba, JD, Walmart, Wanda, and Wuzhou International). However, e-commerce platforms can track consumer behaviors (attraction to landing page design, clicks on certain products, behavior trajectory tracking, clicks on advertisements, and internal link optimization of product pages), a feat unachievable in current physical channels. Consequently, this study attempted to bring the characteristics of online channels to a physical channel by using image object tracking and image detection techniques. Through this inclusion, physical channels can provide consumers with a more favorable experience and interaction, and brick-and-mortar store owners can obtain a more accurate understanding of the behaviors of their store's consumers. Information acquired through this system can serve as a reference for store owners in merchandise placement, arrangement of display shelves, and planning of consumer circulation paths. This study used image processing techniques to locate the region of interest and applied object tracking to obtain consumers' trajectories, successfully implementing the consumer-tracking characteristics of online platforms in a physical channel while retaining the unique experience of the physical channel. This results in a win–win scenario for businesses and consumers.
Aiming to tackle the tracking drift easily caused by complex factors during the tracking process, this paper proposes an improved object tracking method within the kernel correlation filter framework. To obtain discriminative information that is insensitive to changes in object appearance, it combines dimensionality-reduced Histogram of Oriented Gradients features with Lab color features, robustly exploiting their complementary characteristics. Based on multi-resolution pyramid theory, a multi-scale model of the object is constructed, and the optimal tracking scale is found from the response peaks of the confidence maps at different sizes. Because inappropriate model updating can easily lead to tracking failure, occlusion is detected by checking whether the occlusion rate of the response peak corresponding to the best object state falls below a set threshold. At the same time, a Kalman filter records the object's motion information before occlusion and predicts the state of the occluded object, enabling robust tracking under occlusion. Experimental results show the effectiveness of the proposed method in handling various internal and external interferences in challenging environments.
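The occlusion test described above can be sketched with a peak-to-sidelobe-style score on the correlation response map: if the peak is not much stronger than the surrounding response, the model update is suspended and motion prediction takes over. The score and threshold below are illustrative assumptions, not the paper's exact criterion.

```python
import numpy as np

# Hedged sketch of occlusion detection from a correlation filter response:
# compute a peak-to-sidelobe ratio (PSR) and flag occlusion when the peak
# is weak relative to the rest of the map.
def is_occluded(response, threshold=5.0):
    r = np.asarray(response, float)
    peak = r.max()
    sidelobe = np.delete(r.ravel(), r.argmax())      # response excluding the peak
    psr = (peak - sidelobe.mean()) / (sidelobe.std() + 1e-8)
    return psr < threshold
```

A sharp, isolated peak yields a high PSR (confident localization); a flat or multi-modal response yields a low PSR, signalling occlusion or drift.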
Aiming to solve tracking drift during movement, caused by the lack of discriminability of the feature information and the failure of a fixed template to adapt to changes in object appearance, this paper proposes an object tracking algorithm combining an attention mechanism and correlation filter theory within the framework of fully convolutional Siamese neural networks. First, the appearance information is processed using attention mechanisms, where the object and search-area features are optimized by spatial attention and channel attention modules. At the same time, a cross-attention module is introduced to process the template branch and the search-area branch, respectively, making full use of the diverse context information of the search area. Then, a background-aware correlation filter model with scale adaptation and learning rate adjustment is adopted in the model construction, used as a layer in the network to realize object template updates. Finally, the optimal object location is determined from the confidence map via similarity calculation. Experimental results show that the designed method effectively improves object tracking performance in various challenging environments; the success rate increases by 16.2%, and the accuracy rate increases by 16%.
In this paper, we propose a multi-modal sparse tracker that jointly exploits temporal and modal consistency to locate the target using the similarity of multiple local appearances. First, we propose an alignable patching strategy for the red-green-blue (RGB) color mode and the thermal infrared mode to adapt to local changes of the target. Second, we propose a consistency expression over the corresponding aligned patches between the modes, together with the correlation of the Gaussian mapping within each mode, to reconstruct the target judgment likelihood function. Finally, we propose an updating scheme based on temporal correlation and mode sparsity to adapt to target changes. According to the experimental results, a significant improvement in tracking accuracy is achieved on average compared with state-of-the-art algorithms. The source code of our algorithm is available at https://github.com/Liincq/tracker.
Counting small pixel-sized vehicles and crowds in unmanned aerial vehicles (UAV) images is crucial across diverse fields, including geographic information collection, traffic monitoring, item delivery, communication network relay stations, as well as target segmentation, detection, and tracking. This task poses significant challenges due to factors such as varying view angles, non-fixed drone cameras, small object sizes, changing illumination, object occlusion, and image jitter. In this paper, we introduce a novel multi-data-augmentation and multi-deep-learning framework designed for counting small vehicles and crowds in UAV images. The framework harnesses the strengths of specific deep-learning detection models, coupled with the convolutional block attention module and data augmentation techniques. Additionally, we present a new method for detecting cars, motorcycles, and persons with small pixel sizes. Our proposed method undergoes evaluation on the test dataset v2 of the 2022 AI Cup competition, where we secured the first place on the private leaderboard by achieving the highest harmonic mean. Subsequent experimental results demonstrate that our framework outperforms the existing YOLOv7-E6E model. We also conducted comparative experiments using the publicly available VisDrone datasets, and the results show that our model outperforms the other models with the highest AP50 score of 52%.
The paper describes a method for detecting 2D straight segments and their correspondences in successive frames of an image sequence by means of a Hough-based matching approach. The main advantage of this method is the possibility of extracting and matching 2D straight segments directly in the feature space, without the need for complex matching operations and time-consuming inverse transformations. An additional advantage is that only four attributes of 2D straight segments are required to perform an efficient matching process: position, orientation, length, and midpoint. Tests were performed on both synthetic and real images containing complex man-made objects moving in a scene. A comparison with a well-known 2D line matching algorithm is also made.
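The attribute-space matching described above can be sketched as a greedy nearest-neighbour assignment over the listed segment attributes. The distance weights and the greedy strategy are illustrative assumptions, not the paper's Hough-based procedure.

```python
import math

# Simplified sketch of matching 2D line segments across frames directly in
# an attribute space (orientation, length, midpoint); weights are illustrative.
def match_segments(prev, curr, w_theta=1.0, w_len=0.1, w_mid=0.05):
    """Each segment is (theta, length, mx, my). Returns (prev_idx, curr_idx)
    pairs by greedy nearest-attribute matching."""
    matches = []
    used = set()
    for i, (t1, l1, x1, y1) in enumerate(prev):
        best, best_d = None, float("inf")
        for j, (t2, l2, x2, y2) in enumerate(curr):
            if j in used:
                continue
            d = (w_theta * abs(t1 - t2)
                 + w_len * abs(l1 - l2)
                 + w_mid * math.hypot(x1 - x2, y1 - y2))
            if d < best_d:
                best, best_d = j, d
        if best is not None:
            used.add(best)
            matches.append((i, best))
    return matches
```

The appeal of matching in the feature space is visible here: no inverse Hough transform or pixel-level comparison is needed, only cheap distances over four scalars per segment.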
Active or deformable models have emerged as a popular modeling paradigm in computer vision. These models have the flexibility to adapt themselves to the image data, offering the potential for both generic object recognition and non-rigid object tracking. Because these active models are underconstrained, however, deformable shape recovery often requires manual segmentation or good model initialization, while active contour trackers have been able to track only an object's translation in the image. In this paper, we report our current progress in using a part-based aspect graph representation of an object to provide the missing constraints on data-driven deformable model recovery and tracking processes.
A line segment based approach for 3D motion estimation and tracking of multiple objects from a monocular image sequence is presented. Objects are described by means of 3D line segments, and their presence in the scene is associated with the detection of 2D line segments on the image plane. A change detection algorithm is applied to detect moving objects on the image plane, and a Hough-based algorithm is used to identify 2D line segments. The 3D parameters of each line segment are estimated, at each time instant, by means of an extended Kalman filter (EKF), whose observations are the displacements of the 2D line segment endpoints on the image plane. Results on both synthetic and real scenes are presented.
Real-time object tracking is one application of video processing in which the required computational cost is high due to intensive processing of large volumes of data. To address this problem, this paper presents an embedded solution in which a Hardware/Software (HW/SW) co-design architecture is used to implement the well-known kernel-based tracking system. In this algorithm, the target is searched for in consecutive frames by maximizing the statistical match, estimated as the similarity of color distributions. The whole tracking system is implemented on a low-cost Field Programmable Gate Array (FPGA) device with an image resolution of 1280×720 pixels and a target window size of 160×80 pixels. The HW/SW co-design architecture is proposed to accelerate the computational speed of the system. The performance of the system is evaluated in terms of execution speed and frame rate compared with a software-based implementation, and the hardware cost of the design is compared with other existing methods. The proposed design achieves a 22× computational speedup and a maximum of 60 Frames Per Second (FPS) compared with the software-based design.
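The statistical match at the core of kernel-based tracking is a similarity between color distributions, commonly the Bhattacharyya coefficient. A minimal sketch, independent of the paper's FPGA implementation:

```python
import numpy as np

# Bhattacharyya coefficient between two color histograms: 1.0 for identical
# distributions, 0.0 for disjoint ones. Kernel-based tracking maximizes this
# score over candidate target windows in the next frame.
def bhattacharyya(p, q):
    p = np.asarray(p, float)
    q = np.asarray(q, float)
    p = p / p.sum()                    # normalize both to distributions
    q = q / q.sum()
    return float(np.sum(np.sqrt(p * q)))
```

Because the score reduces to per-bin multiply, square-root, and accumulate operations, it maps naturally onto FPGA pipelines, which is what makes a HW/SW split attractive for this algorithm.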
Object tracking is a hot topic in computer vision, and the significant development of satellite video technology makes object tracking in satellite videos possible. In recent years, Convolutional Neural Network (CNN)-based trackers have achieved satisfying performance in the visual object tracking field. However, CNNs cannot be directly applied to object tracking in satellite videos for two reasons. First, the feature map size generally decreases as the network deepens, which is unsuitable for the small targets in satellite videos. Second, CNN-based trackers commonly need extensive data to train the network parameters, while few labeled satellite videos are available. Therefore, in this paper, we design a lightweight network for the satellite video tracking task. On one hand, the network generates a response map with the same size as the input image, preserving the spatial resolution of targets. On the other hand, the network parameters are transferred from an existing network and trained with the initial annotated frame, so no extra data are needed. To make a fair comparison with existing trackers, we further propose a simulated benchmark based on the UAV123 dataset according to the imaging characteristics of satellite videos. Experiments comparing our method with other state-of-the-art trackers on both the simulated benchmark and real satellite videos demonstrate the superiority of our proposed algorithm.
This paper focuses on integrating information from RGB and thermal infrared modalities to perform RGB-T object tracking in the correlation filter framework. Our baseline tracker is Staple (Sum of Template and Pixel-wise LEarners), which combines complementary cues in the correlation filter framework with high efficiency. Given the input RGB and thermal videos, we utilize the baseline tracker due to its high performance in both accuracy and speed. Unlike previous correlation filter-based methods, we perform fusion tracking at both the pixel-fusion and decision-fusion levels. Our tracker is robust to the dataset challenges, and thanks to the efficiency of the FFT, it maintains high efficiency with superior performance. Extensive experiments on the RGBT234 dataset have demonstrated the effectiveness of our work.
Object tracking in motion scenes is a common research concern in the field of computer vision. Its goal is to accurately track targets across different time periods and predict their future states by utilizing the motion information in video sequences. However, traditional target-tracking methods in motion scenes often face challenges such as target blur, occlusion, and changes in lighting. To deal with these issues, this paper proposes a diffusion-neural-network-enhanced object-tracking approach for sports scenarios. To further improve tracking performance, a diffusion convolution operation is introduced, which propagates features across different time steps to enhance the modeling of target motion. Then, suitable influencing factors are selected based on the feature parameters of objects in motion scenes. Finally, a target tracking method is established by integrating these two components. In the experiments, we used a large number of real motion-scene datasets to evaluate the proposed method. The results show that, compared with traditional moving object tracking methods, the proposed approach achieves significant improvements in tracking accuracy and robustness. In addition, we conducted stability experiments, showing that the method remains stable for models with varying numbers of kernels.
In the field of intelligent transportation, background complexity, lighting changes, occlusion, and scale transformation affect the tracking of moving vehicles in video. We propose an improved vehicle object tracking algorithm based on Multi-Domain Convolutional Neural Networks (MDNet), combining an instance segmentation method with the MDNet algorithm and adding two attention mechanisms. The attention modules extract better features, ensure that the vehicle object adapts to changes in appearance, and greatly improve tracking performance. Our improved algorithm achieves a tracking precision rate of 91.8% and a success rate of 67.8%. The vehicle tracking algorithm is evaluated on the Object Tracking Benchmark (OTB) dataset, and the tracking results are compared with eight mainstream object tracking algorithms; the results show that our improved algorithm achieves excellent precision and success rates in many cases.
One of the fascinating aspects of sports rivalry is that anything can happen. The main difficulty is that computer-aided systems must record and analyze many game events, and fractal AI plays an essential role in dealing with such complex structures, enabling effective solutions. In table tennis, we concentrate primarily on two issues: ball tracking and trajectory prediction. From these two components, we can derive ball parameters such as velocity and spin, perform data analysis, and even build a fractal-based ping-pong robot application. However, most existing systems rely on traditional methods based on physical analysis and non-machine-learning tracking algorithms, which can be complex and inflexible. To overcome these problems, we propose an automatic table-tennis-aided system based on fractal AI that handles the complex issues and high structural complexity of object tracking and trajectory prediction. For object tracking, our algorithm is based on a structured-output Convolutional Neural Network (CNN) following deep learning approaches, and the trajectory prediction model is based on Long Short-Term Memory (LSTM) and Mixture Density Networks (MDN). These models are intuitive and straightforward and can be optimized by training iteratively on large amounts of data. Moreover, we construct a table tennis auxiliary system based on these models that is currently in practical use.
Content description becomes important with the ubiquity of video content on the Web and consumer devices. Video classification is needed so that more appropriate description and search methods can be applied. This paper describes two methods for video content classification: a Nearest Neighbor (NN) method relying on domain knowledge, and a Hidden Markov Model (HMM)-based method. Our approach stems from the observation that different TV categories exhibit different object (e.g., face and text) trajectory patterns. Face and text tracking is applied to video segments to extract face and text trajectories. We use NN and HMM classifiers to assign a given video segment to predefined classes, e.g., commercial, news, situation comedy, and soap. Our preliminary experimental results show classification accuracy of 75% for the NN method and over 80% for the HMM-based method on short video segments.
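The NN method above reduces to nearest-neighbour search over trajectory feature vectors. The sketch below is a toy illustration under assumed features and class labels, not the paper's actual descriptors:

```python
import math

# Toy nearest-neighbour classifier over trajectory feature vectors:
# assign the class of the closest labeled example in feature space.
def nn_classify(sample, labeled):
    """labeled: list of (feature_vector, class_label) pairs."""
    best_label, best_d = None, float("inf")
    for feats, label in labeled:
        d = math.dist(sample, feats)   # Euclidean distance in feature space
        if d < best_d:
            best_label, best_d = label, d
    return best_label
```

In practice the feature vectors would encode trajectory statistics (e.g., face or text region counts, durations, and motion patterns) derived from the tracking stage.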