This paper presents a framework for tracking multiple persons in real time. First, a real-time, adaptable method is proposed to extract face-like regions based on skin, motion, and silhouette features. An adaptable skin model is then used for each detected face to cope with changes in the observed environment. After that, a two-stage face verification algorithm is proposed to quickly eliminate false faces based on face geometry and a Support Vector Machine (SVM). To overcome the effect of lighting changes during verification, a color-constancy compensation method is proposed. A robust tracking scheme is then applied to identify multiple persons based on a face-status table. With this table, the proposed system can track different persons in different statuses, which is quite important in face-related applications. Experimental results show that the proposed method is more robust and powerful than traditional methods that rely only on color, motion information, and the correlation technique.
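The abstract does not specify the skin model itself; as a rough illustration of the skin-feature stage, the sketch below uses the well-known fixed-threshold RGB skin rule as a stand-in (the paper's adaptable model would instead update its parameters per detected face):

```python
def is_skin(r, g, b):
    """Classify one RGB pixel as skin-colored.

    Stand-in for the paper's adaptable skin model (not specified in the
    abstract); the thresholds follow a widely used fixed RGB skin rule.
    """
    return (r > 95 and g > 40 and b > 20 and
            max(r, g, b) - min(r, g, b) > 15 and
            abs(r - g) > 15 and r > g and r > b)

def skin_mask(image):
    """Build a boolean mask of face-like (skin-colored) pixels
    from an image given as rows of (r, g, b) tuples."""
    return [[is_skin(r, g, b) for (r, g, b) in row] for row in image]
```

The adaptable part of the paper's model, per-face updating of these thresholds, is omitted here for brevity.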
This paper presents a novel approach to tracking multiple moving objects using the level-set method. The proposed method can track different objects whether they are rigid or nonrigid, merged or split, and with or without shadows. In the first stage, the paper proposes an edge-based camera compensation technique to deal with object tracking when the background is not static. After camera compensation, moving pixels can be easily extracted through a subtraction technique. A speed function with three ingredients, i.e. pixel motions, object variances, and background variances, can then be defined to guide object boundary detection. According to the defined speed function, different object boundaries can be efficiently detected and tracked by a curve evolution technique, i.e. the level-set-based method. Once the desired objects have been extracted, in order to further understand the video content, this paper uses a relation table to identify and observe different behaviors of the tracked objects. However, the above analysis sometimes fails because of shadows; to avoid this problem, this paper adopts Gaussian shadow modeling to remove all unwanted shadows. Experimental results show that the proposed method is much more robust and powerful than other traditional methods.
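The exact form of the three-ingredient speed function is not given in the abstract; the sketch below is a hypothetical combination in which the evolving curve expands where pixel motion is strong and where a pixel fits the object distribution better than the background:

```python
def speed(motion, var_obj, var_bg, w=(1.0, 1.0, 1.0), eps=1e-6):
    """Hypothetical level-set speed function (assumed form, not the paper's).

    motion  -- per-pixel motion strength (larger = more moving)
    var_obj -- pixel's variance w.r.t. the object distribution
    var_bg  -- pixel's variance w.r.t. the background distribution
    """
    wm, wo, wb = w
    # fit -> 1 when the pixel matches the object far better than background.
    fit = var_bg / (var_obj + var_bg + eps)
    # Positive speed expands the curve; negative speed contracts it.
    return wm * motion + wo * fit - wb * (1.0 - fit)
```

A positive speed pushes the boundary outward over moving, object-like pixels, while background-like pixels yield negative speed and halt the front.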
In this paper, we present a method for tracking and identifying persons in video images taken by a fixed camera situated at an entrance. In video sequences a person may be totally or partially occluded in a scene for some period of time. The proposed approach identifies persons using an appearance model built from weighted temporal texture features, where the weight depends on the region's size and duration as well as the number of persons adjacent to the target. Most systems build an appearance model for each person to solve occlusion problems; the appearance model contains certain information on the target person. We compared the proposed method with related methods using color and shape features and analyzed the features' stability. Experimental results on various real video sequences show that real-time person tracking and recognition is possible with increased stability in video surveillance applications, even under occasional occlusion.
Visual surveillance in dynamic scenes is currently one of the most active research topics in computer vision, and many applications already exist. However, the difficulty of building video surveillance systems that remain robust to the many different conditions arising in real environments makes their actual deployment very challenging.
In this article, we present ARGOS, a real, pioneering video surveillance system for boat traffic monitoring. The system has run continuously, 24 hours a day and 7 days a week, in the city of Venice (Italy) since 2007; it builds a reliable background model of the water channel and tracks the boats navigating it with good accuracy in real time. A substantial experimental evaluation, reported in this article, assesses the system's real-world performance.
This paper proposes a new solution for geo-registering nearly featureless maritime video feeds. We detect the horizon using sizable or uniformly moving vessels, and estimate the vertical apex using water reflections of street lamps. The computed horizon and apex provide a metric rectification that removes affine distortions and reduces the search space for geo-registration. Geo-registration is obtained by searching for the orientation that best matches the estimated water masks in satellite images and camera views. The proposed solution makes the following contributions: first, water areas and coastlines are used as features for registration between horizontally looking maritime views and satellite images; second, water reflections are used to estimate the vertical vanishing point; third, we give algorithms for detecting water areas in both satellite images and camera views. Experimental results and applications to cross-camera tracking are demonstrated, and we discuss several observations as well as limitations of the proposed approach.
This paper proposes the Multilevel Gaussian Mixture Model (ML_GMM), a novel algorithm for online, unsupervised modeling of complex motion trajectories that allows fast and reliable abnormal-trajectory detection in video surveillance. The presented approach does not deal directly with target tracking or with attributing an observation to one of several trajectories belonging to different targets. Instead, it receives as input a sequence of observations, assumed to be produced by a reliable yet possibly noisy tracker, and outputs a label for the whole input trajectory as "normal" or "abnormal". The normal and abnormal trajectory domains are learned in an unsupervised fashion during the training phase and, thanks to the algorithm's reduced computational complexity, can be updated while the classifier is running. The novelty of the contribution is twofold. First, ML_GMM represents short-term as well as long-term transitions in a consistent way, allowing accurate representation of long and complex trajectories and their reliable labeling as normal or abnormal. Second, the learning process can run in parallel with the discrimination process, so the model can adapt to evolving operating conditions. Experimental results show that ML_GMM outperforms a number of comparable approaches in the literature.
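ML_GMM's multilevel structure is not detailed in the abstract; as a much-simplified illustration of likelihood-based trajectory labeling, the sketch below fits one Gaussian per time step to normal training trajectories (assumed fixed-length and 1-D, both assumptions of this sketch) and labels a new trajectory by thresholding its mean log-likelihood:

```python
import math

def fit_gaussians(trajectories):
    """Fit an independent Gaussian per time step over normal trajectories.

    Each trajectory is a fixed-length sequence of 1-D positions; this is a
    simplified stand-in for a full mixture model such as ML_GMM.
    """
    n, length = len(trajectories), len(trajectories[0])
    means = [sum(t[i] for t in trajectories) / n for i in range(length)]
    vars_ = [max(sum((t[i] - means[i]) ** 2 for t in trajectories) / n, 1e-6)
             for i in range(length)]
    return means, vars_

def label(trajectory, means, vars_, threshold=-5.0):
    """Return 'normal' or 'abnormal' from the mean per-step log-likelihood."""
    ll = sum(-0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
             for x, m, v in zip(trajectory, means, vars_)) / len(trajectory)
    return "normal" if ll >= threshold else "abnormal"
```

The threshold here is an assumed constant; in the paper the normal/abnormal domains are learned and updated online.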
In this paper, we propose a new approach to anomaly detection in video surveillance based on a nonparametric Bayesian regression model built upon Gaussian process priors. It establishes a set of basic vectors describing motion patterns from low-level features via online clustering, and then constructs a Gaussian process regression model to approximate the distribution of motion patterns in kernel space. We analyze different anomaly measures derived from the Gaussian process regression model and compare their performance. To reduce false detections caused by crowd occlusion, we use supplementary information from previous frames to assist anomaly detection in the current frame. In addition, we address hyperparameter tuning and discuss efficient calculation methods to reduce computational overhead. The approach is verified on published anomaly detection datasets and compared with existing methods. The experimental results demonstrate that it detects various anomalies efficiently and accurately.
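The online clustering step that yields the basic vectors can be illustrated with a greedy scheme; the `radius` parameter and the 1-D observations below are assumptions of this sketch, not details from the paper:

```python
def online_cluster(observations, radius=1.0):
    """Greedy online clustering into a set of basis vectors (centers).

    Each observation joins the nearest existing center if it lies within
    `radius`, otherwise it starts a new center; joined centers are updated
    as running means. A hedged sketch of the abstract's online clustering.
    """
    centers, counts = [], []
    for x in observations:
        if centers:
            j = min(range(len(centers)), key=lambda i: abs(x - centers[i]))
            if abs(x - centers[j]) <= radius:
                counts[j] += 1
                centers[j] += (x - centers[j]) / counts[j]  # running mean
                continue
        centers.append(float(x))
        counts.append(1)
    return centers
```

In the paper these basis vectors then feed the Gaussian process regression model; that stage is omitted here.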
This paper extends the problem of abandoned/removed object classification in video surveillance to encompass the closely related (but to date ignored) problem of moved object classification. Existing abandoned/removed region classification techniques are shown to arbitrarily classify such localized object movements as either abandonment or removal. Our new approach correctly classifies abandoned and removed objects, and also correctly classifies 90% of moved objects.
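The abstract does not describe the classification mechanism; one hypothetical way to realize the three-way abandoned/removed/moved decision is to pair newly static regions with nearby newly vacated ones (a real system would match on appearance as well as proximity):

```python
def classify_static_regions(appeared, disappeared, max_dist=50.0):
    """Three-way classification of static foreground changes (sketch).

    `appeared` and `disappeared` are lists of (x, y) centroids of newly
    static and newly vacated regions. An appearance paired with a nearby
    disappearance is a 'moved' object; unmatched appearances are
    'abandoned' and unmatched disappearances are 'removed'.
    """
    events, used = [], set()
    for ax, ay in appeared:
        match = None
        for i, (dx, dy) in enumerate(disappeared):
            dist = ((ax - dx) ** 2 + (ay - dy) ** 2) ** 0.5
            if i not in used and dist <= max_dist:
                match = i
                break
        if match is None:
            events.append(("abandoned", (ax, ay)))
        else:
            used.add(match)
            events.append(("moved", (ax, ay)))
    events += [("removed", d) for i, d in enumerate(disappeared)
               if i not in used]
    return events
```

Without the pairing step, every appearance would be labeled "abandoned" and every disappearance "removed", which is exactly the arbitrary behavior the paper criticizes.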
Multiple-object tracking is a fundamental step in many computer vision applications. However, detecting and tracking objects against a complex background is still challenging. This paper proposes an approach that combines an improved Gaussian mixture model (GMM) with multiple particle filters (MPFs) for automatically detecting and tracking multiple targets. We improve the GMM's model-updating phase by using the expectation-maximization algorithm and the M most recent frames together with weighted Gaussian components. In the tracking stage, we integrate multiple target features, including color, edge, and depth, into the MPFs to improve tracking performance. Comparison with various particle filter approaches shows that our approach can track multiple targets against complex backgrounds automatically and accurately.
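The fusion rule for the color, edge, and depth features is not specified in the abstract; a weighted geometric mean of per-particle likelihoods, sketched below with assumed weights, is one common choice:

```python
def fuse_likelihoods(color, edge, depth, w=(0.5, 0.3, 0.2)):
    """Fuse per-particle color/edge/depth likelihoods into normalized weights.

    `color`, `edge`, `depth` are lists of per-particle likelihoods in [0, 1];
    the weighted geometric mean and the weights `w` are assumptions of this
    sketch, not the paper's stated rule.
    """
    raw = [(c ** w[0]) * (e ** w[1]) * (d ** w[2])
           for c, e, d in zip(color, edge, depth)]
    total = sum(raw) or 1.0  # guard against an all-zero likelihood set
    return [r / total for r in raw]
```

The normalized weights would then drive the resampling step of each particle filter.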
In recent years, video surveillance has become an integral part of computer vision research, addressing a variety of challenges in security, memory management, and content extraction from video sequences. This paper introduces Robust Object Detection using the Fire Hawks Optimizer with Deep Learning (ROD-FHODL), a novel technique designed specifically for video surveillance applications. Combining object detection and classification, the proposed technique employs a two-step procedure: it uses the Mask Region-based Convolutional Neural Network (Mask-RCNN) for object detection and optimizes its hyperparameters with the Fire Hawks Optimizer (FHO) algorithm to improve efficacy. Our experimental results on the UCSD dataset demonstrate the impact of the proposed work: it achieves a RUNNT of 1.34 s on the pedestrian-1 dataset, significantly outperforming existing models, and it also surpasses them in accuracy, with a pedestrian-1 accuracy of 97.45% and an Area Under the Curve (AUC) of 98.92%. Comparative analysis of the True Positive Rate (TPR) versus the False Positive Rate (FPR) across thresholds likewise favors the proposed system. In conclusion, the proposed system represents a significant advancement in video surveillance, offering gains in speed, precision, and robustness that hold promise for enhancing security, traffic management, and public-space monitoring in smart-city infrastructure and other applications.
Video surveillance involves petabytes of data storage, requiring expensive hardware, and can also be time-inefficient. The aim of this article is therefore to develop an intelligent system capable of analyzing long sequences of CCTV video economically and efficiently, helping to mitigate catastrophes and the violent threats citizens face every day. While existing models achieve high accuracy on available datasets, the primary focus here is to improve the speed of violence detection (time efficiency) and use very little storage (economy) so that the system can run in real time. The paper presents an end-to-end hybrid solution for detecting violence in real-time video frames, incorporating both human and weapon detection algorithms applied in a synchronized way. The focus of this article is to propose a generic HWVd (Human Weapon Violence detection) model to detect all kinds of public violence. HWVd is a three-tier ensemble model for detecting violence in videos. The first tier is human detection, which uses the LightTrack framework. In the second tier, a Fast Region-based Convolutional Neural Network (F-RCNN) is used to detect any weapon in the video. The third tier uses a pre-trained VGG 19 (a pre-trained CNN) for spatial feature extraction and Long Short-Term Memory (LSTM) to detect violent activity. Lastly, the output of this framework is sent to a Support Vector Machine to classify the activity as (i) violence not involving a weapon, (ii) violence involving a weapon, or (iii) non-violent. The accuracy obtained using the proposed model is 98%.
Video surveillance has become an essential tool in the security industry because of its sophisticated and fool-proof technology. Recent developments in image recognition and captioning have enabled us to adopt these technologies in the field of video surveillance. The biggest problem with image captioning is that it varies with the rotation angle of the image: different angles of the same image generate different captions. We aim to eliminate this rotation variance. We have implemented a custom image-rotation network using a Convolutional Neural Network (CNN); the input image is rotated back to its original orientation using this network and then passed on to image captioning. The generated caption is sent to the user for situation analysis.
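Assuming the rotation network predicts one of four 90-degree orientations (an assumption of this sketch; the paper's network may predict finer angles), undoing the rotation before captioning can be sketched as:

```python
def rotate90(image, times):
    """Rotate a 2-D image (list of rows) counter-clockwise by times x 90 deg."""
    for _ in range(times % 4):
        image = [list(row) for row in zip(*image)][::-1]
    return image

def normalize_orientation(image, predicted_turns):
    """Undo a predicted rotation before captioning (hypothetical interface).

    `predicted_turns` is assumed to be the number of 90-degree
    counter-clockwise turns the rotation network detected; applying the
    remaining turns restores the original orientation.
    """
    return rotate90(image, (4 - predicted_turns) % 4)
```

The normalized image would then be fed to the captioning model, so the same scene yields the same caption regardless of camera rotation.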
In this paper, text detection in moving business cards, aimed at helping visually impaired persons using a wearable camera, is presented. Current methods for helping visually impaired persons read natural-scene text, menus, and book covers have three problems. First, they assume that the blind person is standing still and the captured scene is not moving. Second, the blind person does not know whether the menu or book cover has been captured by the camera. Third, these methods cannot "see" business cards. The proposed method includes moving detection, thumb detection, motion-blur detection, and text detection. Experimental results show that the proposed method reduces time complexity: the reduction rates for the training and testing sets are 83.18% and 77.08%, respectively, and the text detection rates are 93.44% and 94.58%, respectively. The frame rate is 53 fps for 320 × 240 video frames, and the 102 KB program can run on mobile devices.
Security has been tightened at major public buildings in the most famous and crowded cities around the world following the terrorist attacks of recent years, the latest at the Promenade des Anglais in Nice. For that reason, video surveillance systems have become more and more essential for detecting and, hopefully, even preventing dangerous events in public areas. In this work, we present an overview of the evolution of high-level surveillance event detection systems, along with a prototype for anomaly detection in the video surveillance context. The whole process is described, from the video frames captured by sensors/cameras to the application of well-known reasoning algorithms for finding potentially dangerous activities.