We present a shape-based method for automatic people detection and counting that makes no assumption about camera motion. To evaluate the robustness of the proposed method, we apply it to classifying athletics videos into two classes: videos of individual sports and videos of team sports. The videos used are real and characterized by dynamic, unconstrained environments. Moreover, in the case of team sports, we propose a shape-deformation-based method for running/hurdling discrimination (activity recognition). Robust, adaptive, and independent of color, illumination changes, and camera motion, the proposed features are combined in the Transferable Belief Model (TBM) framework, providing a two-level (frame and shot) video categorization. Experimental results of 97% accuracy for individual/team sport categorization on a dataset of 252 real videos of athletics meetings, acquired by moving cameras under varying view angles, indicate the stability and good performance of the proposed scheme.
Third generation surveillance systems are in growing demand for intelligent surveillance of scenarios such as public areas, urban traffic control, smart homes and so on. They are based on multiple cameras and processing modules that integrate data coming from a large surveillance space. The semantic interpretation of data in a multi-view context is a challenging task and requires the development of image processing methodologies that can support applications in large-scale and real-time contexts. This paper presents a survey of automatic event detection functionalities that have been developed for third generation surveillance systems, with particular emphasis on open problems that limit the application of computer vision methodologies to commercial multi-camera systems.
People, cars and other moving objects in videos generate time series data that can be labeled in many ways. For example, classifiers can label motion tracks according to the object type, the action being performed, or the trajectory of the motion. These labels can be generated for every frame as long as the object stays in view, so object tracks can be modeled as Markov processes with multiple noisy observation streams. A challenge in video recognition is to recover the true state of the track (i.e. its class, action and trajectory) using Markov models without (a) counter-factually assuming that the streams are independent or (b) creating a fully coupled Hidden Markov Model (FCHMM) with an infeasibly large state space. This paper introduces a new method for labeling sequences of hidden states. The method exploits external consistency constraints among streams without modeling complex joint distributions between them. For example, common sense semantics suggest that trees cannot walk. This is an example of an external constraint between an object label (“tree”) and an action label (“walk”). The key to exploiting external constraints is a new variation of the Viterbi algorithm which we call the Viterbi–Segre (VS) algorithm. VS restricts the solution spaces of factorized HMMs to marginal distributions that are compatible with joint distributions satisfying sets of external constraints. Experiments on synthetic data show that VS does a better job of estimating true states with the given observations than the traditional Viterbi algorithm applied to (a) factorized HMMs, (b) FCHMMs, or (c) partially-coupled HMMs that model pairwise dependencies. We then show that VS outperforms factorized and pairwise HMMs on real video data sets for which FCHMMs cannot feasibly be trained.
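The decoding backbone here is ordinary Viterbi over a (factorized or joint) state space. The sketch below is not the authors' VS algorithm; it is a minimal, hypothetical illustration of standard Viterbi decoding plus the constraint idea, where a joint (object, action) state such as ("tree", "walk") is excluded by assigning it log-probability minus infinity before decoding.

```python
import numpy as np

def viterbi(log_init, log_trans, log_emit):
    """Standard Viterbi: most likely hidden-state sequence of an HMM.
    log_init: (S,), log_trans: (S, S) with log_trans[i, j] = log P(j | i),
    log_emit: (T, S) per-frame log emission likelihoods."""
    T, S = log_emit.shape
    delta = log_init + log_emit[0]           # best log-prob ending in each state
    back = np.zeros((T, S), dtype=int)       # backpointers
    for t in range(1, T):
        scores = delta[:, None] + log_trans  # scores[i, j]: come from i, move to j
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_emit[t]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):            # follow backpointers
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# External constraint over joint (object, action) states: forbid ("tree", "walk")
# by sending its score to -inf, so no decoded path can pass through it.
states = [("person", "walk"), ("person", "stand"), ("tree", "stand"), ("tree", "walk")]
constraint_mask = np.array([0.0, 0.0, 0.0, -np.inf])  # add to each row of log_emit
```

Masking joint states this way avoids estimating a full joint transition model, which is the cost that makes FCHMMs infeasible at scale.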
Temporal action detection is one of the most important and challenging tasks in video analysis. Due to its wide application prospects, it has received extensive attention in recent years. With the development of deep learning, great progress has been made in temporal action detection, but many difficulties remain to be solved, such as accurate proposal generation and high computational cost. In this paper, deep learning-based temporal action detection methods are classified into fully supervised and weakly supervised approaches; the representative models of the two categories are then summarized in detail, and the ideas, advantages and disadvantages of different models, as well as the evolution between them, are analyzed. The performance of different models on mainstream datasets is also compared. The mainstream datasets and evaluation metrics used in temporal action detection are introduced in detail, and the calculation of the evaluation metrics is elaborated. Finally, through in-depth analysis, possible future research directions of temporal action detection are identified and the review is summarized.
In this study, a novel hybrid deep clustering approach is proposed for effective moving object segmentation. Initially, the data is collected, and keyframe selection is performed using the threshold-based Kennard–Stone method. Then, the preprocessing step involves noise filtering using bilateral wavelet thresholding and binary color conversion. Blob detection is performed using the normalized Laplacian of Gaussian. Finally, the segmentation of moving objects is performed using a hybrid clustering approach called improved fuzzy C-means (IFCM) clustering with the chaotic salp swarm (CSS) optimization algorithm (Hybrid IFCM-CSS). The overall evaluation is done in MATLAB. The performance of the hybrid IFCM-CSS is compared to other approaches on several evaluation measures. The proposed Hybrid IFCM-CSS achieves the highest precision of 0.971 on the SBM-RGBD dataset.
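For orientation, the fuzzy C-means core of the clustering step can be sketched as below. This is only the standard FCM update loop with Euclidean distances; the paper's improved FCM and its chaotic salp swarm tuning are not reproduced here.

```python
import numpy as np

def fuzzy_c_means(X, c=2, m=2.0, iters=100, seed=0):
    """Plain fuzzy C-means. X: (n, d) data, c: clusters, m: fuzzifier (> 1).
    Returns (centers, membership matrix U of shape (n, c))."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)        # memberships sum to 1 per point
    for _ in range(iters):
        W = U ** m                           # fuzzified weights
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None] - centers[None], axis=2) + 1e-12
        U = 1.0 / d ** (2.0 / (m - 1.0))     # closer centers get higher mass
        U /= U.sum(axis=1, keepdims=True)    # renormalize to a fuzzy partition
    return centers, U
```

In the paper's pipeline the swarm optimizer would tune this objective's optima; the plain alternating update above is the baseline it improves on.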
Action recognition is a challenging task of modeling both spatial and temporal context. Numerous works focus on architectures and modalities and have made worthy progress on this task. Owing to temporal redundancy and limited computational resources, several works study efficiency, e.g., frame sampling, some for untrimmed videos and some for trimmed videos. With the aim of improving the effectiveness of action recognition, we propose a novel Computational Spatiotemporal Selector (CSS) to refine and reinforce the key frames carrying discriminative information in a video. Specifically, CSS includes two modules: a Temporal Adaptive Sampling (TAS) module and a Spatial Frame Resolution (SFR) module. The former refines the key frames in the temporal dimension to capture the key motion information, while the latter further downscales some refined frames in the spatial dimension to eliminate discrimination-irrelevant structural information. The proposed CSS can be flexibly embedded into most representative action recognition models. Experiments on two challenging action recognition benchmarks, i.e., ActivityNet1.3 and UCF101, show that the proposed CSS improves the performance of most existing models, not only on trimmed videos but also on untrimmed videos.
This paper addresses the problem of automatically partitioning a video into semantic segments using visual low-level features only. Semantic segments may be understood as the content building blocks of a video with a clear sequential content structure. Examples are reports in a news program, episodes in a movie, scenes of a situation comedy or topic segments of a documentary. In some video genres like news programs or documentaries, the usage of different media (visual, audio, speech, text) may be beneficial or is even unavoidable for reliably detecting the boundaries between semantic segments. In many other genres, however, the pay-off in using different media for the purpose of high-level segmentation is not high. On the one hand, relating the audio, speech or text to the semantic temporal structure of video content is generally very difficult. This is especially so in "acting" video genres like movies and situation comedies. On the other hand, the information contained in the visual stream of these video genres often seems to provide the major clue about the positions of semantic segment boundaries. Partitioning a video into semantic segments can be performed by measuring the coherence of the content along neighboring video shots of a sequence. The segment boundaries are then found at places (e.g., shot boundaries) where the values of content coherence are sufficiently low. On the basis of two state-of-the-art techniques for content coherence modeling, we illustrate in this paper the current possibilities for detecting the boundaries of semantic segments using visual low-level features only.
Block motion estimation using full search is computationally intensive. Previously proposed fast algorithms reduce computation by limiting the number of search locations and search directions in a predefined search region. This is accomplished at the expense of motion estimation accuracy, with a large mean squared error for motion-compensated images, especially for image sequences with large displacement and rotation. In this paper, a novel efficient search algorithm for block motion estimation is presented that outperforms previously developed fast search algorithms, such as three step search, orthogonal search, 2D-logarithmic search, four step search, and block-based gradient descent search, in cases of large displacement and rotation. The proposed algorithm is based on the notions of locally multi-scale operation, global-minimum search, and a two-layer search strategy. Experimental results show that the proposed algorithm achieves the anticipated performance while requiring much less computation than the full search algorithm.
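As a point of reference, the three step search baseline mentioned above can be sketched as follows — a minimal SAD-based version with simplified border handling; the block size and step schedule are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def sad(a, b):
    """Sum of absolute differences between two equal-sized blocks."""
    return np.abs(a.astype(np.int64) - b.astype(np.int64)).sum()

def three_step_search(ref, cur, bx, by, bs=8, step=4):
    """Three step search: match the bs x bs block of `cur` at (by, bx) against
    `ref` by probing 9 candidates around the current best and halving the step.
    Returns the motion vector (dy, dx) into the reference frame."""
    block = cur[by:by + bs, bx:bx + bs]
    best = (0, 0)
    best_cost = sad(ref[by:by + bs, bx:bx + bs], block)
    while step >= 1:
        cy, cx = best
        for dy in (-step, 0, step):
            for dx in (-step, 0, step):
                y, x = by + cy + dy, bx + cx + dx
                if 0 <= y <= ref.shape[0] - bs and 0 <= x <= ref.shape[1] - bs:
                    cost = sad(ref[y:y + bs, x:x + bs], block)
                    if cost < best_cost:
                        best_cost, best = cost, (cy + dy, cx + dx)
        step //= 2
    return best
```

Full search would evaluate every offset in the window; this scheme probes at most 9 candidates per round, which is exactly the accuracy-for-speed trade-off the abstract describes.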
This paper introduces a graphical user interface approach to facilitate efficient and timely generation of statistics from input videos. By means of a carefully designed graphical user interface, users can interactively add various kinds of markers, known as statistic inducers, on the screen of an input video to specify the areas of interest corresponding to the locations of relevant events. These inducers take the form of two-dimensional points, lines, polygons, and grids, and can be placed on the video screen with great ease. Using these inducers, we can not only efficiently customize the system for a given statistics generation task, but also precisely constrain the time-consuming space-time video analysis process (as well as any additional analysis process such as optical flow computation or object recognition) to the user-specified areas. To demonstrate the efficacy of the proposed approach, we developed a prototype system and evaluated it in two different statistics generation cases: dormitory light switching and road traffic. In both cases, only a few minutes of UI customization time are needed to set up the inducers; once this is done, timely statistics can be generated automatically.
In this paper, we present a methodology for recognizing fingerspelling signs in videos. A novel user-specific appearance model is proposed for improved recognition performance over the classical appearance-based model. Fingerspelt postal index number (PIN) code signs in a video are recognized by identifying the signs in the individual frames of the video. Decomposition of a video into frames results in a large number of frames even for a video of short duration. Each frame is processed to extract only the hand region, yielding a series of hand frames corresponding to the video frames. Many of these hand frames are very similar to their neighbors and hence redundant. Therefore, only a few hand frames need to be selected and used for recognizing the PIN code signed by a signer. We propose two methods for selecting images from a video sequence. The methods provide a principled basis for image selection and have been thoroughly tested, demonstrating their benefit in terms of appropriate selection and a reduced number of images compared to arbitrary selection.
The study of crowd dynamics is interesting because of the various self-organization phenomena resulting from the interactions of many pedestrians, which may improve or obstruct their flow. Besides the formation of lanes of uniform walking direction and oscillations at bottlenecks at moderate densities, it was recently discovered that stop-and-go waves [D. Helbing et al., Phys. Rev. Lett. 97 (2006) 168001] and a phenomenon called "crowd turbulence" can occur at high pedestrian densities [D. Helbing et al., Phys. Rev. E 75 (2007) 046109]. Although the behavior of pedestrian crowds under extreme conditions is decisive for the safety of crowds during the access to or egress from mass events, as well as in emergency evacuations, there is still a lack of empirical studies of extreme crowding. Therefore, this paper discusses how one may study high-density conditions based on suitable video data. This is illustrated with the example of pilgrim flows entering the previous Jamarat Bridge in Mina, 5 kilometers from the Holy Mosque in Makkah, Saudi Arabia. Our results reveal previously unexpected pattern formation phenomena and show that the average individual speed does not go to zero even at local densities of 10 persons per square meter. Since the maximum density and flow differ from measurements in other countries, this has implications for the capacity assessment and dimensioning of facilities for mass events. When conditions become congested, the flow drops significantly, which can cause stop-and-go waves and a further increase of the density until critical crowd conditions are reached. Then "crowd turbulence" sets in, which may trigger crowd disasters. For this reason, it is important to operate pedestrian facilities sufficiently below their maximum capacity and to take measures to improve crowd safety, some of which are discussed at the end.
Based on a description of a knowledge continuum and on the knowledge combustion and vehicle analogy, problems regarding the explanation facilities of knowledge-based systems (KBS) are addressed. Furthermore, drawing from the lessons learned from the analysis of the video recordings of medical diagnostic telemedicine consultations, a new approach for developing justification and strategic explanation facilities for KBS is presented. Rather than enabling computers to make profound decisions and generate convincing explanations, which is not technically feasible according to the knowledge continuum, the development of a KBS that provides relevant explanations for specific diagnostic situations is proposed. The knowledge continuum indicates that there is a narrow gap between information, declarative knowledge and some types of pragmatic knowledge. To take advantage of this knowledge characteristic, the creation of a hybrid knowledge type that integrates several more explicit knowledge types through simple processes of extraction, representation, analysis, modelling and editing is proposed.
This work reports an experimental technique specially designed and employed to teach and build understanding of the conservation of linear momentum and to overcome instructional difficulties. The apparatus mainly comprises specifically designed corrugated inclined planes, two solid spherical balls, a video recorder and a computer with a video tracker analysis programme installed. The momentum measurements are achieved by recording the motion of the rolling bodies and resolving the motion with the video tracker analyser. Both elastic and inelastic collisions are designed, and the estimated momentum values clearly demonstrate the conservation of momentum. The experimental results are also compared with theoretical calculations, indicating reasonably good agreement. The designed apparatus can be employed in basic physics laboratories to achieve an enhanced understanding of the conservation of linear momentum.
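The momentum bookkeeping such a tracker-based setup performs can be illustrated with hypothetical numbers (the masses, frame rate, and tracked positions below are made up): velocities come from finite differences of tracked positions, and momentum is compared before and after the collision.

```python
import numpy as np

dt = 1 / 30                         # assumed 30 fps video
m1, m2 = 0.10, 0.05                 # illustrative ball masses in kg

# Hypothetical tracker output: x-positions (m) per frame; the balls meet at
# frame 3 and stick together (perfectly inelastic collision).
x1 = np.array([0.00, 0.02, 0.04, 0.06, 0.0667, 0.0733])
x2 = np.array([0.12, 0.10, 0.08, 0.06, 0.0667, 0.0733])

v1 = (x1[3] - x1[0]) / (3 * dt)         # finite-difference velocity before impact
v2 = (x2[3] - x2[0]) / (3 * dt)
v_common = (x1[5] - x1[3]) / (2 * dt)   # shared velocity after sticking

p_before = m1 * v1 + m2 * v2
p_after = (m1 + m2) * v_common          # both come out close to 0.03 kg m/s
```

The small residual between `p_before` and `p_after` is the kind of discrepancy the abstract attributes to measurement and rolling losses when comparing experiment with theory.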
If one rolls a vertically standing hula hoop forward with backspin applied, it first moves away and then eventually returns to the hand. In this paper, we study such a motion of a rolling hula hoop from both theoretical and experimental aspects. The hula hoop rolls with slipping immediately after it leaves the hand, and its motion transitions to rolling without slipping as the frictional force changes from kinetic to static friction. We show that the experimental results analyzed with the Tracker software can be well described by equations of motion that take into account the deformation of the hula hoop. The theoretical and experimental studies in this paper are suitable for university students in physics courses. The video analyzed in this paper can be viewed at https://youtu.be/i4j1lDhCI2os.
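For context, the qualitative return of the hoop already follows from the textbook rigid-hoop analysis (I = mR², no deformation), which the paper refines. With velocity v and angular velocity ω taken positive in the forward-rolling sense, backspin means ω₀ < 0, so the contact point slips forward and kinetic friction μₖ acts backward:

```latex
% Rigid hoop, I = mR^2; slipping phase (contact point moves forward, v - \omega R > 0):
m\dot v = -\mu_k m g, \qquad mR^2\,\dot\omega = \mu_k m g R
\;\Longrightarrow\;
v(t) = v_0 - \mu_k g t, \qquad \omega(t)\,R = \omega_0 R + \mu_k g t .
% Slipping stops when v = \omega R, at the common rolling velocity
v^{*} = \tfrac{1}{2}\bigl(v_0 + \omega_0 R\bigr),
% which is negative -- the hoop rolls back -- whenever \omega_0 R < -v_0 .
```

The hoop's deformation, which the paper adds, modifies this picture quantitatively but not the sign of v*.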
Recently, video analysis techniques have emerged as effective and facile learning tools, owing to the richness of spatial and temporal data useful for investigating complex physical phenomena related to kinematics. In this study, we have investigated the motion of solid and annular cylinders rolling down an inclined wooden plane at different angles. The linear accelerations of the cylinders for the case of rolling (with and without slipping) have been derived theoretically and compared with their experimental counterparts. Specifically, the experimental values have been determined by performing a series of experiments, wherein the motion of the cylinders has been captured with a digital camera (recording at 240 frames per second) and later analyzed frame by frame using the in-house developed GUI-based "Phystrack" video tracking library. We have measured the transition angles corresponding to the transition of motion (a) from rest to rolling, and (b) from pure rolling to a combination of rolling and slipping, for two distinct cylinders. This has eventually allowed us to compute the coefficients of static, kinetic and rolling friction for the aforementioned cylinders. In general, the coefficient of kinetic friction is regarded as an intrinsic, material-dependent constant, independent of the geometry of the object. In the case of rolling motion, however, the coefficients of friction depend strongly on the geometrical parameters of the rolling object. The study emphasizes developing physics students' conceptual understanding of the friction coefficients of rolling objects.
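The theoretical accelerations referred to above follow from the standard textbook formula a = g sin θ / (1 + k) with k = I/(mR²); the sketch below (plain Python, not the authors' Phystrack code) also gives the incline angle at which pure rolling gives way to rolling with slipping.

```python
import math

def rolling_acceleration(theta_deg, k, g=9.81):
    """Linear acceleration of a body rolling without slipping down an incline:
    a = g sin(theta) / (1 + k), where k = I / (m R^2)
    (k = 1/2 for a solid cylinder, k = (1 + (r/R)^2) / 2 for an annular one)."""
    return g * math.sin(math.radians(theta_deg)) / (1 + k)

def slipping_onset_angle(mu_s, k):
    """Pure rolling needs friction f = k m g sin(theta) / (1 + k), so static
    friction fails once tan(theta) > mu_s (1 + k) / k; return that angle."""
    return math.degrees(math.atan(mu_s * (1 + k) / k))
```

The k-dependence of `slipping_onset_angle` is why the measured transition angles differ between the solid and annular cylinders even on the same surface.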
We determine the acceleration due to gravity (g) in a novel way using a smartphone's magnetic sensor and video analysis. The same applications are used to measure the terminal velocity of a magnet falling through a conducting pipe and the magnetic moment of the magnet from its torsional oscillations. This experiment is intriguing, as it combines elements of magnetism, terminal velocity, and electromagnetic damping to determine g.
In this paper we describe a novel approach to identifying the state of ball possession by the two teams in a soccer video. The proposed approach uses a combination of techniques to identify the presence of the players and the ball. We segment soccer video frames with the Markov Random Field (MRF) clustering technique and use a colour-texture feature to classify the players of the two teams. The players are detected by a combination of static analysis, optical flow analysis and difference image analysis. The classification decisions from the three techniques are combined using the Dempster–Shafer theory of evidence. The ball in the video frames is identified using the Hough Transform for circle detection. The proposed approach of combining evidence from multiple techniques makes the analysis more robust.
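The evidence-combination step relies on Dempster's rule. A minimal sketch (the mass values are hypothetical; frozensets represent focal elements over the frame {A, B} of the two teams):

```python
from itertools import product

def dempster_combine(m1, m2):
    """Dempster's rule of combination for two mass functions whose focal
    elements are frozensets; conflicting mass is renormalized away."""
    combined, conflict = {}, 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb            # mass falling on the empty set
    if conflict >= 1.0:
        raise ValueError("total conflict: the sources fully contradict")
    return {k: v / (1.0 - conflict) for k, v in combined.items()}

A, B, AB = frozenset("A"), frozenset("B"), frozenset("AB")
m_color = {A: 0.6, AB: 0.4}                # e.g. colour-texture evidence
m_flow = {A: 0.5, B: 0.2, AB: 0.3}         # e.g. optical-flow evidence
fused = dempster_combine(m_color, m_flow)  # mass concentrates on team A
```

Mass left on the whole frame AB expresses each detector's ignorance, which is what makes this rule attractive when individual cues are unreliable frame by frame.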
Tsunami run-up through a river is an important factor for better prediction and estimation of inundation characteristics. This study carried out numerical investigations of the observed tsunami inundation around the Kido River, focusing especially on the prediction of the run-up speed of the tsunami along the river. Based on a non-linear long wave model, a sensitivity analysis was first carried out by varying several computational conditions such as the tsunami profile, bottom friction and the river discharge. It was found through this analysis that these conditions, within the range of expected uncertain variations, have a definite influence on the predicted run-up speed of the tsunami. Second, this study also investigated the influence of different discretization schemes on the model's ability to predict the tsunami run-up speed. The difference between the conservative and non-conservative forms of the non-linear term was investigated through numerical experiments on the inviscid Burgers equation, and it was found that this difference has a significant influence on the predicted propagation speed of the bore. The same analysis was applied to the case of the Kido River, and it was found that the predicted tsunami run-up speed increased by up to 40% when an appropriate discretization scheme in conservative form was selected.
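The conservative/non-conservative issue can be reproduced in a few lines. The sketch below (an illustrative first-order upwind scheme, not the study's actual solver) evolves a right-moving bore for the inviscid Burgers equation; the conservative form propagates the front at the Rankine–Hugoniot speed of 0.5, while the non-conservative form leaves this sharp front frozen in place.

```python
import numpy as np

def step_conservative(u, lam):
    """One upwind step of u_t + (u^2/2)_x = 0 (assumes u >= 0)."""
    f = 0.5 * u ** 2
    u[1:] -= lam * (f[1:] - f[:-1])
    return u

def step_nonconservative(u, lam):
    """One upwind step of the mathematically equivalent u_t + u u_x = 0."""
    u[1:] -= lam * u[1:] * (u[1:] - u[:-1])
    return u

x = np.linspace(-1.0, 3.0, 401)          # dx = 0.01
dx, dt, T = 0.01, 0.005, 2.0             # CFL number = 0.5
u0 = (x < 0).astype(float)               # bore: u = 1 behind, 0 ahead
uc, un = u0.copy(), u0.copy()
for _ in range(int(T / dt)):
    step_conservative(uc, dt / dx)
    step_nonconservative(un, dt / dx)

front_c = x[np.argmax(uc < 0.5)]         # conservative front: near x = 0.5 * T = 1.0
front_n = x[np.argmax(un < 0.5)]         # non-conservative front: stuck near x = 0
```

Only the conservative form discretely conserves the integral of u, which is what pins the numerical bore to the correct speed; this is the mechanism behind the 40% run-up speed difference reported above.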