Crowd behavior prediction (CBP) and crowd counting (CC) are essential functions of vision-based crowd analysis (CA), which plays a crucial role in preventing crowd disasters. Using separate models for CBP and CC increases computational overhead and introduces synchronization issues. State-of-the-art approaches use deep convolutional architectures to exploit spatial-temporal features, but such models suffer from the computational complexity of convolution operations. To address these issues, this paper develops a single deep model that performs both CA functions, CBP and CC. The proposed model uses multiple layers of depth-wise separable CNN (DSCNN) to extract fine-grained spatial-temporal features from the scene. The DSCNN requires fewer matrix multiplications per convolution operation than a traditional CNN. Furthermore, existing datasets support only a single CA task, whereas the proposed model needs a dual-task CA dataset providing ground-truth labels for both CBP and CC. Thus, a dual-functionality CA dataset is prepared from a benchmark crowd behavior dataset, MED: around 41,000 frames have been manually annotated with ground-truth crowd count values. Experiments on the proposed multi-functional dataset show that the model outperforms state-of-the-art methods on several performance metrics. In addition, the proposed model processes each test frame in 3.40 milliseconds and is therefore readily applicable in real time.
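For illustration, a depth-wise separable convolution replaces a standard k×k convolution (about k²·C_in·C_out·H·W multiplications) with a per-channel k×k convolution followed by a 1×1 point-wise convolution (about (k² + C_out)·C_in·H·W multiplications). Below is a minimal PyTorch sketch of one such block; it illustrates the general building block only, not the authors' exact DSCNN architecture or hyperparameters.

import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depth-wise separable convolution: per-channel spatial conv + 1x1 point-wise conv."""
    def __init__(self, in_ch, out_ch, kernel_size=3, stride=1):
        super().__init__()
        # groups=in_ch makes each spatial filter operate on a single input channel
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size, stride=stride,
                                   padding=kernel_size // 2, groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))

# Example: one block mapping 32 -> 64 feature maps on a 224x224 frame
x = torch.randn(1, 32, 224, 224)
y = DepthwiseSeparableConv(32, 64)(x)
print(y.shape)  # torch.Size([1, 64, 224, 224])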
Camera-based visual simultaneous localization and mapping (VSLAM) algorithms extract and track feature points in their front-ends; the feature points are then forwarded to the back-end for camera pose estimation. However, the matches produced by optical flow tracking are prone to visual feature mismatches. To address this problem, this paper introduces a novel visual feature mismatch detection algorithm. First, the algorithm calculates pixel displacements for all feature point pairs tracked by the optical flow method between consecutive images. Mismatches are then detected using a pixel displacement threshold derived from the statistical characteristics of the tracking results. Additionally, bound values for the threshold are set to enhance the accuracy of the filtered matches and to ensure adaptability to different environments. From the filtered matches, the algorithm estimates the fundamental matrix, which is used to further refine the matches before they are sent to the back-end for camera pose estimation. The algorithm is seamlessly integrated into a state-of-the-art VSLAM system, enhancing the overall robustness of VSLAM. Extensive experiments conducted on both public datasets and our unmanned surface vehicles (USVs) validate the performance of the proposed algorithm.
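A minimal OpenCV sketch of the general idea described above: track points with pyramidal Lucas-Kanade optical flow, reject pairs whose pixel displacement exceeds a threshold derived from the tracking statistics (the mean-plus-k-standard-deviations rule and the bound values here are illustrative assumptions, not the paper's exact formulation), then refine the survivors with a RANSAC-estimated fundamental matrix.

import cv2
import numpy as np

def filter_matches(prev_gray, curr_gray, prev_pts, k=2.0, lower=1.0, upper=30.0):
    """prev_pts: Nx1x2 float32 feature points in the previous image."""
    curr_pts, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray, prev_pts, None)
    ok = status.ravel() == 1
    p0, p1 = prev_pts[ok], curr_pts[ok]

    # Pixel displacement of every tracked pair
    disp = np.linalg.norm((p1 - p0).reshape(-1, 2), axis=1)

    # Threshold from the statistics of the tracking result, clamped to bound values
    thr = np.clip(disp.mean() + k * disp.std(), lower, upper)
    keep = disp < thr
    p0, p1 = p0[keep], p1[keep]

    # Fundamental matrix on the filtered matches as a final geometric check
    F, mask = cv2.findFundamentalMat(p0, p1, cv2.FM_RANSAC, 1.0, 0.99)
    if mask is None:                       # too few matches for RANSAC
        return p0, p1, None
    inliers = mask.ravel().astype(bool)
    return p0[inliers], p1[inliers], F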
Under low Reynolds number conditions, an airfoil is prone to a laminar separation bubble (LSB), which causes problems such as lift drop and drag increase. In this work, fluorescent oil film was applied to study the leading-edge LSB from the perspective of skin friction. The experimental study is carried out on a flat plate with a leading-edge LSB at a Reynolds number of 9.5×10⁵ (based on the plate chord as the characteristic length). The camera captures the leading-edge LSB formed by airflow separation and reattachment, in which the flow inside the bubble is opposite to the free stream. The positions of separation and reattachment are determined from the distribution of skin friction, which is consistent with the visual image of the oil film. The Poincaré–Bendixson (P-B) index formula is applied to test the topology of the separation-bubble skin friction: the quantitative relations between the critical points and the boundary switching points in the closed regions bounded by several permeable boundaries all confirm this topological law. The results show that the fluorescent oil film can quantitatively map the flow structure of the LSB, further verifying the advantage of this method for the visual measurement of complex flow structures.
This paper proposes a hybrid approach to image registration for inferring the affine transformation that best matches a pair of partially overlapping aerial images. Image registration is formulated as a two-stage hybrid approach combining a phase correlation method (PCME) and an optical flow equation (OFE) based estimation algorithm in a coarse-to-fine manner. With PCME applied at the highest level of decomposition, an initial affine parameter model is first estimated. Subsequently, the OFE-based estimation algorithm is incorporated into the hybrid approach using a multi-resolution mechanism. PCME is characterized by its insensitivity to large geometric transforms between images, which effectively guides the OFE-based registration. For image pairs with salient brightness variations, we propose a nonlinear image representation that emphasizes common intensity information, suppresses the non-common information between an image pair, and is suitable for the proposed coarse-to-fine hierarchical iterative processing. Experimental results demonstrate the accuracy and efficiency of the proposed approach on different types of aerial images.
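For the coarse stage, phase correlation recovers a shift from the cross-power spectrum of the two images. The sketch below uses OpenCV's phaseCorrelate at the top of a Gaussian pyramid and recovers a translation only; the PCME described above estimates an initial affine parameter model, so this is an illustrative simplification rather than the paper's method.

import cv2
import numpy as np

def coarse_shift(img1, img2, levels=3):
    """Coarse translation between two overlapping aerial images by phase
    correlation at the coarsest level of a Gaussian pyramid."""
    a, b = img1.astype(np.float32), img2.astype(np.float32)
    for _ in range(levels):                      # descend to the coarsest level
        a, b = cv2.pyrDown(a), cv2.pyrDown(b)
    (dx, dy), response = cv2.phaseCorrelate(a, b)
    scale = 2 ** levels                          # map the shift back to full resolution
    return dx * scale, dy * scale, response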
In this paper, we present an algorithm for the hierarchical recognition of an environment using independent components of optical flow fields for the visual navigation of a mobile robot. For the computation of optical flow, the pyramid transform of an image sequence is used for the analysis of global and local motion. Our algorithm detects the planar region and obstacles in the image from optical flow fields at each layer in the pyramid. Therefore, our algorithm allows us to achieve both global perception and local perception for robot vision. We show experimental results for both test image sequences and real image sequences captured by a mobile robot. Furthermore, we show some aspects of this work from the viewpoint of information theory.
Every year, agriculture experiences significant economic losses due to wild geese, rooks, and other flocks of birds. A wide range of devices to detect and deter the animals causing conflict is used to prevent this, although their effectiveness is often highly variable owing to habituation to disruptive or disturbing stimuli. Automated recognition of behaviors could form a critical component of a system capable of altering the disruptive stimulus to avoid habituation. This paper presents an audio-visual approach to recognizing goose flocking behavior. The vocal communication and movement of the flock are used for audio-visual recognition, which is accomplished through classifier fusion of an acoustic and a video-based classifier. Acoustic behavior recognition is based on generalized perceptual features and support vector machines, and visual behavior recognition is based on optical flow estimation and a Bayesian rule-based scheme. Classifier fusion is implemented using the product rule on the soft outputs of both classifiers. The algorithm has been used to recognize goose flocking behaviors (landing, foraging, and flushing) and improves performance compared with using the audio- or video-based classifier alone. The improvement from classifier fusion is most evident for flushing and landing behavior recognition, where it was possible to combine the advantages of both the audio- and video-based classifiers.
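The product rule multiplies the per-class soft outputs (posterior estimates) of the two classifiers and renormalizes. A minimal sketch, assuming both classifiers return normalized class probabilities for the three behaviors:

import numpy as np

def product_rule_fusion(p_audio, p_video, eps=1e-12):
    """Combine per-class posteriors from the acoustic and the video classifier
    with the product rule and renormalize."""
    fused = (p_audio + eps) * (p_video + eps)
    return fused / fused.sum()

# Example with three behaviors: landing, foraging, flushing
p_audio = np.array([0.6, 0.3, 0.1])
p_video = np.array([0.5, 0.1, 0.4])
print(product_rule_fusion(p_audio, p_video))   # landing dominates after fusion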
Investigating realistic visual exploration is quite challenging in sport climbing, but it promises a deeper understanding of how performers adjust their perception-action couplings during task completion. However, the samples of participants and the number of trials analyzed in such experiments are often reduced to a minimum because of the time-consuming treatment of eye-tracking data. Notably, mapping successive points of gaze from local views to the global scene is generally performed manually by watching the eye-tracking video data frame by frame. This manual procedure is not suitable for processing large numbers of datasets. Consequently, this study developed an automatic method for global point-of-gaze localization in indoor sport climbing. An eye-tracking device was used to acquire local image frames and points of gaze from a climber's local views. Artificial landmarks, designed as four-color-disk groups, were distributed on the wall to facilitate localization. Global points of gaze were computed from planar homography transforms between the local and global positions of the detected landmarks. Thirty climbing trials were recorded and processed by the proposed method. The success rates (mean ± SD) were up to 85.72% ± 13.90%, and the errors (mean ± SD) were up to 0.1302 ± 0.2051 m. The proposed method will be employed to compute global points of gaze in our current climbing dataset for understanding the dynamic intertwining of gaze and motor behaviors during the climbs.
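Once the local and global positions of the detected landmarks are matched, the mapping reduces to estimating a planar homography and transforming the gaze point with it. A minimal OpenCV sketch under that assumption (landmark detection and matching are not shown and the thresholds are illustrative):

import cv2
import numpy as np

def map_gaze_to_wall(local_landmarks, global_landmarks, gaze_xy):
    """Map a point of gaze from the eye-tracker's local frame to the global
    wall image via the planar homography defined by detected landmarks.

    local_landmarks / global_landmarks: Nx2 arrays of matched landmark centers.
    gaze_xy: (x, y) point of gaze in the local frame.
    """
    H, _ = cv2.findHomography(np.float32(local_landmarks),
                              np.float32(global_landmarks), cv2.RANSAC, 3.0)
    pt = np.float32([[gaze_xy]])                    # shape (1, 1, 2)
    return cv2.perspectiveTransform(pt, H)[0, 0]    # (x, y) on the wall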
Optical flow computation in dynamic image processing can be formulated as a minimization problem by a variational approach. Because solving the problem is computationally intensive, we reformulate it in a form suitable for neural computing. In this paper, we propose a recurrent neural network model which may be implemented in hardware with many processing elements (neurons) operating asynchronously in parallel to achieve a possible real-time solution. We derive and prove the properties of the reformulation, and analyze the asymptotic stability and convergence rate of the proposed neural network. Experiments using both test patterns and real laboratory images are conducted.
Analysis of flows in crowd videos is a notable topic with practical applications in many different areas. In this paper, we present a broad overview of the topic along with our own approach to the problem. Our approach treats the difficulty of crowd flow analysis by distinguishing single versus multiple flows in a scene. Spatiotemporal features of two consecutive frames are extracted by optical flow to create a three-dimensional tensor, which retains appearance and velocity information. The tensor's upper-left minor matrix captures intensity structure. A normalized continuous rank-increase measure for each frame is calculated using a generalized interlacing property of the eigenvalues of these matrices. In essence, the measure values carry knowledge of the existing flows, but they cannot be used directly because of optical flow estimation errors and other factors. A properly chosen set of polynomial fitting functions of given degrees decodes their existence; how to estimate that set is studied in detail. Zero flow, single flow, multiple flows, and interesting events are detected on a per-frame basis using thresholds on the polynomial fitting measure values. Mean recall (88.9%), precision (86.7%), area under the receiver operating characteristic curve (98.9%), and accuracy (92.9%) from experiments on the PETS2009 and UMN benchmark datasets show that our method achieves high-quality results for detecting flows and events in crowd videos in terms of both robustness and potency.
Micro-expressions are brief facial movements that cannot be controlled voluntarily; their appearance indicates that a person is consciously hiding his or her true emotion. Micro-expression recognition has various potential applications in public security and clinical medicine. Research has focused on automatic micro-expression recognition because it is hard for people to recognize micro-expressions themselves. This research proposes a novel algorithm for automatic micro-expression recognition which combines a deep multi-task convolutional network for detecting facial landmarks with a fused deep convolutional network for estimating the optical flow features of the micro-expression. First, the deep multi-task convolutional network, together with manifold-related tasks, is employed to detect facial landmarks in order to divide the facial region. A fused convolutional network is then applied to extract optical flow features from the facial regions that contain muscle changes when the micro-expression appears. Because each video clip has many frames, the original optical flow features of a whole clip have a high number of dimensions and redundant information. This research revises the optical flow features to reduce the redundant dimensions. Finally, the revised optical flow features are applied to refine the information of the features, and a support vector machine classifier is adopted to recognize the micro-expression. The main contributions of this work are combining the deep multi-task learning neural network with the fused optical flow network for micro-expression recognition and revising the optical flow features to reduce the redundant dimensions. Experimental results on two spontaneous micro-expression databases show that our method achieves competitive performance in micro-expression recognition.
A visual illusion is a fallacious perception of reality or of some actually existing object. In this paper, we imitate the mechanisms of the Ehrenstein illusion, neon color spreading illusion, watercolor illusion, Kanizsa illusion, shifted edges illusion, and hybrid image illusion using the Open Source Computer Vision Library (OpenCV). We also imitate these illusions using Cellular Neural Networks (CNNs). These imitations suggest that some illusions are processed by high-level brain functions. We next apply the morphological gradient operation to anomalous motion illusions. The processed images are classified into two kinds, corresponding to the central drift illusion and the peripheral drift illusion, respectively. This demonstrates that color contrast plays an important role in the anomalous motion illusion. We also imitate the anomalous motion illusions using both OpenCV and CNNs. These imitations suggest that some visual illusions may be processed by the illusory movement of animations.
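The morphological gradient applied to the anomalous motion illusions is the difference between a dilation and an erosion of the image. A minimal OpenCV sketch (the input file name and the kernel size are illustrative assumptions):

import cv2
import numpy as np

# Morphological gradient: dilation minus erosion, which highlights the local
# contrast edges that are relevant to the anomalous motion illusions.
img = cv2.imread("illusion.png", cv2.IMREAD_GRAYSCALE)   # hypothetical input image
kernel = np.ones((3, 3), np.uint8)
gradient = cv2.morphologyEx(img, cv2.MORPH_GRADIENT, kernel)
cv2.imwrite("illusion_gradient.png", gradient)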
This paper presents a new method to analyze and synthesize facial expressions, in which a spatio-temporal gradient based method (i.e., optical flow) is exploited to estimate the movement of facial feature points. We propose a method (called motion correlation) to improve the conventional block correlation method for obtaining motion vectors. The tracking of facial expressions under an active camera is also addressed. With the estimated motion vectors, a facial expression can be cloned by adjusting an existing 3-D facial model, or synthesized using different facial models. The experimental results demonstrate that the proposed approach is feasible for applications such as low-bit-rate video coding and face animation.
Vehicle and pedestrian tracking is a key component of vision-based safety analysis, which can use motion and appearance cues of road users. Since appearance-based detectors generate false alarms and run more slowly, motion-based detectors are preferred, but they perform poorly when pedestrians or vehicles stop at traffic signals. In this paper, a tracking system is proposed to track waiting and moving road users by fusing motion and appearance cues at the detection level. The enhanced optical flow tracker handles the partial occlusion problem and cooperates with the detection module to provide long-term tracks of vehicles and pedestrians. The system evaluation shows 13% and 43% improvements in the tracking of vehicles and pedestrians, respectively, and heat maps illustrate the benefits of the proposed system through a visual depiction of intersection usage.
Deep learning has been successfully applied to video anomaly detection. However, when a deep network learns spatio-temporal features autonomously, it tends to ignore the specificity of different pattern features. This paper therefore focuses on how to efficiently learn deep appearance features, introduces the idea of learning appearance information by predicting the future frame, and proposes a dual-stream conditional generative adversarial network fusion method for video abnormal behavior detection. The video frame and its corresponding optical flow image are fed to a conditional generative adversarial network to learn the motion feature representation. In addition, the video frame and its corresponding future frame are input to the network to generate an appearance representation complementary to the motion features. The model is trained only on normal events and is therefore unable to generate abnormal events accurately. During testing, for the foreground moving targets, the images generated by the model are compared with the ground truth to obtain a two-stream anomaly probability distribution model based on the mean squared error, which is used for region-level anomaly detection. Experiments on public datasets show that the proposed method can effectively detect and locate abnormal behaviors in video.
A novel optical flow estimation method is proposed in this paper, which addresses several issues including credible estimation of optical flow and prevention of over-smoothing across motion boundaries. Our main contribution is that we estimate the optical flow by a nonlinear filtering process instead of an energy minimization process; the latter often requires the corresponding smoothing constraint to be restricted to some convex and differentiable form. In this way, our approach avoids some of the restrictions imposed by regularization, so the nonlinear filter can be chosen from more flexible forms, which helps to handle flow discontinuities more effectively. We modify and extend a scalar 2D bilateral filter to the optical flow field as the desired nonlinear filter. Qualitative and quantitative results show that the new method produces reliable results.
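A simplified sketch of the filtering idea: applying OpenCV's scalar bilateral filter to each component of a dense flow field, so the range weights depend on flow differences and motion discontinuities are preserved. This per-component variant is only an approximation of the extended vector-valued filter described above, and the parameters are illustrative assumptions.

import cv2
import numpy as np

def bilateral_filter_flow(flow, d=9, sigma_color=0.5, sigma_space=7.0):
    """Edge-preserving smoothing of a dense flow field (H x W x 2, float32):
    each component is bilateral-filtered, so large flow differences across
    motion boundaries receive low range weights and are not smoothed over."""
    u = cv2.bilateralFilter(flow[..., 0].copy(), d, sigma_color, sigma_space)
    v = cv2.bilateralFilter(flow[..., 1].copy(), d, sigma_color, sigma_space)
    return np.dstack([u, v])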
A novel approach to predicting the future image frames of an image sequence is presented. First, a method to predict the future position of a moving object in an image sequence is discussed using an artificial neural network (ANN). Second, the optical flow concept is used to generate a complete image frame by calculating the velocity of each pixel along both axes. A separate ANN (both sigmoidal and radial basis function neural networks) is modeled for each pixel's velocity, the predicted velocity of each pixel is then mapped to its future values, and image frames are generated. The quality of the predicted images is measured by a Canny edge detection-based image comparison metric (CIM) and the mean structural similarity index measure (MSSIM). The proposed approaches are found to successfully generate up to 10 future images.
Identification of motion in videos is a fundamental task in several computer vision problems. One of the main tools for motion identification is optical flow, which estimates the projection of the 3D velocity of objects onto the image plane of the camera. In this work, we propose a differential optical flow method based on the wave equation. The optical flow is computed by minimizing an energy functional composed of two terms: a data term based on brightness constancy and a regularization term based on the energy of the wave. The flow is determined by solving a system of linear equations. The decoupling of the pixels in the solution allows the system to be solved by a direct or iterative approach and makes the method suitable for parallelization. We present convergence conditions for our method, since it does not converge at all image points. For comparison purposes, we create a global video descriptor based on histograms of optical flow for the problem of action recognition. Despite its sparsity, results show that our method improves the average motion estimation compared with classical methods. We also evaluate optical flow error measures on image sequences from a classical dataset for method comparison.
Human activity recognition from video sequences has recently emerged as a pivotal research area due to its importance in a large number of applications such as real-time surveillance monitoring, healthcare, smart homes, security, behavior analysis, and many more. However, many challenges also exist, such as intra-class variations, object occlusion, varying illumination conditions, complex backgrounds, and camera motion. In this work, we introduce a novel feature descriptor based on the integration of the magnitude and orientation information of optical flow with histograms of oriented gradients, which gives an efficient and robust feature vector for the recognition of human activities in real-world environments. In the proposed approach, we first compute the magnitude and orientation of the optical flow separately; local oriented histograms of the magnitude and orientation of the motion flow vectors are then computed using histograms of oriented gradients, followed by a linear-combination feature fusion strategy. The resulting features are processed by a multiclass Support Vector Machine (SVM) classifier for activity recognition. Experiments are performed on different publicly available benchmark video datasets, namely the UT-Interaction, CASIA, and HMDB51 datasets. The effectiveness of the proposed approach is evaluated in terms of six performance measures: accuracy, precision, recall, specificity, F-measure, and Matthews correlation coefficient (MCC). To show the significance of the proposed method, it is compared with other state-of-the-art methods, and the experimental results show that it performs well in comparison to them.
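A minimal sketch of this kind of descriptor construction, assuming Farneback optical flow, magnitude-weighted orientation histograms, skimage's HOG for the appearance part, and a weighted concatenation as the linear-combination fusion; the bin counts, weight, and HOG parameters are illustrative, not the paper's settings.

import cv2
import numpy as np
from skimage.feature import hog

def frame_descriptor(prev_gray, curr_gray, n_bins=9, w_flow=0.5):
    """Per-frame descriptor combining optical-flow magnitude/orientation
    histograms with HOG, fused by a weighted linear combination."""
    flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])

    # Orientation histogram weighted by magnitude, plus a magnitude histogram
    h_ang, _ = np.histogram(ang, bins=n_bins, range=(0, 2 * np.pi), weights=mag)
    h_mag, _ = np.histogram(mag, bins=n_bins, range=(0, mag.max() + 1e-6))
    flow_feat = np.hstack([h_ang / (h_ang.sum() + 1e-6),
                           h_mag / (h_mag.sum() + 1e-6)])

    # Appearance feature from the current frame
    hog_feat = hog(curr_gray, orientations=9, pixels_per_cell=(16, 16),
                   cells_per_block=(2, 2), feature_vector=True)

    # Linear-combination fusion: weight the two feature groups, then concatenate
    return np.hstack([w_flow * flow_feat, (1 - w_flow) * hog_feat])

# Video-level descriptors would be pooled over frames and fed to a multiclass
# SVM, e.g. sklearn.svm.SVC(), for activity classification.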
In this paper, an efficient multi-scale image disparity estimation algorithm is proposed which estimates the local displacements needed to align different regions of a pair of images. This phase-based approach is built on the fractional dual-tree complex wavelet transform (FrDTCWT). In the proposed FrDTCWT, we first take the fractional Fourier transform of the image and then decompose the fractionally transformed image with the dual-tree complex wavelet transform into real and imaginary parts. A complex analytic signal is obtained from the resulting real-valued function. The FrDTCWT inherits the excellent mathematical properties of the DTCWT and the FrFT. The phase and amplitude of the image are computed from the complex analytic signal, and the disparity is estimated as optical flow using the phase difference method. The efficiency of the proposed algorithm is evaluated through experiments on synthetic as well as realistic image sequences.
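In the standard phase-difference method (stated here in its generic form, not as this paper's exact formulation), the disparity at a pixel is obtained from the difference of the local phases of the two images divided by the local frequency, which is often approximated by the center frequency of the band-pass (here wavelet) filter:

d(x) \approx \frac{\phi_L(x) - \phi_R(x)}{\omega(x)}, \qquad \omega(x) \approx \omega_0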
This paper proposes a novel stereo matching algorithm to solve environment sensing problems. It integrates a non-convex optical flow and a Viterbi process. The non-convex optical flow employs a new adaptive weighted non-convex Total Generalized Variation (TGV) model, which can obtain sharp disparity maps. Structural similarity, a total variation constraint, and a specific merging strategy are combined with the four bi-directional Viterbi processes to improve robustness. In the fusion of the optical flow and the Viterbi process, a new occlusion processing method is incorporated to obtain sharper disparities and more robust results. Extensive experiments are conducted to compare this algorithm with other state-of-the-art methods, and the results show the superiority of our algorithm.