This paper presents a framework that creates background, foreground, and temporal summarizations of the motion in a scene. The method is based on the Dominant Motion Assumption (DMA), under which the background follows a parametric motion and occupies the major part of the scene. Under this assumption, we present a robust optical-flow-based method that extracts the moving parts of the scene using the clustering capabilities of mixtures of Gaussians. A general mosaicing method that summarizes the background, the foreground, and the trajectories of objects in the scene is also presented.
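As a rough illustration of the clustering step, the sketch below separates moving objects from the dominant background motion by fitting a two-component Gaussian mixture to dense optical-flow vectors and labeling the larger cluster as background. The Farneback flow, the two-component mixture, and the pixel-count heuristic are assumptions chosen for illustration; the paper's parametric background model and mosaicing stages are not reproduced here.

```python
import cv2
import numpy as np
from sklearn.mixture import GaussianMixture

def foreground_mask(prev_gray, curr_gray, n_components=2):
    """Cluster dense optical-flow vectors with a Gaussian mixture and
    treat the largest cluster as the dominant (background) motion."""
    flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    h, w = prev_gray.shape
    vectors = flow.reshape(-1, 2)                      # one (dx, dy) per pixel
    labels = GaussianMixture(n_components, random_state=0).fit_predict(vectors)
    # Dominant Motion Assumption: the cluster covering most pixels is background.
    background_label = np.bincount(labels).argmax()
    return (labels != background_label).reshape(h, w)  # True where motion deviates
```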
With the rapid growth of online video, it is crucial to generate overviews of videos that help audiences make viewing decisions and save time. Video summarization and video captioning are two of the most common solutions. In this paper, we propose a new solution in the form of a series of scene-person pairs generated by our proposed video description scheme. This new format takes substantially less time to review than watching video summaries and is more acceptable than video captions. In addition, our method can be generalized to different types of videos. We also propose a face clustering method and a scene detection method. The experimental results indicate that our methods outperform other state-of-the-art methods and are highly generalizable. As an example, a demo application is developed to demonstrate the proposed description scheme.
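A minimal sketch of how such scene-person pairs might be assembled once scene boundaries and face-cluster labels are available is shown below. The inputs scene_bounds, face_tracks, and the per-frame person identifiers are hypothetical stand-ins for the outputs of the paper's scene detection and face clustering methods.

```python
from collections import defaultdict

def scene_person_pairs(scene_bounds, face_tracks):
    """Pair each detected scene with the person clusters appearing in it.

    scene_bounds : list of (start_frame, end_frame) tuples from scene detection
    face_tracks  : list of (frame_index, person_id) tuples from face clustering
    Returns an ordered list of (scene_index, person_id) pairs.
    """
    persons_in_scene = defaultdict(set)
    for frame, person in face_tracks:
        for idx, (start, end) in enumerate(scene_bounds):
            if start <= frame <= end:
                persons_in_scene[idx].add(person)
                break
    return [(idx, person)
            for idx in sorted(persons_in_scene)
            for person in sorted(persons_in_scene[idx])]
```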
Continuously operating surveillance cameras in public places, such as airports and banks, produce a huge amount of video data. Faces in such videos can be extracted in real time. However, most of these detected faces are either redundant or useless. Redundant information adds computational cost to facial analysis systems, and useless data makes the final results of such systems noisy, unstable, and erroneous. Thus, there is a need for a mechanism that summarizes the original video sequence into a set of the most expressive images of the sequence. The system proposed in this paper uses a face quality assessment technique for this purpose. The summarized results of this technique have been used in three different facial analysis systems, and the experimental results on real video sequences are promising.
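The sketch below illustrates the general idea of face quality assessment for summarization: each detected face crop is scored with simple proxies (sharpness, brightness balance, crop size) and only the top-scoring faces are kept. The particular scoring terms and their combination are assumptions chosen for illustration, not the quality measures used in the paper.

```python
import cv2
import numpy as np

def face_quality(face_bgr):
    """Toy quality score combining sharpness (Laplacian variance),
    brightness balance, and crop resolution (all illustrative proxies)."""
    gray = cv2.cvtColor(face_bgr, cv2.COLOR_BGR2GRAY)
    sharpness = cv2.Laplacian(gray, cv2.CV_64F).var()
    brightness = 1.0 - abs(gray.mean() - 128.0) / 128.0   # best near mid-gray
    resolution = min(gray.shape) / 100.0                   # favour larger crops
    return sharpness * brightness * resolution

def summarize_faces(face_crops, k=10):
    """Keep the k highest-quality face crops from one sequence."""
    return sorted(face_crops, key=face_quality, reverse=True)[:k]
```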
A storyboard consisting of key-frames is a popular format for video summarization, as it supports efficient indexing, browsing, and partial or complete retrieval of video. In this paper, we present a size-constrained storyboard generation scheme. Given the shots, i.e., the output of the video segmentation process, the method has two major steps: extraction of appropriate key-frame(s) from each shot and, finally, selection of a specified number of key-frames from the set thus obtained. The set of selected key-frames should retain the variation in visual content originally possessed by the video. The number of key-frames, or representative frames, in a shot may vary depending on the variation in its visual content; thus, automatic selection of a suitable number of representative frames from a shot remains a challenge. In this work, we propose a novel scheme for detecting sub-shots with consistent visual content within a shot using the Wald–Wolfowitz runs test. From each sub-shot, the frame rendering the highest fidelity is extracted as a key-frame. Finally, a novel spanning-tree-based method is proposed to select a subset of key-frames of a specified cardinality. Chronological arrangement of these frames generates the size-constrained storyboard. Experimental results and a comparative study show that the scheme works satisfactorily for a wide variety of shots. Moreover, the proposed technique rectifies mis-detection errors, if any, incurred in the video segmentation process. Similarly, though not implemented, the proposed hypothesis test can also rectify false alarms in shot detection if applied to pairs of adjacent shots.
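For reference, the Wald–Wolfowitz runs test compares the observed number of runs in a binary sequence against its expectation under randomness. The sketch below applies it to frame-to-frame difference values binarized around their median; the binarization rule and the 1.96 decision threshold are assumptions for illustration, and the paper's actual features and decision rule may differ.

```python
import numpy as np

def runs_test_z(binary_seq):
    """Wald–Wolfowitz runs test statistic for a 0/1 sequence.
    Large |z| means the sequence is unlikely to be random, i.e.
    the frame differences are not homogeneous within the shot."""
    x = np.asarray(binary_seq)
    n1, n2 = int((x == 1).sum()), int((x == 0).sum())
    n = n1 + n2
    if n1 == 0 or n2 == 0 or n < 2:
        return 0.0                               # degenerate: treat as consistent
    runs = 1 + int((x[1:] != x[:-1]).sum())      # number of runs = transitions + 1
    mean = 2.0 * n1 * n2 / n + 1.0
    var = 2.0 * n1 * n2 * (2.0 * n1 * n2 - n) / (n ** 2 * (n - 1))
    return (runs - mean) / np.sqrt(var)

def shot_is_consistent(frame_diffs, z_threshold=1.96):
    """Binarize frame-to-frame differences around their median and accept
    the shot as a single consistent sub-shot if randomness is not rejected."""
    diffs = np.asarray(frame_diffs, dtype=float)
    binary = (diffs > np.median(diffs)).astype(int)
    return abs(runs_test_z(binary)) < z_threshold
```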
With the recent growth of research on ambient intelligence, automated capture and retrieval of experiences at home is now a topic within reach of investigation. We present a system for retrieval and summarization of continuously captured multimedia data from Ubiquitous Home, a two-room house equipped with a large number of cameras and microphones. Data from pressure-based floor sensors are analyzed to segment the footsteps of different persons. Video and audio handover are implemented to retrieve continuous video streams that follow moving persons. An adaptive algorithm based on the rate of footsteps summarizes these video streams. Audio segmentation followed by sound source localization is used for video retrieval based on sounds. An experiment was conducted during which a family lived in the house for twelve days. The system was evaluated by the residents, who used it to retrieve their own experiences; we report and discuss the results.
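A minimal sketch of an activity-adaptive summarization rule in the spirit described above: the key-frame budget for each time window is allocated in proportion to the number of footsteps detected in that window. The window length and the proportional allocation rule are assumptions for illustration, not the algorithm used in the system.

```python
import numpy as np

def allocate_summary_frames(footstep_times, video_duration,
                            window=10.0, total_frames=60):
    """Distribute a key-frame budget over fixed windows in proportion to
    footstep activity, so busier periods receive a denser summary.

    footstep_times : timestamps (seconds) of detected footsteps
    video_duration : total length of the captured stream (seconds)
    Returns (window_start, n_frames_to_sample) for each active window.
    """
    edges = np.arange(0.0, video_duration + window, window)
    counts, _ = np.histogram(footstep_times, bins=edges)
    if counts.sum() == 0:
        weights = np.full(len(counts), 1.0 / len(counts))   # no activity: uniform
    else:
        weights = counts / counts.sum()
    budget = np.round(weights * total_frames).astype(int)
    return [(float(edges[i]), int(budget[i]))
            for i in range(len(counts)) if budget[i] > 0]
```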