Traditional fingerprint verification uses a single image for matching. However, the resulting accuracy cannot meet the needs of some application domains. In this paper, we propose to use videos for fingerprint verification. To make full use of the information contained in fingerprint videos, we present a novel method that exploits the dynamic as well as the static information in fingerprint videos. After preprocessing and alignment, the Inclusion Ratio of two matching fingerprint videos is calculated and used to represent the similarity between the two videos. Experimental results show that the video-based method can achieve better accuracy than the method based on a single fingerprint image.
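The abstract does not spell out how the Inclusion Ratio is computed, so the following is only a minimal illustrative sketch under the assumption that it measures the fraction of minutiae accumulated over one aligned video that find a counterpart (within a spatial tolerance) in the minutiae accumulated from the other video; the function names and the tolerance value are hypothetical.

import numpy as np

def inclusion_ratio(minutiae_a, minutiae_b, tol=8.0):
    # minutiae_a, minutiae_b: (N, 2) arrays of aligned (x, y) minutiae positions
    # accumulated over the frames of each fingerprint video.
    minutiae_a = np.asarray(minutiae_a, dtype=float)
    minutiae_b = np.asarray(minutiae_b, dtype=float)
    if len(minutiae_a) == 0 or len(minutiae_b) == 0:
        return 0.0
    # Distance from every minutia of video A to every minutia of video B.
    dists = np.linalg.norm(minutiae_a[:, None, :] - minutiae_b[None, :, :], axis=2)
    matched = (dists.min(axis=1) <= tol).sum()
    return matched / len(minutiae_a)

# A symmetric video-to-video similarity could then be taken as, e.g.,
# min(inclusion_ratio(a, b), inclusion_ratio(b, a)).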
An entire industry has been developed around keyword optimization for buyers of advertising space. However, the social media landscape has shifted to photo-driven behaviors, and there is a need to overcome the challenge of analyzing the large amount of visual data that users post on the internet. We will address this analysis by providing a review of how to measure image and video interestingness and memorability from content that is tracked in real time on social networks. We will investigate state-of-the-art methods that are used to analyze social media images and present experiments that were performed to obtain comparable results based on the studied proposals and to determine which characteristics and classifiers are best. Finally, we will discuss future research directions that could be beneficial to both users and companies.
This paper presents a video transmission system for scalable High-Efficiency Video Coding (HEVC) videos using a 4G standard's physical layer. SHVC, the scalable extension of HEVC, is used to compress the different layers of videos into binary files. The resulting binary files are easily transportable over any network, thus solving many issues mainly related to high-resolution videos. Three scenarios are first studied and simulated, namely Single Input Single Output (SISO), Multiple Input Single Output (MISO) and Multiple Input Multiple Output (MIMO). Since the MIMO scenario offers the best results, it is considered in the implementation of the system on a Field Programmable Gate Array (FPGA) using Xilinx System Generator (XSG). A Simulink model is developed under Matlab to simulate the video transmission scenarios using the WiMAX physical layer. The MIMO system is then implemented using a ZedBoard to co-simulate the video transmission in real time, which allows successful reception of the video sequences.
Although High-Efficiency Video Coding (HEVC) offers high coding efficiency, its encoding performance must be improved to keep up with the expanding number of multimedia applications. Therefore, this paper proposes a novel Rectified Linear Unit-Bidirectional Long Short-Term Memory-based Tree Social Relations Optimization (ReLU-BiLSTM-based TSRO) method to enhance the quality of video transmission. The main objective of the proposed method is to enhance the entropy encoding process in HEVC. Here, the context-adaptive binary arithmetic coding (CABAC) framework, the prevalent and improved form of entropy coding in the HEVC standard, is utilized. In addition, the performance of the proposed method is determined by evaluating various measures such as mean square error (MSE), cumulative distribution factor, compression ratio, peak signal-to-noise ratio (PSNR) and bit error rate. Finally, the proposed method is examined with five different video sequences: football, tennis, garden, mobile and coastguard. The performance of the proposed method is compared with various approaches, and the result analysis shows that the proposed method attains the minimum MSE with the maximum PSNR.
This work studies the collective intelligence behavior of Web users who share and watch video content. It is proposed that the aggregated users' video activity exhibits characteristic patterns, which may be used to infer important video scenes, thus leading to collective intelligence concerning the video content. To this end, experimentation is based on users' interactions (e.g., pause, seek/scrub) that were gathered in a controlled user experiment with information-rich videos. Collective information-seeking behavior is then modeled by means of the corresponding probability distribution function. The bell-shaped reference patterns are shown to correlate significantly with predefined scenes of interest for each video, as annotated by the users. In this way, the observed collective intelligence may be used to provide a video-segment detection tool that identifies the importance of video scenes. Accordingly, both a stochastic and a pattern-matching approach are applied to the users' interaction information. The results indicate increased accuracy in identifying the areas selected by users as having high-importance information. In practice, the proposed techniques might improve both navigation within videos on the web and video search results with personalised video thumbnails.
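As a rough illustration of the idea (not the authors' implementation), one can aggregate interaction timestamps into a histogram over video time, smooth it, and flag the strongest local maxima as candidate scenes of interest; the bin width, kernel width, and number of returned segments below are arbitrary choices.

import numpy as np

def interest_peaks(event_times, video_len, bin_s=5.0, sigma_bins=2.0, k=3):
    # Histogram of user interaction timestamps over the video timeline.
    bins = int(np.ceil(video_len / bin_s))
    hist, _ = np.histogram(event_times, bins=bins, range=(0, video_len))
    # Gaussian smoothing to approximate a bell-shaped reference pattern.
    x = np.arange(-3 * sigma_bins, 3 * sigma_bins + 1)
    kernel = np.exp(-0.5 * (x / sigma_bins) ** 2)
    kernel /= kernel.sum()
    smooth = np.convolve(hist, kernel, mode="same")
    # Keep the k strongest local maxima as candidate important segments.
    peaks = [i for i in range(1, bins - 1)
             if smooth[i] > smooth[i - 1] and smooth[i] >= smooth[i + 1]]
    peaks.sort(key=lambda i: smooth[i], reverse=True)
    return [(i * bin_s, (i + 1) * bin_s) for i in sorted(peaks[:k])]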
In response to the fast propagation of videos on the Internet, Content-Based Video Retrieval (CBVR) was introduced to help users find their desired items. Since most videos concern humans, human action retrieval was introduced as a new topic in CBVR. Most human action retrieval methods represent an action by extracting and describing its local features, which are more reliable than global ones; however, these methods are complex and not very accurate. In this paper, a low-complexity representation method that more accurately describes extracted local features is proposed. In this method, each video is represented independently of other videos. To this end, the motion information of each extracted feature is described by the directions and sizes of its movements. The correspondence between the directions and sizes of the movements is then used to compare videos. Finally, the videos that correspond best with the query video are delivered to the user. Experimental results illustrate that this method can outperform state-of-the-art methods.
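A hedged sketch of the general scheme, not the paper's exact design: each tracked local feature is described by the quantized directions and magnitudes of its frame-to-frame movements, the descriptions are pooled into a per-video histogram, and candidate videos are ranked by histogram similarity to the query; the bin counts and the intersection measure are illustrative assumptions.

import numpy as np

def video_descriptor(trajectories, n_dir=8, mag_edges=(1.0, 3.0, 6.0)):
    # trajectories: list of (T, 2) arrays of tracked feature positions.
    hist = np.zeros((n_dir, len(mag_edges) + 1))
    for traj in trajectories:
        d = np.diff(np.asarray(traj, dtype=float), axis=0)
        ang = np.arctan2(d[:, 1], d[:, 0])          # movement direction
        mag = np.linalg.norm(d, axis=1)             # movement size
        dir_bin = ((ang + np.pi) / (2 * np.pi) * n_dir).astype(int) % n_dir
        mag_bin = np.digitize(mag, mag_edges)
        for db, mb in zip(dir_bin, mag_bin):
            hist[db, mb] += 1
    s = hist.sum()
    return hist / s if s > 0 else hist

def similarity(desc_query, desc_video):
    # Histogram intersection between query and candidate descriptors.
    return np.minimum(desc_query, desc_video).sum()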
In the last few years, peer-to-peer (P2P) systems have gained ground for multipoint video content distribution over IP networks. P2P technologies offer new opportunities to define efficient multimedia streaming applications, but at the same time they involve a set of technical challenges and issues due to the best-effort service offered by the underlying Internet and its dynamic and heterogeneous nature. Stringent end-to-end delay requirements for real-time applications motivate the choice of a tree-structured topology over other topologies introduced in recent research works, which are mainly aimed at non-real-time services such as video on demand and live streaming. The target of this paper is to present a platform for multipoint multimedia transmission based on a tree overlay network with jitter control and to show, through experiments in a real environment, that our platform performs better than a traditional tree overlay network system in terms of PSNR, frame loss and playout freeze time.
In this paper, we present a novel scheme for video content representation that explores spatio-temporal information. A pseudo-object-based shot representation containing more semantics is proposed to measure shot similarity, and a force competition approach is proposed to group shots into scenes based on the content coherence between shots. Two color-object content descriptors, Dominant Color Histograms (DCH) and Spatial Structure Histograms (SSH), are introduced. To represent temporal content variations, a shot is segmented into several subshots of coherent content, and the shot similarity measure is formulated as a subshot similarity measure that serves shot retrieval. With this shot representation, the scene structure can be extracted by analyzing the splitting and merging force competitions at each shot boundary. Experimental results on real-world sports video show that the proposed approach for video shot retrieval achieves the best performance in average recall (AR) and average normalized modified retrieval rank (ANMRR), and experiments on MPEG-7 test videos show promising results for the proposed scene extraction algorithm.
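The following is an illustrative sketch only, with a plain color histogram standing in for the DCH/SSH descriptors: a shot is split into subshots, each subshot is described by a histogram of a keyframe, and shot similarity is taken as the best pairwise subshot similarity.

import numpy as np

def color_hist(frame_rgb, bins=8):
    # frame_rgb: (H, W, 3) uint8 keyframe of a subshot.
    h, _ = np.histogramdd(frame_rgb.reshape(-1, 3),
                          bins=(bins, bins, bins), range=((0, 256),) * 3)
    return (h / h.sum()).ravel()

def subshot_descriptors(subshot_keyframes):
    # One histogram per subshot keyframe of a shot.
    return [color_hist(f) for f in subshot_keyframes]

def shot_similarity(subshots_a, subshots_b):
    # Histogram intersection over every subshot pair; keep the best match.
    return max(np.minimum(ha, hb).sum()
               for ha in subshots_a for hb in subshots_b)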
Based on electroencephalography (EEG) and video data, we propose a multimodal affective analysis approach to examine the affective states of university students. EEG signals and video data were obtained from 50 college students experiencing various emotional states and then processed in detail. The EEG signals are pre-processed to extract multi-view characteristics, and the video data are processed by frame extraction, face detection, and convolutional neural network (CNN) operations to extract features. A feature-splicing strategy merges the EEG and video features into a time-series input, realizing the fusion of multimodal features. We then developed and trained a model for emotional-state classification based on a long short-term memory (LSTM) network. Using cross-validation, the experiments were carried out by dividing the dataset into a training set and a test set, and the model's performance was evaluated with four metrics: accuracy, precision, recall, and F1-score. Compared with single-modal sentiment analysis, the results demonstrate that the multimodal approach combining EEG and video offers considerable advantages in sentiment detection, with significantly higher accuracy. The study also investigates the respective contributions of the EEG and video features to emotion detection and finds that they complement each other across a variety of emotional states, improving the overall recognition results. The LSTM-based multimodal sentiment analysis method offers high accuracy and robustness in recognizing the affective states of college students, which is especially valuable for enhancing the quality of education and supporting mental health.
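A minimal PyTorch sketch of the described fusion, with all layer sizes and names hypothetical: per-time-step EEG features and CNN video features are concatenated ("spliced") and fed to an LSTM whose final hidden state drives an emotion classifier.

import torch
import torch.nn as nn

class EEGVideoLSTM(nn.Module):
    def __init__(self, eeg_dim=64, video_dim=128, hidden=256, n_classes=4):
        super().__init__()
        self.lstm = nn.LSTM(eeg_dim + video_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, eeg_seq, video_seq):
        # eeg_seq: (B, T, eeg_dim); video_seq: (B, T, video_dim)
        fused = torch.cat([eeg_seq, video_seq], dim=-1)   # feature splicing
        _, (h_n, _) = self.lstm(fused)
        return self.head(h_n[-1])                         # emotion logits

# Toy usage with random tensors standing in for real EEG/CNN features.
logits = EEGVideoLSTM()(torch.randn(2, 30, 64), torch.randn(2, 30, 128))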
A video segmentation method is proposed in this paper. For abrupt cut detection, inter-frame similarities are computed using gray-level and edge histograms, and a cut is declared when the similarities fall below a predetermined threshold value. Gradual shot boundary detection is based on the similarities between the current frame and the previous shot boundary frame. Correlation coefficients are used to obtain universal threshold values, which are applied to various video data. Experimental results show that the proposed method provides 95% recall and 80% precision for abrupt cuts, and 83% recall and 54% precision for gradual changes.
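A minimal sketch of the abrupt-cut criterion described above, assuming OpenCV is available: gray-level histograms of consecutive frames are compared and a cut is declared when their similarity drops below a threshold; the edge-histogram term and the correlation-based threshold selection of the paper are omitted here.

import cv2

def detect_cuts(path, threshold=0.7):
    cap = cv2.VideoCapture(path)
    cuts, prev_hist, idx = [], None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        hist = cv2.calcHist([gray], [0], None, [64], [0, 256])
        if prev_hist is not None:
            sim = cv2.compareHist(prev_hist, hist, cv2.HISTCMP_CORREL)
            if sim < threshold:            # low similarity -> abrupt cut
                cuts.append(idx)
        prev_hist, idx = hist, idx + 1
    cap.release()
    return cuts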
Objective: To compare the effectiveness of a mobile video-guided home exercise program and a standard paper-based home exercise program.
Methods: Eligible participants were randomly assigned to either the experimental group, which received a mobile video-guided home exercise program, or the control group, which received a home exercise program in a standard pamphlet, for three months. The primary outcome was exercise adherence. The secondary outcomes were self-efficacy for exercise, measured by the Self-Efficacy for Exercise (SEE) Scale, and functional outcomes, including mobility level by the Modified Functional Ambulatory Category (MFAC) and basic activities of daily living (ADL) by the Modified Barthel Index (MBI). All outcomes were captured by phone interviews at 1 day, 1 month and 3 months after the participants were discharged from the hospitals.
Results: A total of 56 participants were allocated to the experimental group (n=27) and control group (n=29). There was a significant between-group difference in 3-month exercise adherence (experimental group: 75.6%; control group: 55.2%), and significant between-group differences in 1-month SEE (experimental group: 58.4; control group: 43.3) and 3-month SEE (experimental group: 62.2; control group: 45.6). For the functional outcomes, there was a significant between-group difference in 3-month MFAC gain (experimental group: 1.7; control group: 1.0). There were no between-group differences in MBI gain.
Conclusion: The mobile video-guided home exercise program was superior to the standard paper-based home exercise program in exercise adherence, SEE and mobility gain, but not in basic ADL gain, for patients recovering from stroke.
This paper proposes a personal learning assistant called LORAMS (Link of RFID and Movies System), which supports learners in sharing and reusing learning experiences by linking movies to environmental objects. We assume that every object has RFID tags and that mobile devices have an RFID reader and can record video anytime and anyplace. By scanning the RFID tags of real objects, LORAMS can provide only the video segments that include those objects. LORAMS also recommends similar videos for comparison. In LORAMS, video recording and RFID tagging are used purposely to support further teaching or learning, rather than to "just record it and use it someday". We think that LORAMS can be applied to various kinds of domains that employ several kinds of real objects and vary the results depending on the combination of the objects; for example, cooking, car maintenance checks (oil, battery, and tires), surgical operations and chemical bioreactor experiments. An evaluation was conducted in the context of cooking. The subjects could easily find the differences between their videos and the expert's videos and improve how they cooked fried rice.
Video scene detection is the task of dividing a video into semantic sections. To perform this fundamental task, we propose a novel and effective method for temporal grouping of scenes using an arbitrary set of features computed from the video. We formulate the task of video scene detection as a generic optimization problem to optimally group shots into scenes, and propose an efficient procedure for solving the optimization problem based on a novel dynamic programming scheme. This unique formulation directly results in a temporally consistent segmentation, and has the advantage of being parameter-free, making it applicable across various domains. We provide detailed experimental results, showing that our algorithm outperforms current state-of-the-art methods. To assess the comprehensiveness of this method even further, we present experimental results testing different types of modalities and their applicability in this formulation.
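As a simplified illustration of such a dynamic-programming grouping (not the authors' objective, which is parameter-free), the sketch below partitions N consecutive shots into a given number of scenes k so that the summed within-scene pairwise shot distances are minimal.

import numpy as np

def group_shots(dist, k):
    # dist: (N, N) pairwise shot-distance matrix; returns scene start indices.
    n = len(dist)
    # cost[i, j] = total pairwise distance inside a scene spanning shots i..j.
    cost = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            cost[i, j] = cost[i, j - 1] + dist[i:j, j].sum()
    dp = np.full((k + 1, n + 1), np.inf)
    cut = np.zeros((k + 1, n + 1), dtype=int)
    dp[0, 0] = 0.0
    for s in range(1, k + 1):              # number of scenes used so far
        for j in range(1, n + 1):          # first j shots covered
            for i in range(s - 1, j):      # last scene spans shots i..j-1
                c = dp[s - 1, i] + cost[i, j - 1]
                if c < dp[s, j]:
                    dp[s, j], cut[s, j] = c, i
    # Recover scene boundaries by backtracking through the stored cut points.
    bounds, j = [], n
    for s in range(k, 0, -1):
        bounds.append(cut[s, j])
        j = cut[s, j]
    return sorted(bounds)[1:]              # indices where scenes 2..k start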
In the rapidly evolving landscape of digital platforms, the need to optimize media representations for various aspect ratios is palpable. In this paper, we pioneer an approach that utilizes object detection, scene detection, outlier detection, and interpolation for smart cropping. Using soccer as a case study, our primary goal is to capture frame salience using object (player and ball) detection and tracking with AI models. To improve the object detection and tracking, we rely on scene understanding and explore various outlier detection and interpolation techniques. Our pipeline, called SmartCrop, is efficient and supports various configurations for object tracking, interpolation, and outlier detection to find the best point of interest to be used as the cropping center of the video frame. An objective evaluation of the performance of individual pipeline components validates our proposed architecture. Moreover, a crowdsourced subjective user study, assessing the alternative approaches for cropping from 16:9 to 1:1 and 9:16 aspect ratios, confirms that our proposed approach increases the end-user quality of experience.
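A hedged sketch of one stage such a pipeline might contain (not the SmartCrop code): a per-frame point of interest from detection and tracking, with missed frames marked as NaN, is cleaned by rejecting outliers against a running median, filled by linear interpolation, and used to centre a fixed-width crop; the window and deviation parameters are illustrative.

import numpy as np

def smooth_poi(poi_x, window=15, max_dev=80.0):
    # poi_x: (T,) per-frame x-coordinate of the point of interest (NaN = missing).
    x = np.asarray(poi_x, dtype=float).copy()
    for t in range(len(x)):
        if np.isnan(x[t]):
            continue
        lo, hi = max(0, t - window), min(len(x), t + window + 1)
        ref = np.nanmedian(x[lo:hi])
        if abs(x[t] - ref) > max_dev:
            x[t] = np.nan                  # treat as a detection outlier
    good = ~np.isnan(x)
    # Linearly interpolate over missing or rejected frames.
    x[~good] = np.interp(np.flatnonzero(~good), np.flatnonzero(good), x[good])
    return x

def crop_window(center_x, frame_w, crop_w):
    # Horizontal bounds of a crop of width crop_w centred on the point of interest.
    left = int(np.clip(center_x - crop_w / 2, 0, frame_w - crop_w))
    return left, left + crop_w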
A novel technique for reconstructing a fluid surface from video is introduced. Both fluid motion vectors and the Lattice Boltzmann Method (LBM) are employed in the study. A region-based correlation method is used to initialize the motion vector field, and the fluid motion vectors are obtained after clustering. The height information of the fluid surface is then calculated from the fluid motion vectors. Finally, the distribution of fluid particles is interpolated and the height field is further refined. Reconstruction results are demonstrated on several challenging videos. The experimental results show that the method is convenient and efficient, that the calculated results reflect the characteristics of the fluid movement, and that it is a valid method for reconstructing fluid surfaces.
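As a rough illustration of the region-based correlation step (using a sum-of-absolute-differences surrogate for correlation), the sketch below estimates a motion vector for each block of the previous frame by searching a small neighbourhood of the current frame; the block and search sizes are arbitrary choices.

import numpy as np

def motion_vectors(prev, curr, block=16, search=8):
    # prev, curr: (H, W) grayscale frames; returns per-block (dx, dy) vectors.
    h, w = prev.shape
    vecs = np.zeros((h // block, w // block, 2), dtype=int)
    for by in range(0, h - block + 1, block):
        for bx in range(0, w - block + 1, block):
            ref = prev[by:by + block, bx:bx + block].astype(float)
            best, best_v = np.inf, (0, 0)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    y, x = by + dy, bx + dx
                    if 0 <= y <= h - block and 0 <= x <= w - block:
                        cand = curr[y:y + block, x:x + block].astype(float)
                        err = np.abs(ref - cand).sum()   # SAD matching error
                        if err < best:
                            best, best_v = err, (dx, dy)
            vecs[by // block, bx // block] = best_v
    return vecs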
The objective of this chapter is a holistic view of aesthetics, ethics, and neuroaesthetics. After a few introductory case studies, aesthetics is systematically introduced as a philosophical subdiscipline. This perspective is then expanded from aesthetics to neuroaesthetics. Using various art forms as well as current media formats, the aspects of beauty and ugliness are discussed, and aesthetic properties are expanded to include ethical implications. These can be expressed through ideals of beauty and the compulsion for body transformation. From this artistic point of view, the so-called golden ratio plays a central role. It is shown how representations affect people and what ethical implications are associated with these effects. Therefore, this chapter first looks at art from the perspective of neuroaesthetics and then considers the ethical aspects of the beautiful and the ugly. The considerations lead to a brief discussion of Socrates's three sieves.