Block matching is the most important part of frame rate up-conversion (FRUC). Traditional block matching algorithms do not take the geometric properties of the image into consideration, so the motion estimation result is suboptimal. A novel FRUC algorithm based on the Bandelet transform is proposed in this paper. The algorithm proceeds in two stages: first, a soft-threshold Bandelet transform of each matching block is performed, and the optimal matching block is determined by detecting direction similarity and Bandelet coefficient similarity; second, vector median filtering (VMF) and overlapped block motion compensation (OBMC) are applied to the resulting motion vectors to produce the interpolated frame. Experimental results show that the Bandelet-based FRUC algorithm further improves the quality of FRUC.
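Since block matching is the core of FRUC, the baseline that the Bandelet-domain criterion replaces can be sketched as a plain full search with a sum-of-absolute-differences (SAD) cost. The paper's Bandelet similarity terms are not reproduced here; only the generic matching loop is shown, and the function names are illustrative.

```python
import numpy as np

def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return int(np.abs(a.astype(np.int32) - b.astype(np.int32)).sum())

def full_search(ref, cur, top, left, bsize=8, radius=4):
    """Find the motion vector (dy, dx) minimizing SAD for the block of
    `cur` at (top, left), searching `ref` within +/-radius pixels."""
    block = cur[top:top + bsize, left:left + bsize]
    best, best_mv = None, (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + bsize > ref.shape[0] or x + bsize > ref.shape[1]:
                continue  # candidate block falls outside the reference frame
            cost = sad(block, ref[y:y + bsize, x:x + bsize])
            if best is None or cost < best:
                best, best_mv = cost, (dy, dx)
    return best_mv
```

A Bandelet-based criterion would replace the `sad` call with a distance computed between transform coefficients of the two blocks.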
We propose a method for pedestrian detection from aerial images captured by unmanned aerial vehicles (UAVs). Aerial images are captured at considerably low resolution, and they are often subject to heavy noise and blur as a result of atmospheric influences. Furthermore, significant changes to the appearance of pedestrians frequently occur because of UAV motion. In order to address these crucial problems, we propose a cascading classifier that concatenates a pre-trained classifier and an online learning-based classifier. We construct the first classifier using deep belief network (DBN) with an extended input layer. Unlike previous approaches that use raw images as the input layer of the DBN, we exploit multi-scale histogram of oriented gradients (MS-HOG) features. The MS-HOG enables us to supply better and richer information than low-resolution aerial images for constructing a reliable deep structure of DBN, because the dimensions of the input features can be expanded. Furthermore, the MS-HOG effectively extracts the necessary edge information while reducing trivial gradients and noise. The second classifier is based on online learning, and it uses predictions of the target appearance using UAV motions. Predicting the target appearance enables us to collect reliable training samples for the classifier’s online learning process. Experiments using aerial videos demonstrate the effectiveness of the proposed method.
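A rough sketch of the multi-scale HOG idea — orientation histograms computed at several scales and concatenated into one longer feature vector — might look like the following. This uses a single global histogram per scale and naive decimation; the paper's exact cell/block layout and filter choices are not reproduced.

```python
import numpy as np

def hog_histogram(img, bins=9):
    """Single global orientation histogram weighted by gradient magnitude."""
    gy, gx = np.gradient(img.astype(np.float64))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)           # unsigned orientation
    idx = np.minimum((ang / np.pi * bins).astype(int), bins - 1)
    hist = np.bincount(idx.ravel(), weights=mag.ravel(), minlength=bins)
    return hist / (hist.sum() + 1e-9)                 # L1-normalize

def ms_hog(img, scales=(1, 2, 4), bins=9):
    """Concatenate orientation histograms computed at several scales."""
    feats = []
    for s in scales:
        small = img[::s, ::s]                         # naive downsampling
        feats.append(hog_histogram(small, bins))
    return np.concatenate(feats)
```

Concatenating histograms across scales is what expands the dimensionality of the input layer relative to the raw low-resolution image.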
To address the uncertainty of motion for objects displayed on liquid crystal displays (LCDs), a motion estimation (ME) algorithm based on global entirety unidirectional motion and local fast bidirectional motion (GEU-LFB) is proposed for motion-compensated temporal frame interpolation (MCTFI). Firstly, by observing the moving objects in the scene, we obtain a set of global motion vectors (MVs). Secondly, a fast local search based on bidirectional ME is executed. Additionally, to compensate for the shortcomings of bidirectional ME, a method to generate an exposure mask and an occlusion mask is proposed by exploiting the absolute-difference matching criterion and the set of global MVs. Next, the optimal MV field is smoothed by a vector median filter. Finally, the temporal interpolation frame obtained through weighted filtering compensation is further improved using the masks. Overall, the experimental results show that the proposed algorithm performs better than existing methods in terms of objective and subjective criteria. Moreover, it handles exposure and occlusion, and produces better interpolated frames for video sequences with fast-moving targets.
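The vector median filtering step used to smooth the MV field picks, from a neighborhood of candidate vectors, the one minimizing the summed distance to all the others; a minimal sketch:

```python
import numpy as np

def vector_median(vectors):
    """Return the vector from `vectors` minimizing the summed Euclidean
    distance to all other vectors (the vector median)."""
    v = np.asarray(vectors, dtype=np.float64)
    # pairwise distance matrix, then pick the row with the smallest sum
    d = np.linalg.norm(v[:, None, :] - v[None, :, :], axis=-1)
    return tuple(v[np.argmin(d.sum(axis=1))])
```

Unlike a component-wise median, the result is always one of the input vectors, which keeps the smoothed field consistent with actually observed motion.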
Traditional template matching-based motion estimation is a popular but time-consuming method for vibration vision measurement. In this study, the particle swarm optimization (PSO) algorithm is improved to address this time-consumption problem. The convergence speed of the algorithm is increased by using an adjacent-frames search method in the particle swarm initialization process, and a flag array is created to avoid repeated calculation in the termination strategy. Subpixel positioning accuracy is ensured by applying a surface fitting method, and the robustness of the algorithm is ensured by applying the zero-mean normalized cross-correlation. Simulation results demonstrate that the average extraction error of the improved PSO algorithm is less than 1%. Compared with the commonly used three-step search, diamond search, and local search algorithms, the improved PSO algorithm requires the fewest search points. Moreover, tests on real-world image sequences show good estimation accuracy at very low computational cost. The improved PSO algorithm proposed in this study is fast, accurate, and robust, and is suitable for plane motion estimation in vision measurement.
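The zero-mean normalized cross-correlation used for robustness can be sketched as follows; it is invariant to affine brightness changes between the template and the candidate patch, which is what makes the matching robust to illumination variation:

```python
import numpy as np

def zncc(a, b):
    """Zero-mean normalized cross-correlation of two equally sized patches.
    Returns a value in [-1, 1]; 1 means a perfect match up to gain/offset."""
    a = a.astype(np.float64) - a.mean()
    b = b.astype(np.float64) - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0
```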
Deep learning-based video coding methods have demonstrated superior performance compared to classical video coding standards in recent years. The vast majority of existing deep video coding (DVC) networks are based on convolutional neural networks (CNNs), whose main drawback is that, constrained by the size of the receptive field, they cannot effectively handle long-range dependencies and local detail recovery. Therefore, how to better capture and process the overall structure as well as local texture information is a core issue in the video coding task. Notably, the transformer employs a self-attention mechanism that captures dependencies between any two positions in the input sequence without being constrained by distance. This is an effective solution to the problem described above. In this paper, we propose end-to-end transformer-based adaptive video coding (TAVC). First, we compress the motion vectors and residuals through a compression network built on the vision transformer (ViT) and design the motion compensation network based on ViT. Second, because video coding must adapt to inputs of different resolutions, we introduce a position encoding generator (PEG) as adaptive position encoding (APE) to maintain translation invariance across video coding tasks at different resolutions. The experiments show that, in terms of the multi-scale structural similarity index measure (MS-SSIM), this method achieves significant performance gains over conventional engineering codecs such as x264, x265, and VTM-15.2. We also achieve a good performance improvement compared to CNN-based DVC methods. Under the peak signal-to-noise ratio (PSNR) evaluation metric, TAVC also achieves good performance.
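The self-attention mechanism referred to above can be sketched in a few lines; unlike a convolution, every position attends to every other position regardless of distance. This is a single-head, unmasked toy version with illustrative weight matrices, not the TAVC network itself.

```python
import numpy as np

def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product attention over a token sequence x
    of shape (n, d): every position attends to every other position, so
    dependencies are not limited by a receptive field."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = (q @ k.T) / np.sqrt(k.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)   # row-wise softmax
    return w @ v
```

With zero query/key weights the attention is uniform, so each output token is simply the mean of the value tokens; learned weights instead concentrate the attention on the relevant positions.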
The Dynamic Pyramid is a model to solve the correspondence problem of image sequences. A robust estimation of local displacements is combined with controlled continuity constraints. At the heart of the model is the functional of an elastic membrane whose elastic constants are subject to variation. The continuity control function is derived from the tension in the displacement vector field at gray value edges. The displacement term of the functional is based on robust local binary correlations derived from the signs of the bandpass filtered images. The basic representation of the model is the pyramid: The original images are converted into Laplacian pyramids, the signs of which are the features to determine the local displacements as well as the continuity control function. The vector field is built up as a pyramid from coarse to fine, giving the final displacement vector field at the finest level.
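A minimal sketch of the pyramid representation described above: each Laplacian level stores the bandpass detail removed by downsampling, and the sign of each level provides the binary features used for the robust local correlations. Simple 2x2 block averaging stands in for the Gaussian filtering of a full implementation.

```python
import numpy as np

def downsample(img):
    """2x2 block averaging (a simple stand-in for Gaussian blur + decimate)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    x = img[:h, :w]
    return (x[0::2, 0::2] + x[1::2, 0::2] + x[0::2, 1::2] + x[1::2, 1::2]) / 4.0

def laplacian_pyramid(img, levels=3):
    """Each level stores the detail lost by downsampling; the last entry
    is the low-pass residual."""
    img = img.astype(np.float64)
    pyr = []
    for _ in range(levels):
        small = downsample(img)
        up = np.kron(small, np.ones((2, 2)))   # nearest-neighbour upsample
        pyr.append(img - up[:img.shape[0], :img.shape[1]])
        img = small
    pyr.append(img)
    return pyr

def sign_features(pyr):
    """Binary (sign) features of each bandpass level, as used for the
    robust local binary correlations."""
    return [np.sign(lvl) for lvl in pyr[:-1]]
```

The displacement field is then estimated coarse-to-fine on these levels, with each coarser result initializing the next finer one.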
This paper describes some issues in building a 3-D human face modeling system which mainly consists of three parts:
• Modeling human faces;
• Analyzing facial motions;
• Synthesizing facial expressions.
A variety of techniques developed for this system are described in detail in this paper. Some preliminary results of applying this system to computer animation, video sequence compression and human face recognition are also shown.
A line segment based approach for 3D motion estimation and tracking of multiple objects from a monocular image sequence is presented. Objects are described by means of 3D line segments, and their presence in the scene is associated with the detection of 2D line segments on the image plane. A change detection algorithm is applied to detect moving objects on the image plane and a Hough-based algorithm is used to individuate 2D line segments. 3D parameters of each line segment are estimated, at each time instant, by means of an extended Kalman filter (EKF), whose observations are the displacements of 2D line segment endpoints on the image plane. Results on both synthetic and real scenes are presented.
This paper presents a glove-free method for tracking hand movements using a set of 3-D models. In this approach, the hand is represented by five cylindrical models which are fit to the third phalangeal segments of the fingers. Six 3-D motion parameters for each model are calculated that correspond to the movement of the fingertips in the image plane. Trajectories of the moving models are then established to show the 3-D nature of the hand motion.
This paper presents a new Computational-RAM (C-RAM) architecture for real-time mesh-based video motion tracking. The motion tracking consists of two operations: mesh-based motion estimation and compensation. The proposed motion estimation architecture is presented in Part 1 and the proposed motion compensation architecture is presented in Part 2. The motion estimation architecture stores two frames and computes motion vectors for a regular triangular mesh structure as defined by MPEG-4 Part 2. The motion estimation architecture uses the block-matching algorithm (BMA) to estimate the vertical and horizontal motion vectors for each mesh node. Parallel and pipelined implementations have been used to overcome the huge computational requirements of the motion estimation process. The two frames are stored in embedded SRAMs generated with the Virage™ Memory Compiler. The proposed motion estimation architecture has been prototyped, simulated and synthesized using the TSMC 0.18 μm CMOS technology. At a 100 MHz clock frequency, the proposed architecture processes one CIF video frame (i.e., 352×288 pixels) in 1.48 ms, which means it can process up to 675 frames per second. The core area of the proposed motion estimation architecture is 24.58 mm² and it consumes 46.26 mW.
Power consumption is very critical for portable video applications. During compression, the motion estimation unit consumes the largest portion of power since it performs a huge amount of computation. Different low power architectures for implementing the full-search block-matching (FSBM) motion estimation are discussed. Also, architectural enhancements to further reduce the power consumed during FSBM motion estimation without sacrificing throughput or optimality are presented. The proposed approach achieves these power savings by disabling portions of the architecture that perform unnecessary computations. A comparison between the different architectures including our enhancements and others is presented using simulation and analytical analysis. Different benchmarks are used to test and compare the discussed architectures. Analytical and simulation results show the effectiveness of the enhancements.
A high-performance versatile motion estimation processor with a regular hardware and control structure, capable of MP@HL, is presented. With eight identical processors, it can process 1920 × 1080 pels in 4096 cycles within a search range of 64 points. The presented processor achieves 100% hardware utilization by eliminating the computation delay between stages, and it can compute half-pel precision motion estimation. Also, by enhancing image data reuse, the I/O bottleneck problem is alleviated. With efficient scalability, the amount of hardware and the computation speed can be easily adjusted for a variety of application areas. The proposed edge-effect hardware can be used for object-based coding in MPEG-4. The processor was verified with C++ and VHDL based on actual image data and implemented with a 0.6 μm gate array. Also, a motion estimator system with image memories and a PC host interface was designed.
The success of a tracking method depends largely on choosing a suitable window size as the target size changes across the image sequence. To achieve this goal, we propose a fast tracking algorithm based on adaptively adjusting the tracking window. Firstly, the tracking window is divided into four edge subwindows, plus a background subwindow around it. Then, by calculating the spatiotemporal gradient power ratios of the target in each subwindow, four expansion vectors are assigned to the tracking window sides such that the occupancy rate of the target in the tracking window is maintained within a specified range. In addition, since the temporal change of the target is evaluated in calculating these vectors, we estimate the overall target displacement as the sum of the expansion vectors. Experimental results using various real video sequences show that the proposed algorithm successfully tracks an unknown textured target in real time, and is robust to dynamic occlusions in complex noisy backgrounds.
Motion estimation is used to remove inter-pixel redundancy in video data, and block-based motion estimation algorithms are widely used for this purpose. Computation in these algorithms is reduced by limiting the number of candidate search points within the search window or by simplifying the distortion measurement criterion. In the literature, there are integral-frame-based block motion estimation algorithms that drastically reduce computation cost. However, these algorithms have a serious drawback: the possibility of spurious block matching, which leads to poor-quality results. In this manuscript, a multilevel block matching criterion based on the integral frame concept is proposed to minimize this drawback. Experimental results show that an improvement of up to 12% in PSNR (dB) is achieved over the integral-frame-based sum of absolute difference block sum criterion (SAD_BS) with almost the same execution time. Further, in terms of ratio, the proposed method achieves a 25–26% gain over SAD_BS.
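The integral-frame idea underlying criteria such as SAD_BS can be sketched with a summed-area table: any block sum is obtained from four lookups, which is what makes these methods fast — and also why two different blocks with equal sums can match spuriously, the drawback the multilevel criterion targets.

```python
import numpy as np

def integral_image(frame):
    """Summed-area table with a zero row/column prepended."""
    ii = np.zeros((frame.shape[0] + 1, frame.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = frame.astype(np.int64).cumsum(axis=0).cumsum(axis=1)
    return ii

def block_sum(ii, top, left, h, w):
    """Sum of any h x w block in O(1) using four table lookups."""
    return ii[top + h, left + w] - ii[top, left + w] \
         - ii[top + h, left] + ii[top, left]
```

A multilevel criterion would compare sums of the block's sub-blocks as well, so that two blocks must agree at several granularities before being declared a match.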
Motion estimation (ME) is a highly computationally intensive operation in video compression, and efficient ME architectures have been proposed in the literature. This paper presents an efficient, low-computational-complexity systolic architecture for the full search block matching ME (FSBME) algorithm. The proposed architecture is based on the one-bit transform-based full search (FS) algorithm and performs FS ME for four macroblocks (MBs) in parallel. The proposed hardware architecture is implemented in VHDL. The FSBME hardware consumes 34% of the slices in a Xilinx Virtex-6 XC6VLX240T FPGA device with a maximum frequency of 133 MHz and is capable of processing full high-definition (HD) (1920×1080) frames at a rate of 60 frames per second.
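The one-bit transform reduces each pixel to a single bit, after which block matching becomes a count of non-matching points computed with XOR — the operation the systolic array implements cheaply in hardware. The sketch below uses a local-mean threshold as a simple stand-in for the multi-band-pass filter of the original 1BT.

```python
import numpy as np

def one_bit_transform(frame, k=8):
    """Binarize a frame by comparing each pixel with a local k x k mean
    (a simple stand-in for the multi-band-pass filter of the 1BT)."""
    f = frame.astype(np.float64)
    ii = np.zeros((f.shape[0] + 1, f.shape[1] + 1))
    ii[1:, 1:] = f.cumsum(axis=0).cumsum(axis=1)   # summed-area table
    h, w = f.shape
    out = np.zeros_like(f, dtype=np.uint8)
    for y in range(h):
        for x in range(w):
            y0, y1 = max(0, y - k // 2), min(h, y + k // 2)
            x0, x1 = max(0, x - k // 2), min(w, x + k // 2)
            mean = (ii[y1, x1] - ii[y0, x1] - ii[y1, x0] + ii[y0, x0]) \
                   / ((y1 - y0) * (x1 - x0))
            out[y, x] = f[y, x] >= mean
    return out

def nnmp(a, b):
    """Number of non-matching points between two binary blocks (XOR count),
    the 1-bit replacement for the SAD cost."""
    return int(np.count_nonzero(a ^ b))
```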
Motion estimation (ME) accounts for the major part of the computational complexity of any video coding standard. The diamond search (DS) algorithm is widely used as a fast search technique to perform motion estimation. In this paper, a novel architecture for the diamond search technique is proposed that efficiently handles memory addressing and reduces hardware complexity. The proposed architecture meets the speed requirements for real-time video processing without compromising on area. The design, implemented in Verilog HDL on Virtex-5 technology and synthesized using Xilinx ISE Design Suite 12.4, yields a critical path delay of 3.25 ns and an equivalent area of 3.5K gates. Working at a frequency of 308 MHz, the proposed design can process 128 CIF frames per second, so it can be incorporated in video codecs targeted at commercial devices such as smartphones, camcorders and video conferencing systems.
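For reference, the classic diamond search alternates a large diamond search pattern (LDSP) until its centre wins, then refines once with a small diamond (SDSP); a software sketch of the search the architecture accelerates:

```python
import numpy as np

LDSP = [(-2, 0), (2, 0), (0, -2), (0, 2), (-1, -1), (-1, 1), (1, -1), (1, 1)]
SDSP = [(-1, 0), (1, 0), (0, -1), (0, 1)]

def diamond_search(ref, cur, top, left, bsize=8):
    """Two-stage diamond search: move the large diamond while any
    neighbour beats the centre, then refine with the small diamond."""
    block = cur[top:top + bsize, left:left + bsize]

    def cost(dy, dx):
        y, x = top + dy, left + dx
        if y < 0 or x < 0 or y + bsize > ref.shape[0] or x + bsize > ref.shape[1]:
            return None  # out of frame
        return int(np.abs(block.astype(np.int64)
                          - ref[y:y + bsize, x:x + bsize].astype(np.int64)).sum())

    cy = cx = 0
    c0 = cost(0, 0)
    while True:  # large diamond until the centre is the minimum
        cands = [(cost(cy + dy, cx + dx), cy + dy, cx + dx) for dy, dx in LDSP]
        best = min((c for c in cands if c[0] is not None), default=None)
        if best is None or best[0] >= c0:
            break
        c0, cy, cx = best
    # one small-diamond refinement around the final centre
    cands = [(c0, cy, cx)] + [(cost(cy + dy, cx + dx), cy + dy, cx + dx)
                              for dy, dx in SDSP]
    best = min(c for c in cands if c[0] is not None)
    return best[1], best[2]
```

The hardware win comes from the small, fixed candidate sets: only 9 (LDSP) or 5 (SDSP) positions are evaluated per step instead of a full search window.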
The watermarking technique is an active subject of current research, used as a solution for copyright protection of multimedia documents. In this paper, we propose the first hardware invisible robust video watermarking application based on motion estimation. Since the designers of such an application face many challenges, two types of architecture are developed: a static and a dynamic/partial reconfigurable architecture. The proposed architecture is adapted to HEVC-encoded video. Two protection techniques are combined: digital watermarking to insert a watermark in the video, and scrambling for overall video protection. The watermark is embedded in the horizontal and vertical components of even motion vectors, and the entire set of vectors is then scrambled. The watermark is a binary sequence, of which a single bit is inserted into the horizontal and vertical components of each motion vector. The recommended architecture applies to both slow and fast video sequences, using a motion estimator reconfigured according to the macroblock movement. We also utilize a pipeline structure and a clock-gating module to increase computing power and reduce power consumption. Experimental results show that the suggested static and dynamic/partial reconfigurable architectures guarantee material efficiency and superior performance in terms of frequency and power consumption.
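Embedding one watermark bit into the horizontal and vertical MV components can be illustrated with a simple LSB scheme; `embed_bit` and `extract_bit` are hypothetical helpers, and the paper's exact embedding and scrambling rules are not reproduced here.

```python
def embed_bit(mv, bit):
    """Write one watermark bit into the LSB of both MV components
    (illustrative LSB scheme; works for negative components too)."""
    x, y = mv
    return (x & ~1) | bit, (y & ~1) | bit

def extract_bit(mv):
    """Recover the embedded bit from the horizontal component's LSB."""
    return mv[0] & 1
```

Perturbing only the LSB keeps the motion-compensated prediction error, and hence the visible distortion, small.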
This paper presents a stereoscopic video compression scheme using a novel fast affine motion estimation technique. A temporally scalable approach is used to achieve backward compatibility with standard-definition TV. We use an adaptive mode selection scheme drawing from three temporal locations in both channels. Both block-based and affine-motion-based approaches are used to achieve two levels of improvement at different complexities. An innovative motion estimation technique using Gauss–Newton optimization and pyramid processing is implemented to efficiently estimate affine parameters. Unlike other Gauss–Newton approaches, our search technique uses only addition, subtraction and multiplication, and it converges within four iterations, which implies a great reduction in complexity. The efficient and robust affine motion prediction yields significant improvement over the disparity-based approach. Part of the disparity-based approach has been tested in the rigorous MPEG-2 bitstream exchange process and adopted in the MPEG-2 Multi-View Profile (MVP).
Partial Differential Equations (PDEs) have dominated image processing research recently. The three main reasons for their success are: first, their ability to transform a segmentation modeling problem into a partial differential equation framework and their ability to embed and integrate different regularizers into these models; second, their ability to solve PDEs in the level set framework using finite difference methods; and third, their easy extension to a higher dimensional space.
This paper is an attempt to survey and understand the power of PDEs incorporated into geometric deformable models for the segmentation of objects in 2D and 3D, in both still and motion imagery. The paper first presents PDEs and their solutions applied to image diffusion. The main focus of the paper is to demonstrate the use of regularizers in PDEs and the level set framework to achieve image segmentation in still and motion imagery. Lastly, we cover miscellaneous applications such as mathematical morphology, computation of missing boundaries for shape recovery, and low-pass filtering, all under the PDE framework. The paper concludes with the merits and demerits of the PDE and level-set-based framework for segmentation modeling, and presents a variety of examples covering both synthetic and real-world images.
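As a concrete example of a PDE applied to image diffusion, the isotropic heat equation u_t = u_xx + u_yy can be discretized with explicit finite differences; this is the simplest instance of the diffusion schemes the survey opens with (reflecting boundaries, illustrative step size).

```python
import numpy as np

def heat_diffusion(img, steps=10, dt=0.2):
    """Isotropic diffusion u_t = u_xx + u_yy, explicit finite differences
    with reflecting boundaries (stable for dt <= 0.25)."""
    u = img.astype(np.float64).copy()
    for _ in range(steps):
        p = np.pad(u, 1, mode="edge")             # Neumann boundary
        lap = (p[:-2, 1:-1] + p[2:, 1:-1]
               + p[1:-1, :-2] + p[1:-1, 2:] - 4 * u)
        u += dt * lap                             # forward Euler step
    return u
```

Each step smooths the image while conserving its mean intensity; regularized variants (e.g. edge-stopping diffusivities) modify the Laplacian term rather than the time-stepping scheme.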
Video watermarks must maintain picture quality and withstand image processing, and resolving the tradeoff between these conflicting requirements has been one of the central problems in research on video watermarking. This paper shows that watermarks are less perceptible where picture contents are moving and proposes a new criterion for measuring watermark imperceptibility from motion vectors and deformation quantities. It also proposes a watermarking method that uses this criterion to allocate watermarks to picture areas adaptively. Experimental evaluation showed that the proposed method reduces degradation in picture quality and that, for the same picture quality after MPEG encoding and decoding, between 30% and 40% more watermarks can be embedded, improving detection reliability after MPEG processing by an average of 9.5%. The proposed method would be most effective when used with MPEG encoders because they already have the necessary motion estimation functions.