Human pose estimation from video sequences has become a hot research topic in robotics and computer vision. However, existing three-dimensional (3D) pose estimation methods usually analyze individual frames, which yields low accuracy under varying human movement speeds and limits practical application. In this paper, we propose a method for estimating 3D pose and calculating similarity from Tai Chi video sequences based on a Seq2Seq network. Specifically, taking the sequence of 2D joint coordinates from the original images as input, our method constructs an encoder and a decoder to build a Seq2Seq network. It introduces an attention mechanism that weights the input data to obtain an intermediate vector, which is then decoded to estimate the 3D joint sequence. Afterwards, taking a template video and a target video as input, our method computes the cost of passing through each point within the constraints to construct a cost matrix for video similarity. From the cost matrix, it determines the optimal path and uses the resulting frame correspondence to calculate the image similarity of corresponding frames. The experimental data show that the proposed method can effectively improve the accuracy of 3D pose estimation and speed up video similarity calculation.
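The abstract does not spell out the cost function or the path constraints, but the cost-matrix-and-optimal-path step closely resembles classic dynamic time warping (DTW). Below is a minimal NumPy sketch of that idea, assuming Euclidean distance between flattened per-frame 3D joint vectors and a Sakoe-Chiba band as the constraint; the function name and shapes are illustrative, not taken from the paper.

```python
import numpy as np

def dtw_align(template, target, window=None):
    """Align two pose sequences with dynamic time warping.

    template, target: arrays of shape (T, J*3), one flattened 3D pose per frame.
    window: optional Sakoe-Chiba band width constraining the warping path.
    Returns the optimal path as (template_idx, target_idx) pairs and the total cost.
    """
    n, m = len(template), len(target)
    w = max(window or max(n, m), abs(n - m))  # band must cover the diagonal
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            d = np.linalg.norm(template[i - 1] - target[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                 cost[i, j - 1],      # deletion
                                 cost[i - 1, j - 1])  # match
    # Backtrack from (n, m) to recover the optimal warping path.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1], cost[n, m]
```

The returned path is the frame correspondence: each (template frame, target frame) pair can then be scored individually to obtain the per-frame image similarities the abstract describes.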
In human body pose estimation, manifold learning has been considered a useful method for reducing the dimensionality of 2D images and 3D body configuration data. Most commonly, body pose is estimated from silhouettes derived from images or image sequences. A major problem in applying manifold estimation to pose estimation is its vulnerability to silhouette variation caused by changes in factors such as viewpoint, person, and distance.
In this paper, we propose a novel approach that combines three separate manifolds, for viewpoint, pose, and 3D body configuration, focusing on the problem of viewpoint-induced silhouette variation. Biased manifold learning is used to learn these manifolds with appropriately weighted distances. The proposed method requires four mapping functions, which are learned by a generalized regression neural network for robustness. Despite using only three manifolds, experimental results show that the proposed method can reliably estimate 3D body poses from 2D images across all learned viewpoints.
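A generalized regression neural network (GRNN) is essentially Nadaraya-Watson kernel regression: it stores the training patterns and predicts a Gaussian-weighted average of the stored targets. A minimal sketch follows; the bandwidth value, data shapes, and variable names are illustrative assumptions, not details from the paper.

```python
import numpy as np

class GRNN:
    """Generalized regression neural network (Nadaraya-Watson kernel
    regression): predicts a Gaussian-weighted average of stored targets."""

    def __init__(self, sigma=0.3):
        self.sigma = sigma  # kernel bandwidth (smoothing parameter)

    def fit(self, X, Y):
        self.X, self.Y = np.asarray(X), np.asarray(Y)
        return self

    def predict(self, x):
        # Squared distances from the query to every stored pattern.
        d2 = np.sum((self.X - np.asarray(x)) ** 2, axis=1)
        w = np.exp(-d2 / (2.0 * self.sigma ** 2))
        return w @ self.Y / (np.sum(w) + 1e-12)

# Hypothetical usage for one of the four mappings, e.g. from a point on
# the learned pose manifold to a 3D body configuration vector:
# grnn = GRNN(sigma=0.3).fit(pose_embeddings, joint_angles)
# theta = grnn.predict(query_embedding)
```

One mapping of this form can be trained per required function; the only free parameter is the bandwidth, which controls how smoothly the network interpolates between stored poses.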
With the rapid development of computer technology, building pose estimation combined with Augmented Reality (AR) can play a crucial role in urban planning and architectural design. For example, a virtual building model can be placed into a realistic scene acquired by an Unmanned Aerial Vehicle (UAV) to visually assess whether the building integrates well with its surroundings, thus optimizing the design. In this work, we contribute a building dataset for pose estimation named BD3D. To obtain accurate building poses, we use a physical camera in Unity3D that simulates realistic cameras to reproduce the UAV's perspective, with virtual building models as objects. We propose a novel neural network that combines a MultiBin module with the PoseNet architecture to estimate building pose. Buildings are often symmetric, and this ambiguity causes different surfaces to have similar appearance, making it difficult for CNNs to learn discriminative features between surfaces. We propose a generalized world-coordinate-system repositioning strategy to address this. We evaluate our network with this strategy on BD3D, and the angle error is reduced from 45° to 3°. Code and dataset have been made available at: https://github.com/JellyFive/Building-pose-estimation-from-the-perspective-of-UAVs-based-on-CNNs.
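The MultiBin idea (introduced by Mousavian et al. for 3D bounding box estimation) discretizes orientation into bins, classifies which bin the angle falls in, and regresses a residual within that bin. Below is a sketch of only the decoding step, assuming evenly spaced bin centers and a (sin, cos) residual parameterization; the paper's exact bin layout and head design may differ.

```python
import numpy as np

def decode_multibin(conf, residuals, num_bins=4):
    """Decode a MultiBin head output into a single yaw angle.

    conf:      (num_bins,) bin confidence scores
    residuals: (num_bins, 2) per-bin (sin, cos) of the offset from the
               bin's center angle
    Returns the center of the most confident bin plus its regressed
    residual, wrapped to [-pi, pi).
    """
    centers = 2.0 * np.pi * np.arange(num_bins) / num_bins  # evenly spaced centers
    b = int(np.argmax(conf))
    offset = np.arctan2(residuals[b, 0], residuals[b, 1])
    angle = centers[b] + offset
    return (angle + np.pi) % (2.0 * np.pi) - np.pi

# Example: 4 bins, confident bin 1 (center pi/2), small positive residual.
conf = np.array([0.1, 0.7, 0.1, 0.1])
res = np.array([[0.0, 1.0], [np.sin(0.1), np.cos(0.1)], [0.0, 1.0], [0.0, 1.0]])
print(decode_multibin(conf, res))  # ~ pi/2 + 0.1
```

Classifying a bin and regressing a small residual avoids the wrap-around discontinuity of regressing a raw angle, which is why this parameterization suits orientation estimation.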
This paper describes an approach for real-time preparation of grasping tasks, based on the low-order moments of the target's shape in a stereo pair of images acquired by an active vision head. The objective is to estimate the 3D position and orientation of an object and of the robotic hand, using computationally fast and independent software components. These measurements are then used in the two phases of a reaching task: (i) an initial phase in which the robot positions its hand close to the target with an appropriate hand orientation, and (ii) a final phase in which precise hand-to-target positioning is performed using Position-Based Visual Servoing methods.
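Low-order image moments determine a shape's 2D position and in-plane orientation in each view: the first moments give the centroid, and the second-order central moments give the principal-axis angle theta = (1/2) * atan2(2*mu11, mu20 - mu02). A minimal sketch on a binary silhouette mask follows; the mask-based input and function name are assumptions for illustration.

```python
import numpy as np

def shape_pose_from_moments(mask):
    """Estimate the 2D centroid and principal-axis orientation of a
    binary target silhouette from its low-order image moments."""
    ys, xs = np.nonzero(mask)
    area = float(len(xs))                 # zeroth moment
    cx, cy = xs.mean(), ys.mean()         # first moments: centroid
    # Second-order central moments.
    mu20 = np.mean((xs - cx) ** 2)
    mu02 = np.mean((ys - cy) ** 2)
    mu11 = np.mean((xs - cx) * (ys - cy))
    # Orientation of the principal axis.
    theta = 0.5 * np.arctan2(2.0 * mu11, mu20 - mu02)
    return (cx, cy), theta, area

# Running this on the target's silhouette in each image of the stereo
# pair gives two (centroid, orientation) measurements; triangulating the
# two centroids then yields the 3D position used in the initial phase.
```

Because the moments are simple sums over the silhouette pixels, each view can be processed independently and fast enough for the real-time requirement the abstract states.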