Traditional vision-based registration techniques require precisely designed markers or rich texture in the captured video scenes; moreover, vision-based methods are computationally expensive, while hardware-based registration lacks accuracy. In this paper, we therefore propose a novel registration method that takes advantage of an RGB-D camera to obtain depth information in real time, and we construct a binocular system combining a Time of Flight (ToF) camera and a commercial color camera to realize three-dimensional registration. First, we calibrate the binocular system to obtain the relative pose between the two cameras, and the systematic errors are fitted and corrected using B-spline curve fitting. To reduce outliers and random noise, an elimination algorithm and an improved bilateral filtering algorithm are proposed to optimize the depth map. To meet the system's real-time requirement, the pipeline is further accelerated by parallel computing with CUDA. Then, a CamShift-based tracking algorithm is applied to capture the real object registered in the video stream, and the position and orientation of the object are tracked according to the correspondence between the color image and the 3D data. Finally, experiments are conducted and compared on our binocular system, and the results demonstrate the feasibility and effectiveness of our method.
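A minimal sketch of the depth-map optimization step described above, assuming the ToF frames arrive as 32-bit float arrays in millimetres; the range threshold and bilateral-filter parameters are illustrative choices, not the paper's values, and the CUDA acceleration is omitted.

```python
import cv2
import numpy as np

def preprocess_depth(depth, max_range=5000.0, d=9, sigma_color=30.0, sigma_space=7.0):
    """Remove anomalous readings and smooth a ToF depth frame (sketch).

    depth      : float32 array of depth values in millimetres
    max_range  : readings beyond this are treated as anomalies (assumed value)
    d, sigma_* : standard OpenCV bilateral-filter parameters (illustrative)
    """
    depth = depth.astype(np.float32)
    # Eliminate anomalies: non-positive and out-of-range readings become invalid (0).
    invalid = (depth <= 0) | (depth > max_range)
    depth[invalid] = 0.0
    # Edge-preserving smoothing of the remaining random noise.
    smoothed = cv2.bilateralFilter(depth, d, sigma_color, sigma_space)
    # Keep invalid pixels invalid so later registration can ignore them.
    smoothed[invalid] = 0.0
    return smoothed
```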
This work presents a novel approach to multimodal human action recognition that jointly uses visual RGB and depth data (including skeleton joint positions) captured by a depth camera. For depth feature extraction, the Local Surface Geometric Feature (LSGF) is adopted to capture geometric appearance and posture cues. Simultaneously, the improved dense trajectory (IDT) feature is extracted from the RGB modality to jointly characterize motion, visual appearance, and trajectory shape. The features from the two modalities are complementary to each other. A two-stage integration scheme is then proposed that incorporates the probability weights of each classifier for action recognition. The proposed approach is evaluated on four publicly available human action databases: NJUST RGB-D Action, MSR-ActionPairs, MSR-DailyAct3D, and UTD-MHAD. Experimental results demonstrate that the proposed approach outperforms or is comparable to state-of-the-art methods.
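A minimal sketch of probability-weighted late fusion of the two per-modality classifiers, assuming each classifier outputs class-probability matrices over the same label set; the weights shown are hypothetical, not the paper's learned values or its exact two-stage scheme.

```python
import numpy as np

def fuse_predictions(proba_depth, proba_rgb, w_depth=0.4, w_rgb=0.6):
    """Combine per-class probabilities from the depth (LSGF) and RGB (IDT) classifiers.

    proba_depth, proba_rgb : (n_samples, n_classes) probability matrices
    w_depth, w_rgb         : classifier weights (illustrative; should sum to 1)
    """
    fused = w_depth * np.asarray(proba_depth) + w_rgb * np.asarray(proba_rgb)
    return fused.argmax(axis=1)  # predicted action label per sample
```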
Existing depth map super-resolution (SR) methods cannot achieve satisfactory results in restoring depth map detail. For example, depth boundaries are difficult to reconstruct effectively from the low-resolution (LR) depth map, particularly at large magnification factors. In this paper, we present a novel super-resolution method for a single depth map based on a deep feedback network (DFN). The network enhances feature representations at depth boundaries through iterative up-sampling and down-sampling operations, building a deep feedback mechanism that projects high-resolution (HR) representations into the low-resolution spatial domain and then back-projects them into the high-resolution spatial domain. The deep feedback (DF) block thus imitates the process of image degradation and reconstruction iteratively, and the rich intermediate high-resolution features effectively address depth boundary ambiguity in depth map super-resolution. Extensive experimental results on benchmark datasets show that the proposed DFN outperforms state-of-the-art methods.
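A minimal PyTorch sketch of one back-projection-style feedback unit of the kind the abstract describes (project HR features down, measure the residual in LR space, project the error back up); the module name, layer sizes, and activation are assumptions, not the published DF block.

```python
import torch
import torch.nn as nn

class FeedbackUnit(nn.Module):
    """One iterative up/down-projection step (sketch, not the paper's exact DF block)."""
    def __init__(self, channels=64, scale=4):
        super().__init__()
        k, s, p = scale + 4, scale, 2                                 # x4 geometry (illustrative)
        self.up1 = nn.ConvTranspose2d(channels, channels, k, s, p)   # LR -> HR
        self.down = nn.Conv2d(channels, channels, k, s, p)           # HR -> LR (degradation)
        self.up2 = nn.ConvTranspose2d(channels, channels, k, s, p)   # residual LR -> HR
        self.act = nn.PReLU()

    def forward(self, lr_feat):
        hr0 = self.act(self.up1(lr_feat))   # first HR estimate
        lr0 = self.act(self.down(hr0))      # re-project to the LR spatial domain
        res = lr0 - lr_feat                 # feedback error measured in LR space
        hr1 = self.act(self.up2(res))       # project the error back to HR
        return hr0 + hr1                    # corrected HR features
```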
Depth maps contain the spatial information of objects and are almost unaffected by illumination, which makes them attractive for machine-vision research on human detection. Finding a suitable image feature for human detection on depth maps is therefore of great interest. In this paper, we evaluate the performance of typical features on depth maps. A depth map dataset containing various indoor scenes with humans is constructed using Microsoft's Kinect camera and serves as a quantitative benchmark for studying human detection methods on depth maps. Each depth map is smoothed with pixel filtering and context filtering to reduce particle noise. Then, the performance of five existing image features and a new feature is studied and compared for human detection on the dataset through theoretical analysis and simulation experiments. Results show that the new feature outperforms the other descriptors.
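A minimal sketch of how one such descriptor can be evaluated for human detection on depth windows; HOG is used here purely as an assumed example of a typical feature, and the window size, normalization, and SVM setup are illustrative rather than the paper's protocol.

```python
import numpy as np
from skimage.feature import hog
from sklearn.svm import LinearSVC

def depth_hog(window):
    """HOG descriptor of a 128x64 depth window normalised to [0, 1]."""
    w = window.astype(np.float32)
    w = (w - w.min()) / (np.ptp(w) + 1e-6)
    return hog(w, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2))

def evaluate(train_windows, y_train, test_windows, y_test):
    """Train and score a linear classifier; labels are 1 = human, 0 = background."""
    clf = LinearSVC(C=0.01)
    clf.fit([depth_hog(w) for w in train_windows], y_train)
    return clf.score([depth_hog(w) for w in test_windows], y_test)
```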
As deep learning plays an increasing role in medical decision making, explainability is becoming ever more important for satisfying regulatory requirements and fostering trust and transparency in deep learning approaches. In cardiac imaging, accurately assessing left-ventricular function is crucial for evaluating patient risk, diagnosing cardiovascular disease, and clinical decision making. Previous video-based methods for predicting ejection fraction yield high accuracy, but at the expense of explainability, and do not follow the standard clinical workflow. More explainable methods that match the clinical workflow, based on 2D semantic segmentation, have been explored but found to have lower accuracy. To simultaneously increase accuracy and follow the standard clinical workflow, we propose a frame-by-frame 3D depth-map approach that is both accurate (mean absolute error of 6.5%) and explainable, following the conventional clinical workflow of method-of-discs evaluation of left-ventricular volume. This method is more reproducible than human evaluation and generates volume predictions that clinicians can interpret, giving them the opportunity to intervene and adjust the deep learning prediction.
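A minimal sketch of the single-plane method-of-discs volume computation that such a per-frame approach can feed; the disc diameters and long-axis length derived from a predicted LV contour are assumed inputs, and the units are illustrative.

```python
import numpy as np

def volume_method_of_discs(diameters_mm, long_axis_mm):
    """Single-plane method of discs: stack N circular discs along the LV long axis.

    diameters_mm : LV cavity diameter measured at N evenly spaced levels (mm)
    long_axis_mm : length of the LV long axis (mm)
    returns      : volume in millilitres
    """
    d = np.asarray(diameters_mm, dtype=float)
    disc_height = long_axis_mm / len(d)                  # each disc has equal height
    volume_mm3 = np.sum(np.pi * (d / 2.0) ** 2 * disc_height)
    return volume_mm3 / 1000.0                           # mm^3 -> mL

def ejection_fraction(edv_ml, esv_ml):
    """Ejection fraction (%) from end-diastolic and end-systolic volumes."""
    return 100.0 * (edv_ml - esv_ml) / edv_ml
```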