Image anomaly detection consists of detecting images or image portions that are visually different from the majority of the samples in a dataset. The task is of practical importance for various real-life applications like biomedical image analysis, visual inspection in industrial production, banking, traffic management, etc. Most current deep learning approaches rely on image reconstruction: the input image is projected into some latent space and then reconstructed, assuming that the network (mostly trained on normal data) will not be able to reconstruct the anomalous portions. However, this assumption does not always hold. We thus propose a new model based on the Vision Transformer architecture with patch masking: the input image is split into several patches, and each patch is reconstructed only from the surrounding data, thus ignoring the potentially anomalous information contained in the patch itself. We then show that multi-resolution patches and their collective embeddings provide a large improvement in the model's performance compared to the exclusive use of traditional square patches. The proposed model has been tested on popular anomaly detection datasets such as MVTec and head CT and achieved good results when compared to other state-of-the-art approaches.
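As a rough illustration of the patch-masking idea, the following minimal PyTorch sketch reconstructs one patch from its surroundings only and uses the reconstruction error as an anomaly score; the class name, dimensions and single-patch interface are hypothetical, not the authors' architecture, and the multi-resolution patches are omitted:

```python
import torch
import torch.nn as nn

class MaskedPatchReconstructor(nn.Module):
    """Reconstruct one patch from the surrounding patches only (sketch)."""
    def __init__(self, patch=16, dim=256, depth=4, heads=8, img=224):
        super().__init__()
        self.patch = patch
        n_patches = (img // patch) ** 2
        self.embed = nn.Linear(3 * patch * patch, dim)
        self.mask_token = nn.Parameter(torch.zeros(dim))
        self.pos = nn.Parameter(torch.zeros(1, n_patches, dim))
        layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)
        self.head = nn.Linear(dim, 3 * patch * patch)

    def forward(self, x, masked_idx):
        p = self.patch
        b, c, _, _ = x.shape
        # split the image into flattened non-overlapping patches: (B, N, 3*p*p)
        tokens = x.unfold(2, p, p).unfold(3, p, p)
        tokens = tokens.permute(0, 2, 3, 1, 4, 5).reshape(b, -1, c * p * p)
        z = self.embed(tokens) + self.pos
        z[:, masked_idx] = self.mask_token  # hide the patch's own (possibly anomalous) content
        z = self.encoder(z)
        return self.head(z[:, masked_idx]), tokens[:, masked_idx]

model = MaskedPatchReconstructor()
img = torch.randn(1, 3, 224, 224)
pred, target = model(img, masked_idx=42)
score = (pred - target).abs().mean().item()  # anomaly score for that patch
```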
Face hallucination synthesizes a high-resolution face image from a low-resolution input. Although many two-step learning-based face hallucination approaches have been developed, they suffer from high computational cost because the global and local models are computed separately. To overcome this problem, we propose a correlative two-step learning-based face hallucination approach that bridges the gap between the global model and the local model. In the global phase, we build a global face hallucination framework by combining steerable pyramid decomposition and reconstruction. In the residue compensation phase, based on the combination weights and constituent samples obtained in the global phase, a residue face image is synthesized by the neighbor reconstruction algorithm to compensate the hallucinated global face image with subtle facial features. The ultimate hallucinated result is synthesized by adding the residue face image to the global face image. Compared with existing methods, our global face image is more similar to the original high-resolution face image. Furthermore, in the residue compensation phase, we reuse the combination weights and constituent samples obtained in the global phase to compute the residue face image, which greatly improves computational efficiency without compromising the quality of facial details. The experimental results and comparisons demonstrate that our approach not only generates convincing high-resolution face images but also does so with high computational efficiency. Furthermore, the proposed approach can be used to restore damaged face images in image inpainting; its efficacy is validated by recovering damaged face images with visually good results.
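The residue compensation step reuses weights fitted in the low-resolution domain. A minimal numpy sketch of the underlying neighbor reconstruction, with all names and the least-squares fit assumed rather than taken from the paper:

```python
import numpy as np

def neighbor_reconstruct(x_lr, train_lr, train_hr, k=5):
    """Fit x_lr by its k nearest LR training faces, reuse the weights on HR faces.

    x_lr: flattened low-resolution input face.
    train_lr, train_hr: rows are flattened training faces at the two resolutions.
    """
    # constituent samples: the k nearest LR training faces
    dist = np.linalg.norm(train_lr - x_lr, axis=1)
    idx = np.argsort(dist)[:k]
    # combination weights: least-squares fit of x_lr by its neighbors
    w, *_ = np.linalg.lstsq(train_lr[idx].T, x_lr, rcond=None)
    # the same weights applied to the HR counterparts give the HR estimate
    return train_hr[idx].T @ w
```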
Image completion is an active and interesting research area in image processing and computer graphics. The objective of image completion techniques is to restore and retouch damaged areas in an undetectable way. Most recently developed video completion methods are extensions of image completion techniques to restore damaged frames. With respect to the challenges of video completion and future directions for image completion, we survey existing methods and introduce a new classification; the methods in each category are described in detail. In the second part of the paper, we provide a comparison and evaluation of the most recent image completion methods, both qualitatively and quantitatively. For a fair comparison, we introduce a new dataset and evaluate four available image completion methods on the same hardware. Experiments are conducted to highlight the strengths and drawbacks of each image completion method.
This paper presents a new algorithm for fast exemplar-based image inpainting using an image pyramid and a novel pruning scheme. The proposed technique separates the inpainting process into three major steps. First, the model determines which patch in the inpainting region should be filled first. Then, the pruning system analyzes the selected patch and weeds out regions that have no chance of being the best matching patch. Finally, it performs fast texture synthesis in the remaining regions, based on a greedy search method, to fill the selected patch. Compared with previous methods, the proposed algorithm is much more computationally efficient. Experiments on synthetic and natural images show the advantages of the proposed algorithm, which compares favorably with contemporary state-of-the-art schemes.
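The pruning idea can be illustrated with a two-stage search: a cheap subsampled comparison weeds out hopeless candidates before the full-resolution match. This is a sketch under assumed details, not the paper's exact pyramid scheme:

```python
import numpy as np

def pruned_best_match(src, target, known, step=4, keep=50):
    """src: H x W source image; target: p x p patch to fill;
    known: p x p bool mask, True where target pixels are valid."""
    p = target.shape[0]
    s = slice(None, None, step)
    coarse = []
    for i in range(src.shape[0] - p + 1):
        for j in range(src.shape[1] - p + 1):
            cand = src[i:i + p, j:j + p]
            # cheap subsampled SSD on known pixels prunes most candidates
            err = np.sum(((cand[s, s] - target[s, s]) ** 2)[known[s, s]])
            coarse.append((err, i, j))
    coarse.sort(key=lambda t: t[0])

    def full_err(i, j):
        # full-resolution SSD, evaluated only for the surviving candidates
        return np.sum(((src[i:i + p, j:j + p] - target) ** 2)[known])

    best = min(coarse[:keep], key=lambda t: full_err(t[1], t[2]))
    return best[1], best[2]   # top-left corner of the best matching patch
```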
Image inpainting removes unwanted objects from an image, restoring the original image. Although several techniques have been introduced for image inpainting, conventional methods still face several challenging issues regarding data loss, which the proposed approach handles effectively. In this paper, we propose an effective hybrid image inpainting method, termed ALGDKH, which hybridizes Ant Lion–Gray Wolf Optimizer (ALG)-based Markov random field (MRF) modeling, deep learning, K-nearest neighbors (KNN) and harmonic functions. The cracked input image is forwarded to MRF modeling to obtain an inpainting result, where the MRF energy is minimized using the ALG. Then, the same cracked image is passed through the Whale–MBO-based DCNN, the KNN with Bhattacharyya distance, and the bi-harmonic function modules to obtain their inpainting results. Finally, the results from the ALG-based MRF modeling, Whale–MBO-based DCNN, KNN with Bhattacharyya distance and bi-harmonic function modules are fused through Bayes-probabilistic fusion to produce the final inpainting result. The proposed method achieves a maximal PSNR of 38.14 dB, a maximal SDME of 75.70 dB and a maximal SSIM of 0.983.
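Since the abstract does not spell out the fusion rule, here is only a minimal sketch of probabilistically weighted fusion of the module outputs, with the per-module weights assumed to be given:

```python
import numpy as np

def fuse(estimates, weights):
    """Weighted fusion of several inpainting results (stand-in for the paper's
    Bayes-probabilistic fusion; the actual rule is not specified here)."""
    w = np.asarray(weights, dtype=float)
    w /= w.sum()                              # normalize to a probability vector
    return sum(wi * est for wi, est in zip(w, estimates))
```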
Thanks to the rapid development of deep learning in recent years, image inpainting has made significant progress. As a fundamental task in computer vision, it has drawn many researchers committed to exploring more efficient methods, and state-of-the-art results show that generative adversarial networks (GANs) have superior performance. However, due to the inherent ill-posedness of image inpainting, these approaches suffer from a lack of detailed information, local structural fractures or boundary artifacts. In this paper, we leverage the properties of the GAN architecture to process images in more detail and more comprehensively. A novel dual U-Net GAN is designed to inpaint images, composed of a U-Net-based generator and a U-Net-based discriminator. The former captures semantic information at different scales layer by layer and decodes it back to the original size to repair damaged images, while the latter optimizes the network by combining reconstruction loss, adversarial loss, perceptual loss and style loss. In particular, the U-Net-based discriminator provides both per-pixel detail and global feedback to the generator, guaranteeing the global consistency of the inpainted image and the realism of local shapes and textures. Extensive experiments demonstrate that, for different proportions of damage, the images inpainted by our proposed model have reasonable texture structure and contextual semantic information. Furthermore, the proposed model outperforms state-of-the-art models in both qualitative and quantitative comparisons. The code will be available at https://github.com/yjjswu.
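The four-term objective can be sketched as follows; the weights are illustrative rather than the paper's, and `feat_extractor` (e.g. a VGG feature hook) is an assumed callable returning a list of feature maps:

```python
import torch
import torch.nn.functional as F

def gram(f):
    # Gram matrix of a feature map, used by the style loss
    b, c, h, w = f.shape
    f = f.view(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

def generator_objective(pred, target, d_logits, feat_extractor,
                        weights=(1.0, 0.1, 0.1, 250.0)):
    rec = F.l1_loss(pred, target)                         # reconstruction
    adv = F.softplus(-d_logits).mean()                    # adversarial term
    fp, ft = feat_extractor(pred), feat_extractor(target)
    perc = sum(F.l1_loss(a, b) for a, b in zip(fp, ft))   # perceptual
    style = sum(F.l1_loss(gram(a), gram(b)) for a, b in zip(fp, ft))
    w = weights
    return w[0] * rec + w[1] * adv + w[2] * perc + w[3] * style
```

Because the discriminator is itself a U-Net, `d_logits` can be a per-pixel map, which is what lets the adversarial term feed back local as well as global information.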
Image generation and completion models fill in the missing area of a damaged image using the image itself or information from an image library, so that the repaired image looks natural and is difficult to distinguish from an undamaged one. The difficulty of image generation and completion lies in producing reasonable image semantics and clear, realistic texture. In this paper, a Wasserstein generative adversarial network with dilated convolution and deformable convolution (DDC-WGAN) is proposed for image completion. A deformable offset is added on top of dilated convolution, which enlarges the receptive field and provides a more stable representation of geometric deformation. Experiments show that the proposed DDC-WGAN method performs better in image generation and completion than traditional generative adversarial completion networks.
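Combining the two convolution types is straightforward with `torchvision.ops.deform_conv2d`: a small convolution predicts per-location offsets, and the main convolution samples at dilated positions displaced by those offsets. A minimal sketch, with the layer layout assumed:

```python
import torch
from torchvision.ops import deform_conv2d

class DilatedDeformConv(torch.nn.Module):
    """Dilated convolution whose sampling grid is shifted by learned offsets."""
    def __init__(self, cin, cout, k=3, dilation=2):
        super().__init__()
        pad = dilation * (k - 1) // 2
        self.offset = torch.nn.Conv2d(cin, 2 * k * k, k, padding=pad, dilation=dilation)
        self.weight = torch.nn.Parameter(torch.randn(cout, cin, k, k) * 0.01)
        self.pad, self.dilation = pad, dilation

    def forward(self, x):
        off = self.offset(x)                  # per-location (dy, dx) sampling offsets
        return deform_conv2d(x, off, self.weight,
                             padding=self.pad, dilation=self.dilation)

x = torch.randn(1, 64, 32, 32)
y = DilatedDeformConv(64, 64)(x)              # spatial size preserved: 32 x 32
```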
Image inpainting faces problems of imperfect edges, mismatched content and slow training caused by large numbers of network parameters, together with high requirements for edge consistency and semantic integrity. To address these issues, this paper designs a gated convolution and self-attention-based pyramid network (GAP-Net). The network is based on U-Net; it integrates gated convolution and a pyramid loss and changes the feature extraction strategy. In addition, we design a self-attention mechanism module and an attention transfer module for the network, and add content and perceptual losses, generating a new data distribution between generated and real images. A comparative experimental analysis against the PEN-Net and Gated methods is conducted in the same environment. The results show that gated convolution and the pyramid loss increase the extraction of useful information from damaged image areas; the self-attention and attention transfer modules guide the conversion of high-level semantic features into image information more accurately; and the content and perceptual losses accelerate and improve the learning ability of the network. The method thus improves the repair quality and speeds up network learning.
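Gated convolution, the core ingredient named above, is commonly formulated as a learned soft mask multiplying the features; GAP-Net's exact variant may differ, so this is only the standard formulation:

```python
import torch
import torch.nn as nn

class GatedConv2d(nn.Module):
    """Standard gated convolution: a sigmoid gate weights features per pixel,
    letting the network downweight invalid (hole) regions."""
    def __init__(self, cin, cout, k=3, stride=1):
        super().__init__()
        self.feature = nn.Conv2d(cin, cout, k, stride, k // 2)
        self.gate = nn.Conv2d(cin, cout, k, stride, k // 2)

    def forward(self, x):
        return torch.tanh(self.feature(x)) * torch.sigmoid(self.gate(x))
```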
Traditional inpainting algorithms cannot automatically identify the location of the region to be restored and are costly and difficult to apply, while deep learning-based inpainting suffers from structural and texture discontinuities and poor model stability. To address these problems, this paper proposes an image inpainting method based on contextual coherent attention, designing a network model built on generative adversarial networks. First, to improve the global and local semantic continuity of inpainted images, a contextual coherent attention layer is added to the network; second, to address the model's slow convergence and insufficient training stability, a cross-entropy loss function is used; finally, the trained generator is used to repair images. The experimental results are compared using PSNR and SSIM metrics: relative to a traditional GAN model, our model improves peak signal-to-noise ratio by 3.782 dB and structural similarity by 0.025. The results show that the proposed inpainting method performs better in terms of image edge processing, pixel continuity and overall image structure.
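The cross-entropy adversarial objective mentioned above is, in its usual form, binary cross-entropy on the discriminator logits; a minimal sketch of that standard formulation, not necessarily the paper's exact losses:

```python
import torch
import torch.nn.functional as F

def d_loss(real_logits, fake_logits):
    # discriminator: label real samples 1, generated samples 0
    real = F.binary_cross_entropy_with_logits(real_logits, torch.ones_like(real_logits))
    fake = F.binary_cross_entropy_with_logits(fake_logits, torch.zeros_like(fake_logits))
    return real + fake

def g_loss(fake_logits):
    # generator: push the discriminator to label generated samples as real
    return F.binary_cross_entropy_with_logits(fake_logits, torch.ones_like(fake_logits))
```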
In this paper, we present a novel automatic image completion solution that works in a greedy manner, inspired by the primal sketch representation model. First, an image is divided into structure (sketchable) components and texture (non-sketchable) components, and the missing structures, such as curves and corners, are predicted by tensor voting. Second, the textures along structural sketches are synthesized from sampled patches of the known structure components. Then, using texture completion priorities decided by a confidence term, a data term and a distance term, similar image patches from the known texture components are found by selecting the point with maximum priority on the boundary of the hole region. Finally, these image patches fill the missing textures of the hole region seamlessly through graph cuts. The characteristics of this solution include: (1) introducing the primal sketch representation model to guide completion for visual consistency; (2) achieving fully automatic completion. Experiments on natural images illustrate satisfying image completion results.
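The point-selection rule can be sketched directly: the three terms are multiplied and maximized over the hole boundary. The terms' exact definitions are the paper's; only the selection mechanics are shown here:

```python
import numpy as np

def next_fill_point(confidence, data_term, distance_term, boundary):
    """All inputs are H x W maps; boundary is True on the hole's border pixels."""
    priority = confidence * data_term * distance_term
    priority[~boundary] = -np.inf          # only boundary points compete
    return np.unravel_index(np.argmax(priority), priority.shape)
```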
In this paper we study some nonlocal variational models for different image inpainting tasks. Nonlocal methods for denoising and inpainting have gained considerable attention due to their good performance on textured images, a known weakness of classical local methods, which perform well in recovering the geometric structure of the image. We first review a general variational framework for nonlocal inpainting that exploits the self-similarity of natural images to copy information in a consistent way from the known parts of the image. We single out two particular methods depending on the information we copy: either the gray level (or color) information or its gradient. We review the main properties of the corresponding energies and their minima. Then we discuss three other applications: the problem of stereo inpainting, some simple cases of video inpainting, and the interpolation of incomplete depth maps given a reference image. Incomplete depth maps can be obtained as a result of stereo algorithms, or given for instance by Time-of-Flight cameras (in that case the interpolated result can be used to generate the images of the stereo pair). We discuss the basic algorithms to minimize the energies and we display some numerical experiments illustrating the main properties of the proposed models.
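One common form of such a nonlocal patch-based energy, written here with assumed notation (the reviewed framework may weight or constrain the terms differently), is:

```latex
% O is the hole, O^c the known region, p_u(x) the patch of u centred at x,
% and w(x,y) >= 0 similarity weights with \int_{O^c} w(x,y)\,dy = 1.
\[
  E(u, w) \;=\; \int_{O} \int_{O^{c}} w(x, y)\,
      \bigl\| p_u(x) - p_u(y) \bigr\|^{2} \, dy \, dx ,
\]
% minimised alternately in u (image update) and in w (correspondence update).
% Replacing p_u by \nabla p_u gives the gradient-copying variant mentioned above.
```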
In this paper, we present a new algorithm for image inpainting using a low-dimensional feature space. In our method, a low-dimensional space is first projected from the original space using SVD, yielding what we call the low-rank component, and the missing pixels are then filled in this new space. Finally, the original image is inpainted using an adaptive patch size determined by a quad-tree, based on the previous step. In our algorithm, the missing pixels in the target region are thus estimated twice, once in the low-dimensional feature space and once in the original space. Notably, both stages estimate the unknown pixels using a patch-based approach and the rank-lowering concept. Experimental results show that this algorithm achieves better consistency in comparison with state-of-the-art methods.
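The low-rank projection step amounts to a truncated SVD; a minimal numpy sketch (the rank choice and the quad-tree patch adaptation are not reproduced here):

```python
import numpy as np

def low_rank_project(img, rank=10):
    """Rank-r SVD approximation serving as the low-dimensional space."""
    U, s, Vt = np.linalg.svd(img, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank]   # scale columns of U, recombine
```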
Image inpainting is an active research area in the image processing field. The essential idea of an image inpainting algorithm is to fill in the missing or damaged regions with available information from their surroundings. In this paper, we propose two image inpainting models based on the variational method. We show that the diffusion performance of the proposed models is superior to the classical total variation (TV) inpainting model according to the physical characteristics in local coordinates. To demonstrate their effectiveness, we apply the proposed models to restoring scratched photos, removing text and even removing entire objects from images.
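For reference, the classical TV inpainting baseline the models are compared against can be sketched as explicit gradient descent on the TV energy inside the hole (a textbook discretization, not the paper's schemes):

```python
import numpy as np

def tv_inpaint(img, hole, iters=500, dt=0.1, eps=1e-6):
    """img: damaged grayscale image; hole: bool mask, True where pixels are missing."""
    u = img.astype(float)
    for _ in range(iters):
        ux = np.gradient(u, axis=1)
        uy = np.gradient(u, axis=0)
        mag = np.sqrt(ux ** 2 + uy ** 2 + eps)     # regularized gradient magnitude
        div = np.gradient(ux / mag, axis=1) + np.gradient(uy / mag, axis=0)
        u[hole] += dt * div[hole]                  # evolve only inside the hole
    return u
```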
Video inpainting aims to complete the missing regions in video frames in a visually pleasing way. It is a challenging task due to the variety of motions across different frames. Existing methods usually use attention models to inpaint videos by retrieving the missing content from other frames. Nevertheless, these methods suffer from irregular attention weights across the spatio-temporal dimensions, giving rise to artifacts in the inpainted video. To overcome this problem, a Spatio-Temporal Inference Transformer Network (STITN) is proposed. The STITN aligns the frames to be inpainted and inpaints all the frames concurrently, while a spatio-temporal adversarial loss function further improves it. Our method performs considerably better than existing deep learning approaches in quantitative and qualitative evaluation.
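A minimal sketch of the joint spatio-temporal attention this implies, with tokens from all frames attending to each other so every frame is completed concurrently (the block layout and dimensions are assumed, not STITN's):

```python
import torch
import torch.nn as nn

class SpatioTemporalBlock(nn.Module):
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):          # x: (B, T*H*W, dim) tokens from all frames
        a, _ = self.attn(x, x, x)  # every token attends across space and time
        return self.norm(x + a)
```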
Image inpainting is the process of reconstructing an incomplete image from the available information in a visually plausible way. In the proposed framework, existing image inpainting methods are classified from a new perspective. The information consulted while reconstructing an image is a critical factor in inpainting algorithms; its source can be the host image itself or an external source. The proposed framework therefore broadly classifies inpainting algorithms into introspective and extrospective categories based on the source of information. Various parameters influencing the algorithms in these categories are identified. A comprehensive list of all publicly available datasets, along with references, is also provided. Additionally, an in-depth analysis of the results obtained with the surveyed techniques is performed based on quantitative and qualitative parameters. The proposed framework aids the user in identifying the most suitable algorithm for various inpainting scenarios.
Representing the image to be inpainted in an appropriate sparse dictionary, we introduce a novel method for the filling-in of structure and texture in regions of missing image information. In the morphological component analysis (MCA) inpainting approach, a TV penalty is added to better reduce ringing artifacts. However, the incorporation of TV penalty terms leads to PDE schemes that are numerically intensive. Inspired by the works of Daubechies–Teschke and Borup–Nielsen, we replace the TV term by a sparsity-promoting penalty term, which results in an iterative curvelet thresholding scheme for inpainting the structure image. In the whole inpainting process, an alternating approach to layer inpainting is presented. Experimental results demonstrate the performance of the algorithm.
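The resulting scheme is a generic inpainting-by-iterative-thresholding loop; in this sketch `T`/`Tinv` stand in for the curvelet transform and its inverse (not implemented here), and the decreasing threshold schedule is an assumption:

```python
import numpy as np

def iterative_thresholding(img, known, T, Tinv, iters=50, lam=0.1):
    """img: damaged image; known: bool mask of valid pixels;
    T, Tinv: forward/inverse transform callables (e.g. curvelets)."""
    soft = lambda c, t: np.sign(c) * np.maximum(np.abs(c) - t, 0.0)
    u = np.where(known, img, 0.0)
    for k in range(iters):
        c = soft(T(u), lam * (iters - k) / iters)   # shrink transform coefficients
        u = Tinv(c)
        u[known] = img[known]                       # re-impose the known pixels
    return u
```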
Image inpainting is the process of filling in missing pixels of a damaged image. It has attracted growing attention and become an active and extensive research topic in recent decades, because high-quality inpainting benefits a wide range of applications such as object removal and photo restoration. High-quality inpainting requires filling the empty regions of the damaged image with plausible content. Existing inpainting methods either fill image regions by borrowing image patches or semantically create coherent patches from the regional context. Most traditional models perform well on images with small holes, but restoring images with large holes remains a challenging task. To overcome these issues and generate effective inpainting results, we design a hybrid context deep learning approach to fill the empty regions of cracked images. The method is made more effective by employing a hybrid optimization algorithm to train the classifiers, generating more robust and accurate inpainting results. The developed model includes two different deep learning classifiers that accomplish image inpainting in such a way that their results are fused through a probabilistic model. The proposed approach attains Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index (SSIM), Second Derivative-like Measure of Enhancement (SDME) and Universal Quality Index (UQI) values of 38.02 dB, 0.867, 54.32 dB and 0.864, respectively.
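The first two reported metrics can be computed with standard tooling; a small sketch assuming 8-bit grayscale images (SDME and UQI would need separate implementations):

```python
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate(original, inpainted):
    psnr = peak_signal_noise_ratio(original, inpainted, data_range=255)
    ssim = structural_similarity(original, inpainted, data_range=255)
    return psnr, ssim
```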
Folds that appear in micro-slice images need to be inpainted accurately to meet the requirements of scientific experiments. A biological slice image usually consists of piecewise smooth regions with closed-loop contours, which can be represented by the Bendlet function proposed in recent years. Therefore, a novel image inpainting method based on Bendlets and the interval Shannon–Cosine wavelet is proposed. Since the deformation of a locust slice image is flexible, it is necessary to obtain as many feature points as possible to ensure the accuracy of the inpainting, so we introduce curvature as a new registration element. First, the homography matrix is obtained from the correct feature points computed by our proposed registration method. Second, the fold position is located via the homography matrix and inpainted by Shannon–Cosine interval wavelet interpolation. Finally, pixel differences are eliminated through adaptive fusion. The results indicate that, in comparison to the SURF and ORB algorithms, our registration method significantly enhances the extraction of feature points, achieving a more even distribution. Furthermore, when compared to four other methods (K-SVD, BSCB, TV and Criminisi), as well as various interpolation methods such as cubic polynomial interpolation, cubic spline interpolation and nearest neighbor interpolation, our approach consistently achieves the highest PSNR and SSIM values.
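The registration step can be sketched with standard OpenCV tools; the paper's curvature-based selection of feature points is omitted, so plain ORB matching here is only a stand-in:

```python
import cv2
import numpy as np

def estimate_homography(src, dst):
    """Match ORB features between two grayscale slice images and fit H with RANSAC."""
    orb = cv2.ORB_create(5000)
    k1, d1 = orb.detectAndCompute(src, None)
    k2, d2 = orb.detectAndCompute(dst, None)
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(d1, d2)
    pts1 = np.float32([k1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([k2[m.trainIdx].pt for m in matches])
    H, inliers = cv2.findHomography(pts1, pts2, cv2.RANSAC, 3.0)
    return H
```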
Because the dynamic range of the detector is limited, saturation artifacts usually occur in optical coherence tomography (OCT) imaging of highly scattering media. Existing methods struggle to remove saturation artifacts and restore texture completely in OCT images. In this paper, we propose a deep learning-based method for inpainting saturation artifacts. The generation mechanism of saturation artifacts is analyzed, and experimental and simulated datasets are built based on this mechanism. Enhanced super-resolution generative adversarial networks are trained on clear–saturated phantom image pairs. The well-reconstructed results on experimental zebrafish and thyroid OCT images prove the method's feasibility, strong generalization and robustness.
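A toy version of the pair-building step clips a clear image at a fraction of the detector range; the paper's mechanism-based simulation is more detailed, so this is only illustrative:

```python
import numpy as np

def simulate_saturation(clear, frac=0.7):
    """Make a clear-saturated training pair by clipping to a reduced dynamic range."""
    ceiling = frac * clear.max()
    return np.minimum(clear, ceiling)
```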
Virtual restoration of digital copies of the human documental heritage is crucial for facilitating both the traditional work of philologists and paleographers and the automatic analysis of the contents. Here we propose a practical and fast procedure for the correction of the typically complex background of recto–verso historical manuscripts. The procedure has two main distinctive features: it does not require a preliminary registration of the two page sides, and it is non-invasive, as it does not alter the original appearance of the manuscript. This makes it suitable for routine use in the archives, and permits an easier fruition of the manuscripts, without any information being lost. In the first stage, the detection of both the primary text and the spurious strokes is performed via soft segmentation, based on the statistical decorrelation of the recto and verso images. In the second stage, the noisy pattern is substituted with pixels that simulate the texture of the clean surrounding background, through an efficient image inpainting technique. As shown in the experimental results, evaluated both qualitatively and quantitatively, the proposed procedure is able to perform a fine and selective removal of the degradation, while preserving other informative marks of the manuscript's history.
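The first stage rests on statistical decorrelation of the two sides; as a rough stand-in, a two-channel PCA separates a dominant shared component from a difference component (the paper's soft-segmentation model is not necessarily plain PCA):

```python
import numpy as np

def decorrelate(side1, side2):
    """Decorrelate two co-located grayscale views of the same page."""
    X = np.stack([side1.ravel(), side2.ravel()]).astype(float)   # 2 x N
    X -= X.mean(axis=1, keepdims=True)
    _, V = np.linalg.eigh(X @ X.T / X.shape[1])   # eigenvectors of 2x2 covariance
    Y = V.T @ X                                   # decorrelated components
    return Y[0].reshape(side1.shape), Y[1].reshape(side1.shape)
```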