In this paper, a novel method termed Multi-Instance Dictionary Learning (MIDL) is presented for detecting abnormal events in crowded video scenes. With respect to multi-instance learning, each event (video clip) is modeled as a bag containing several sub-events (local observations), and each sub-event is regarded as an instance. MIDL jointly learns a dictionary for the sparse representation of sub-events (instances) and multi-instance classifiers for classifying events as normal or abnormal. We further adopt three different multi-instance models, yielding Max-Pooling-based MIDL (MP-MIDL), Instance-based MIDL (Inst-MIDL) and Bag-based MIDL (Bag-MIDL), for detecting both global and local abnormalities. MP-MIDL classifies observed events using bag features extracted via max-pooling over sparse representations, while Inst-MIDL and Bag-MIDL classify observed events by the predicted values of the corresponding instances. The proposed MIDL is evaluated against state-of-the-art methods for abnormal event detection on the UMN (global abnormalities) and UCSD (local abnormalities) datasets, and the results show that MP-MIDL and Bag-MIDL achieve comparable or improved detection performance. The proposed MIDL is also compared with other multi-instance learning methods on this task, with the MP-MIDL scheme obtaining superior results.
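A minimal sketch of the max-pooling step described in the abstract, under the assumption that each bag (event) holds the sparse codes of its instances (sub-events) as rows; the bag feature is then the element-wise maximum of their magnitudes:

```python
import numpy as np

def bag_feature_max_pool(instance_codes):
    """instance_codes: (n_instances, n_atoms) sparse codes of one bag.

    Returns one fixed-length bag feature via max-pooling over magnitudes."""
    return np.max(np.abs(instance_codes), axis=0)

# two instances, three dictionary atoms (illustrative values)
codes = np.array([[0.0, 0.5, 0.0],
                  [0.2, 0.0, -0.9]])
feat = bag_feature_max_pool(codes)
```

Max-pooling makes the bag feature invariant to the number and order of instances, which is what lets a standard classifier operate on variable-size bags.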
Computed tomography (CT) is extensively utilized in clinical diagnosis. However, the X-ray dose absorbed by the human body may cause somatic damage such as cancer. Owing to this radiation risk, research has focused on reducing the radiation exposure delivered to patients during CT examinations, and low-dose CT has become a significant research area. Many researchers have proposed different low-dose CT reconstruction techniques, but these techniques suffer from various issues such as over-smoothing, artifacts and noise. Therefore, in this paper, we propose a novel integrated low-dose CT reconstruction technique. The proposed technique utilizes global dictionary-based statistical iterative reconstruction (GDSIR) and adaptive dictionary-based statistical iterative reconstruction (ADSIR). If the dictionary (D) is predetermined, GDSIR can be used; if D is defined adaptively, ADSIR is the appropriate choice. A gain intervention-based filter is also used as a post-processing step for removing artifacts from low-dose CT reconstructed images. Experiments comparing the proposed technique with other low-dose CT reconstruction techniques on well-known benchmark CT images show that the proposed technique outperforms the available approaches.
Sparse representation based classification has led to interesting image recognition results, and the dictionary used for sparse coding plays a key role in it. This paper presents a novel supervised structure dictionary learning (SSDL) algorithm to learn a discriminative and block-structured dictionary. We associate label information with each dictionary item and make each class-specific sub-dictionary in the whole structured dictionary represent the training samples of the associated class well. More specifically, we learn a structured dictionary and a multiclass classifier simultaneously. Adding an inhomogeneous representation term to the objective function and enforcing the independence of the class-specific sub-dictionaries improve the discrimination capability of the sparse coordinates. An iterative optimization method is proposed to solve the new formulation. Experimental results on four face databases demonstrate that our algorithm outperforms recently proposed competing sparse coding methods.
Sparse representation theory has attracted much attention and has been successfully used in image super-resolution (SR) reconstruction. However, it provides only a local prior on image patches. Field of experts (FoE) is a way to develop a generic and expressive prior on the whole image. The algorithm proposed in this paper uses the FoE model as the global constraint of the SR reconstruction problem to pre-process the low-resolution image. Since a single dictionary cannot accurately represent different types of image patches, our algorithm classifies the sample patches composed of the pre-processed image and the high-resolution image, obtains sub-dictionaries by training, and adaptively selects the most appropriate sub-dictionary for reconstruction according to the pyramid histogram of oriented gradients feature of each image patch. Furthermore, to reduce the computational complexity, our algorithm makes use of edge detection and applies sparse-representation-based SR reconstruction only to the edge patches of the test image; non-edge patches are directly replaced by the pre-processing results of the FoE model. Experimental results show that our algorithm effectively guarantees the quality of the reconstructed image and reduces the computation time to a certain extent.
This paper proposes a new model for speaker verification that employs a kurtosis-based statistical method built on the sparse coding of the human auditory system. Only a small number of neurons in the primary auditory cortex are activated when encoding acoustic stimuli, and sparse independent events are used to represent the characteristics of these neurons. Each individual dictionary is learned from an individual speaker's samples, where dictionary atoms correspond to cortex neurons. The neuron responses possess the statistical properties of acoustic signals in the auditory cortex, so the activation distribution of an individual speaker's neurons is taken as a characteristic of that speaker. Kurtosis is an efficient measure of a neuron's sparsity from its activation distribution, and the vector composed of the kurtosis of every neuron is used as the model characterizing the speaker's voice. The experimental results demonstrate that the kurtosis model outperforms the baseline systems and achieves effective identity validation.
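A sketch of the kurtosis model described above: each dictionary atom ("neuron") gets the excess kurtosis of its activation distribution, and the resulting vector characterizes the speaker. The activation-matrix layout is an assumption for illustration:

```python
import numpy as np

def kurtosis_model(activations):
    """activations: (n_frames, n_atoms); column j holds atom j's responses
    across frames. Returns the per-atom excess kurtosis vector."""
    a = activations - activations.mean(axis=0)
    m2 = (a ** 2).mean(axis=0)   # second central moment (variance)
    m4 = (a ** 4).mean(axis=0)   # fourth central moment
    return m4 / m2 ** 2 - 3.0    # excess kurtosis per atom
```

Sparse (rarely active, occasionally large) responses are leptokurtic, so their kurtosis is high, which is what makes the vector discriminative between speakers.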
This paper aims to propose a candidate solution to the challenging task of single-image blind super-resolution (SR) by extensively exploring the potential of learning-based SR schemes in the literature. The task is formulated as an energy functional to be minimized with respect to both an intermediate super-resolved image and a nonparametric blur-kernel. The functional includes a so-called convolutional consistency term, which incorporates a nonblind learning-based SR result to better guide the kernel estimation process, and a bi-L0-L2-norm regularization imposed on both the super-resolved sharp image and the nonparametric blur-kernel. A numerical algorithm is deduced by coupling the splitting augmented Lagrangian (SAL) and conjugate gradient (CG) methods. With the estimated blur-kernel, the final SR image is reconstructed using a simple TV-based nonblind SR method. The proposed blind SR approach is demonstrated to achieve better performance than [T. Michaeli and M. Irani, Nonparametric Blind Super-resolution, in Proc. IEEE Conf. Comput. Vision (IEEE Press, Washington, 2013), pp. 945–952.] in terms of both blur-kernel estimation accuracy and image enhancement quality. Meanwhile, the experimental results demonstrate, surprisingly, that the local linear regression-based SR method, anchored neighbor regression (ANR), serves the proposed functional more appropriately than those harnessing deep convolutional neural networks.
Enhancing an image by removing noise while preserving useful features and edges is one of the most important tasks in image analysis. In this paper, Significant Cluster Identification for Maximum Edge Preservation (SCI-MEP), which works in parallel with clustering algorithms and improves their efficiency, is proposed. Affinity propagation (AP) is used as the base method to obtain clusters from a learnt dictionary with adaptive window selection; the clusters are then refined using SCI-MEP to preserve the semantic components of the image. Since only the significant clusters are worked upon, the computational time drops drastically. The flexibility of SCI-MEP allows it to be integrated with any clustering algorithm to improve its efficiency. The method is tested and verified on the removal of Gaussian noise, rain noise and speckle noise from images. Our results show that SCI-MEP considerably improves the existing algorithms in terms of performance evaluation metrics.
Aiming at the problem of insufficient detail retention in multimodal medical image fusion (MMIF) based on sparse representation (SR), an MMIF method based on density peak clustering and convolutional sparse representation (CSR-DPC) is proposed. First, the base layer is obtained from the registered input image by an averaging filter, and the detail layer is obtained by subtracting the base layer from the original image. Second, to retain the details of the fused image, the detail layer is fused by CSR. The base layer image is then segmented into several image blocks, the blocks are clustered using DPC, each cluster is trained to obtain a sub-dictionary, and all the sub-dictionaries are merged into an adaptive dictionary. The sparse coefficients are fused over the learned adaptive dictionary, and the fused base layer image is obtained through reconstruction. Finally, fusing the detail layer and the base layer and reconstructing them forms the ultimate fused image. Experiments show that, compared with two state-of-the-art multi-scale transform methods and five SR methods, the proposed method (CSR-DPC) outperforms the others in terms of image detail, visual quality and objective evaluation indexes, which can be helpful for clinical diagnosis and adjuvant treatment.
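A sketch of the first CSR-DPC step, under stated assumptions: an averaging filter produces the base layer, and subtracting it from the registered input gives the detail layer (the kernel size k is an illustrative choice, not taken from the paper):

```python
import numpy as np

def base_detail_split(img, k=5):
    """Split an image into a base layer (k x k averaging filter with
    edge padding) and a detail layer (residual). k is an assumption."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    h, w = img.shape
    base = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            base[i, j] = padded[i:i + k, j:j + k].mean()
    return base, img - base
```

By construction the two layers sum back to the input, so fusing them separately and adding the results cannot lose image content, only redistribute it.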
Face recognition is widely used and is one of the most challenging tasks in computer vision. In recent years, many face recognition methods based on dictionary learning have been proposed. However, most methods focus only on the resolution of the original image, and changes of resolution may affect the recognition results in practical problems. Aiming at these problems, a method of multi-resolution dictionary learning combined with sample reverse representation is proposed and applied to face recognition. First, the dictionaries associated with multiple resolution images are learnt to obtain the first representation error. Then, different auxiliary samples are generated for each test sample, and a dictionary consisting of the test sample, the auxiliary samples and training samples of the other classes is established to sequentially represent all training samples at this resolution and obtain the second representation error. Finally, a weighted fusion scheme is used to obtain the ultimate classification result. Experimental results on four widely used face datasets show that the proposed method achieves better performance and is effective under resolution change.
Anomaly detection in industrial images with textured backgrounds is a common research topic in anomaly identification. The interference of the texture itself and the minuteness of texture anomalies are the main reasons why many existing models fail to detect anomalies. To address these problems, we propose an anomaly detection strategy that combines dictionary learning and normalizing flows. Our method enhances an existing two-stage anomaly detection approach: to improve the baseline, it adds a normalizing flow to the representation-learning stage and combines deep learning with dictionary learning. After experimental validation, the improved algorithm exceeds 95% detection accuracy on all MVTec AD texture categories and shows strong robustness. On the Carpet data, it raises the baseline method's detection accuracy from 67.9% to 99.7%.
Dictionary learning (DL) algorithms have shown very good performance in face recognition. However, conventional DL algorithms exploit only the training samples to obtain the dictionary and totally neglect the test sample in the learning procedure. As a result, if DL is associated with the linear representation of the test sample, it may classify test samples better than conventional DL algorithms. In this paper, we propose a test sample oriented dictionary learning (TSODL) algorithm for face recognition. We combine the linear representation (including the l0-norm, l1-norm and l2-norm) of a test sample with the basic model of DL to learn a single dictionary for each test sample. Thus, the dictionary and the representation coefficients of the test sample are obtained simultaneously by minimizing a single objective function. To make the learning procedure more efficient, we initialize the dictionary for a new test sample by selecting from the dictionaries of previous test samples. The experimental results show that the TSODL algorithm classifies test samples more accurately than some state-of-the-art DL and sparse coding algorithms using a linear classifier on three public face databases.
Due to its capacity for compressing and recovering signals with low energy consumption, compressive sensing (CS) has drawn considerable attention in the wireless telemonitoring of electrocardiogram (ECG) signals. However, most existing CS methods are designed for reconstructing single-channel signals and hence have difficulty reconstructing multichannel ECG signals. In this paper, a spatio-temporal sparse model-based algorithm is proposed for the reconstruction of multichannel ECG signals that exploits not only the temporal correlation within each individual channel but also the spatial correlation among signals from different channels. In addition, a dictionary learning (DL) approach is developed to enhance the performance of the proposed reconstruction algorithm by using the sparsity of ECG signals in a transformed domain. The approach determines a dictionary by learning local dictionaries for each channel and merging them to form a global dictionary. Extensive simulations were performed to validate the proposed algorithms. Simulation results show that the proposed reconstruction algorithm recovers multichannel ECG signals better than the benchmark methods, and that its performance can be further improved by using the dictionary matrix obtained from the proposed DL algorithm.
Modeling real dynamical systems is an important challenge in many areas of science, and extracting the governing equations of a system from its time-series is one possible solution. In this paper, we use sparse recovery and dictionary learning to extract governing equations with parametric basis functions, under the assumption that the functions in the dynamical equations are sparse in the chosen basis. The proposed algorithm is applied to different types of discrete and continuous nonlinear dynamical systems to show its generalization ability. Transition from one dynamical regime to another is, moreover, an important concept in studying real-world complex systems such as biological and climate systems. The Lyapunov exponent is an early-warning index that can predict bifurcation points in dynamical systems, but its computation is a major challenge in real systems because accurate estimation requires long time-series. In this paper, we use the recovered governing equations to generate the long time-series needed for Lyapunov exponent calculation, so the proposed method can help predict bifurcation points through accurate calculation of Lyapunov exponents.
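A common sparse-recovery sketch of the idea above (sequentially thresholded least squares, as popularized by SINDy-style methods; not necessarily the paper's exact algorithm): build a library of basis functions evaluated on the time-series, then solve for a sparse coefficient vector. Here the governing equation of the logistic map is recovered from data.

```python
import numpy as np

def stlsq(Theta, dx, threshold=0.05, iters=10):
    """Sequentially thresholded least squares: sparse xi with Theta @ xi ≈ dx."""
    xi = np.linalg.lstsq(Theta, dx, rcond=None)[0]
    for _ in range(iters):
        small = np.abs(xi) < threshold
        xi[small] = 0.0                      # prune near-zero terms
        big = ~small
        if big.any():                        # refit the surviving terms
            xi[big] = np.linalg.lstsq(Theta[:, big], dx, rcond=None)[0]
    return xi

# data from the logistic map x_{n+1} = r x_n (1 - x_n) = r x_n - r x_n^2
r, x = 3.6, [0.2]
for _ in range(200):
    x.append(r * x[-1] * (1 - x[-1]))
x = np.array(x)
Theta = np.column_stack([np.ones(200), x[:-1], x[:-1] ** 2])  # basis [1, x, x^2]
xi = stlsq(Theta, x[1:])   # expect approximately [0, r, -r]
```

Because the data are generated exactly by a model that is sparse in the library, the least-squares fit recovers the true coefficients and the thresholding removes the spurious constant term.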
Software defect prediction technology has been widely used to improve the quality of software systems. Most real software defect datasets have far fewer defective modules than defect-free modules, and such highly class-imbalanced data typically makes accurate prediction difficult: the imbalance inclines a prediction model to classify a defective module as defect-free. As similarity exists among different software modules, one module can be represented by its sparse representation coefficients over a pre-defined dictionary built from historical software defect datasets. In this study, we use dictionary learning to predict software defects. We optimize the classifier parameters and the dictionary atoms iteratively, ensuring that the extracted features (sparse representations) are optimal for the trained classifier. We prove the optimality condition of the elastic net, which is used to solve for the sparse coding coefficients, and the regularity of the elastic net solution. Because misclassifying a defective module generally incurs a much higher cost than misclassifying a defect-free one, we take the different misclassification costs into account, increasing the penalty on misclassified defective modules in the dictionary learning procedure and thus biasing the classifier toward labeling a module as defective. We therefore propose a cost-sensitive software defect prediction method using dictionary learning (CSDL). Experimental results on 10 class-imbalanced NASA datasets show that our method is more effective than several typical state-of-the-art defect prediction methods.
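An illustrative sketch of the cost-sensitive idea: with a higher cost for missing a defective module, the cost-minimizing decision threshold on P(defective) drops below 0.5, biasing predictions toward the defective class. The cost values here are hypothetical, and the paper embeds the costs in the dictionary learning objective rather than in a fixed threshold.

```python
import numpy as np

def cost_sensitive_predict(p_defective, cost_fn=5.0, cost_fp=1.0):
    """Predict 1 (defective) when the expected cost of predicting
    defect-free exceeds that of predicting defective.
    p_defective: array of estimated probabilities P(defective)."""
    threshold = cost_fp / (cost_fn + cost_fp)   # < 0.5 when cost_fn > cost_fp
    return (p_defective >= threshold).astype(int)
```

With cost_fn = 5 and cost_fp = 1 the threshold is 1/6, so even a module with a modest 20% defect probability is flagged, which matches the paper's intent of penalizing missed defects more heavily.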
In this paper, we propose sparse coding-based approaches for the segmentation of tumor regions from magnetic resonance (MR) images. Sparse coding with data-adapted dictionaries has been successfully employed in several image recovery and vision problems. The proposed approaches obtain sparse codes for each pixel in brain MR images using the pixels' intensity values and location information. Since it is trivial to obtain pixel-wise sparse codes, and combining multiple features in the sparse coding setup is not straightforward, we propose to perform sparse coding in a high-dimensional feature space where non-linear similarities can be effectively modeled. We use training data from expert-segmented images to obtain kernel dictionaries with the kernel K-lines clustering procedure. For a test image, sparse codes are computed with these kernel dictionaries and used to identify the tumor regions. This approach is completely automated and does not require user intervention to initialize the tumor regions in a test image. Furthermore, a low-complexity segmentation approach based on kernel sparse codes, which allows the user to initialize the tumor region, is also presented. Results obtained with both approaches are validated against manual segmentation by an expert radiologist, and it is shown that the proposed methods lead to accurate tumor identification.
In spite of technological advancements, the inherent processing capability of current-age sensors limits the detail available in acquired images for a variety of remote sensing applications. Pan-sharpening is a prominent scheme for integrating the essential spatial details inferred from a panchromatic (PAN) image with the desired spectral information of a multispectral (MS) image. This paper presents an effective two-stage pan-sharpening method to produce a high-resolution multispectral (HRMS) image. The proposed method is based on the premise that the HRMS image can be formulated as an amalgam of spectral and spatial components. The spectral components are estimated by processing the interpolated MS image with a filter approximating the modulation transfer function (MTF) of the sensor. Sparse representation theory is adapted to construct the spatial components: the high-frequency details extracted from the PAN image and its low-resolution variant are utilized to construct dual dictionaries, which are jointly learned by an efficient training algorithm to enhance adaptability. The hypothesis of sparse-coefficient invariance across scales is also incorporated to recover the appropriate spatial information. Further, an iterative filtering mechanism is developed to enhance the quality of the fused image. Four distinct datasets generated from the QuickBird, IKONOS, Pléiades and WorldView-2 sensors are used for experimentation. Comprehensive assessment at reduced scale and full scale confirms the effectiveness of the proposed method in retaining spectral information and intensifying spatial details.
This study proposes an image denoising algorithm based on sparse representation and Principal Component Analysis (PCA). The proposed algorithm includes the following steps. First, the noisy image is divided into overlapped 8×8 blocks. Second, the discrete cosine transform is applied as a dictionary for the sparse representation of the vectors created from the overlapped blocks, with the orthogonal matching pursuit algorithm used to calculate the sparse vectors. Then, the dictionary is updated by means of the PCA algorithm to achieve the sparsest representation of the vectors. Since the signal energy, unlike the noise energy, is concentrated in a few components after transforming into the PCA domain, signal and noise can be well distinguished. The proposed algorithm was implemented in a MATLAB environment and its performance was evaluated on standard grayscale images under different standard deviations of white Gaussian noise by means of the peak signal-to-noise ratio, the structural similarity index and visual inspection. The experimental results demonstrate that the proposed denoising algorithm achieves significant improvement over the dual-tree complex discrete wavelet transform and K-singular value decomposition image denoising methods, and obtains results competitive with the block-matching and 3D filtering method, the current state of the art for image denoising.
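A sketch of two steps from the pipeline above: collecting overlapped 8×8 blocks as column vectors, then updating the dictionary with the principal components of the (mean-removed) block vectors. The step size controlling the overlap is an illustrative assumption.

```python
import numpy as np

def extract_blocks(img, b=8, step=4):
    """Collect overlapped b x b blocks as columns; step sets the overlap."""
    h, w = img.shape
    cols = [img[i:i + b, j:j + b].ravel()
            for i in range(0, h - b + 1, step)
            for j in range(0, w - b + 1, step)]
    return np.array(cols).T          # shape (b*b, n_blocks)

def pca_dictionary(X):
    """PCA dictionary update: orthonormal atoms (principal directions)
    of the centered block vectors, sorted by explained variance."""
    Xc = X - X.mean(axis=1, keepdims=True)
    U, _, _ = np.linalg.svd(Xc, full_matrices=False)
    return U
```

Because the PCA atoms are orthonormal and ordered by variance, most of the signal energy lands in the leading coefficients while noise spreads evenly, which is the separation the denoising step relies on.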
In recent years, considerable effort has been devoted to research on the sparse representation of signals over overcomplete dictionaries. The dictionaries can be either pre-specified transforms or learned from a set of training signals. In this paper, the dictionary learning problem is extended into a quadratic programming framework, and a projected gradient method with line search is presented for solving this large-scale box-constrained quadratic program. The non-negative dictionary learned with this method is applied to image denoising. Experimental results demonstrate that this learning-based method performs better than the wavelet-based, variation-based and K-SVD methods.
For face recognition, conventional dictionary learning (DL) methods have some disadvantages. First, face images of the same person vary with facial expression, pose, illumination and disguise, so it is hard to obtain a dictionary robust for face recognition. Second, they do not cover important components (e.g., particularity and disturbance) completely, which limits their performance. In this paper, we propose a novel robust and discriminative DL (RDDL) model. The proposed model uses the sample diversity of the same face image to learn a robust dictionary, which includes class-specific dictionary atoms and disturbance dictionary atoms; these atoms can well represent data from different classes. Discriminative regularizations on the dictionary and the representation coefficients exploit discriminative information, which effectively improves the classification capability of the dictionary. The proposed RDDL is extensively evaluated on benchmark face image databases and shows superior performance to many state-of-the-art dictionary learning methods for face recognition.
The effectiveness of machine learning approaches depends on the quality of the data representation; some representations may mislead learning approaches by concealing relevant explanatory variables. Although feature engineering, which utilizes domain knowledge and/or expert supervision, yields typical data representation techniques, generic unsupervised feature learning represents an even more objective alternative for determining relevant attributes and generating optimal feature spaces. In this paper, we propose a new fuzzy unsupervised feature learning approach (FUL) that automatically derives new features by revealing the intrinsic structure of the data. FUL exploits the clusters and the associated fuzzy memberships generated by the fuzzy C-means algorithm to devise new basis functions and their corresponding representation. The experimental results show that FUL outperforms relevant state-of-the-art approaches, yielding the highest F1-measure with improvements of 8%, 11%, 3% and 4% on the Parkinson, Epilepsy, Gait and breast cancer datasets, respectively.
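A sketch of the membership computation FUL builds on: the standard fuzzy C-means membership of each sample in each cluster (with the cluster centers assumed already fitted), which can then serve as the sample's new feature vector. The use of the membership matrix as the learned representation is an assumption drawn from the abstract.

```python
import numpy as np

def fcm_memberships(X, centers, m=2.0):
    """X: (n, d) samples; centers: (c, d) cluster centers; m: fuzzifier.
    Returns the (n, c) fuzzy C-means membership matrix (rows sum to 1)."""
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)  # (n, c)
    d = np.maximum(d, 1e-12)                       # guard exact center hits
    # u_ik = 1 / sum_j (d_ik / d_ij)^(2/(m-1))
    ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))
    return 1.0 / ratio.sum(axis=2)
```

Each row is a soft assignment over clusters, so the new representation encodes where a sample sits relative to the data's intrinsic structure rather than its raw coordinates.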