In this paper, a novel method termed Multi-Instance Dictionary Learning (MIDL) is presented for detecting abnormal events in crowded video scenes. Following the multi-instance learning setting, each event (video clip) is modeled as a bag containing several sub-events (local observations), and each sub-event is regarded as an instance. MIDL jointly learns a dictionary for sparse representations of sub-events (instances) and multi-instance classifiers for classifying events as normal or abnormal. We further adopt three different multi-instance models, yielding Max-Pooling-based MIDL (MP-MIDL), Instance-based MIDL (Inst-MIDL) and Bag-based MIDL (Bag-MIDL), for detecting both global and local abnormalities. MP-MIDL classifies observed events using bag features extracted via max-pooling over sparse representations, while Inst-MIDL and Bag-MIDL classify events by the predicted values of the corresponding instances. The proposed MIDL is evaluated against state-of-the-art methods for abnormal event detection on the UMN (global abnormalities) and UCSD (local abnormalities) datasets, and the results show that MP-MIDL and Bag-MIDL achieve comparable or improved detection performance. MIDL is also compared with other multi-instance learning methods on this task, with the MP-MIDL scheme obtaining superior results.
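As a minimal sketch of the max-pooling step described above, a bag feature can be formed by taking the elementwise maximum of the (absolute) sparse coefficients of a bag's instances; the function name and toy coefficients below are illustrative, not from the paper.

```python
import numpy as np

def bag_feature_max_pool(instance_codes):
    """Max-pool the sparse codes of a bag's instances into one bag feature.

    instance_codes: (n_instances, n_atoms) array of sparse coefficients.
    Returns an (n_atoms,) vector: elementwise max of absolute coefficients.
    """
    return np.abs(instance_codes).max(axis=0)

# Toy bag: 3 sub-events (instances), dictionary with 5 atoms.
codes = np.array([[0.0, 0.9, 0.0, -0.2, 0.0],
                  [0.4, 0.0, 0.0,  0.0, 0.0],
                  [0.0, 0.1, 0.7,  0.0, 0.0]])
feature = bag_feature_max_pool(codes)
# feature == [0.4, 0.9, 0.7, 0.2, 0.0]
```

The pooled vector then serves as the input to a bag-level classifier.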
Sparse approximation is a hypothesized coding strategy in which a population of sensory neurons (e.g., in V1) encodes a stimulus using as few active neurons as possible. We present the Spiking LCA (locally competitive algorithm), a rate-encoded Spiking Neural Network (SNN) of integrate-and-fire neurons that calculates sparse approximations. The Spiking LCA is designed to be equivalent to the nonspiking LCA, an analog dynamical system that converges exponentially to ℓ1-norm sparse approximations. We show that the firing rate of the Spiking LCA converges to the same solution as the analog LCA, with an error inversely proportional to the sampling time. We simulate in NEURON a network of 128 neuron pairs that encodes 8 × 8 pixel image patches, demonstrating that the network converges to nearly optimal encodings within 20 ms of biological time. We also show that when more biophysically realistic neuron parameters are used, the gain function encourages additional ℓ0-norm sparsity in the encoding, relative both to ideal neurons and to digital solvers.
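The nonspiking (analog) LCA that the Spiking LCA emulates can be sketched as a discretized dynamical system: internal states are driven by the stimulus, inhibited by competing units, and soft-thresholded into sparse coefficients. The parameter values and random dictionary below are illustrative assumptions, not the paper's NEURON setup.

```python
import numpy as np

def lca(Phi, s, lam=0.05, dt=0.01, tau=0.1, n_steps=2000):
    """Analog LCA dynamics for an l1-sparse code of stimulus s.

    Discretized ODE: tau * du/dt = b - u - (G - I) a,
    with drive b = Phi^T s, Gram matrix G = Phi^T Phi,
    and output a = soft_threshold(u, lam).
    """
    b = Phi.T @ s
    G = Phi.T @ Phi
    u = np.zeros(Phi.shape[1])
    for _ in range(n_steps):
        a = np.sign(u) * np.maximum(np.abs(u) - lam, 0.0)  # thresholded output
        u += (dt / tau) * (b - u - (G - np.eye(len(u))) @ a)
    return np.sign(u) * np.maximum(np.abs(u) - lam, 0.0)

rng = np.random.default_rng(0)
Phi = rng.standard_normal((64, 128))
Phi /= np.linalg.norm(Phi, axis=0)            # unit-norm dictionary atoms
a_true = np.zeros(128)
a_true[[3, 40]] = [1.0, -0.8]
s = Phi @ a_true                              # synthetic 2-sparse stimulus
a = lca(Phi, s)
```

After convergence, only a few units remain active and the stimulus is reconstructed up to the ℓ1 shrinkage bias.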
Machine learning (ML) systems are affected by a pervasive lack of transparency. The eXplainable Artificial Intelligence (XAI) research area addresses this problem and the related issue of explaining the behavior of ML systems in terms that are understandable to human beings. In many XAI approaches, the outputs of ML systems are explained in terms of low-level features of their inputs. However, these approaches leave a substantive explanatory burden with human users, insofar as the latter are required to map low-level properties onto more salient and readily understandable parts of the input. To alleviate this cognitive burden, an alternative model-agnostic framework is proposed here. This framework is instantiated to address explanation problems in the context of ML image classification systems, without relying on pixel relevance maps or other low-level features of the input. More specifically, sets of middle-level properties of classification inputs that are perceptually salient are obtained by applying sparse dictionary learning techniques, and these middle-level properties are used as building blocks for explanations of image classifications. The resulting explanations are parsimonious, owing to their reliance on a limited set of middle-level image properties, and they can be contrastive, because the set of middle-level image properties can be used to explain why the system advanced the proposed classification over other antagonist classifications. In view of its model-agnostic character, the proposed framework is adaptable to a variety of other ML systems and explanation problems.
The automation of medical image diagnosis is currently a challenging task. Computer Aided Diagnosis (CAD) systems can be a powerful tool for clinicians, especially when hospitals are overflowing. These tools are usually based on artificial intelligence (AI), a field recently revolutionized by deep learning approaches, which usually achieve high performance with complex solutions, leading to a high computational cost and the need for large databases. In this work, we propose a classification framework based on sparse coding. Images are first partitioned into tiles, and a dictionary is built by applying PCA to these tiles. The original signals are then expressed as linear combinations of the dictionary elements and reconstructed by iteratively deactivating the elements associated with each component. Classification is finally performed using the resulting reconstruction errors as features. Performance is evaluated in a real context involving four classes: controls, bacterial pneumonia, viral pneumonia and COVID-19. Our system differentiates between pneumonia patients and controls with an accuracy of 97.74%, whereas in the 4-class setting the accuracy is 86.73%. The excellent results and the pioneering use of sparse coding in this scenario show that our proposal can assist clinicians when their workload is high.
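A simplified sketch of the described pipeline, with hypothetical function names: PCA over flattened tiles yields a dictionary, and the feature vector for a tile records the reconstruction error obtained when each dictionary component is deactivated in turn. This is a one-shot approximation of the paper's iterative deactivation scheme, under illustrative data.

```python
import numpy as np

def pca_dictionary(tiles, n_atoms):
    """Build a dictionary from the top principal components of image tiles."""
    X = tiles - tiles.mean(axis=0)
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[:n_atoms]                       # (n_atoms, tile_dim), orthonormal

def reconstruction_error_features(tile, D):
    """Error incurred when each dictionary atom is deactivated in turn."""
    coeffs = D @ tile                         # projections onto the atoms
    errors = []
    for k in range(len(D)):
        active = coeffs.copy()
        active[k] = 0.0                       # deactivate one component
        recon = D.T @ active
        errors.append(np.linalg.norm(tile - recon))
    return np.array(errors)

rng = np.random.default_rng(1)
tiles = rng.standard_normal((200, 16))        # 200 flattened 4x4 tiles
D = pca_dictionary(tiles, n_atoms=8)
feats = reconstruction_error_features(tiles[0], D)
```

The per-component error vectors would then feed a downstream classifier.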
Sparse representation based classification has led to interesting image recognition results, and the dictionary used for sparse coding plays a key role in it. This paper presents a novel supervised structure dictionary learning (SSDL) algorithm to learn a discriminative and block-structured dictionary. We associate label information with each dictionary item and make each class-specific sub-dictionary in the whole structured dictionary represent the training samples of its associated class well. More specifically, we learn a structured dictionary and a multiclass classifier simultaneously. Adding an inhomogeneous representation term to the objective function and enforcing the independence of the class-specific sub-dictionaries improve the discrimination capabilities of the sparse coordinates. An iterative optimization method is proposed to solve the new formulation. Experimental results on four face databases demonstrate that our algorithm outperforms recently proposed competing sparse coding methods.
In computer vision, Local Binary Pattern (LBP) and Scale Invariant Feature Transform (SIFT) are two widely used local descriptors. In this paper, we propose to combine them effectively for scene categorization. First, LBP and SIFT features are regularly extracted from training images to construct an LBP feature codebook and a SIFT feature codebook. Then, a two-dimensional table is created by combining the obtained codebooks. To create a representation for an image, LBP and SIFT features extracted from the same positions of the image are encoded together via sparse coding using the two-dimensional table. After processing all features in the input image, spatial max pooling determines its representation. The obtained image representations are forwarded to a Support Vector Machine classifier for categorization. In addition, to further improve scene categorization performance, we propose a method to select correlated visual words from large codebooks for constructing the two-dimensional table. Finally, the proposed method is evaluated through extensive experiments on the Scene Categories 8, Scene Categories 15 and MIT 67 Indoor Scene datasets, demonstrating that it is effective for scene categorization.
Nearest neighbor search is an effective method for large-scale image search. This paper introduces how to learn a strong image feature representation and a series of compact binary hash coding functions under a deep learning framework. Our idea is that, given available data labels, binary codes can be obtained using a hidden layer to represent latent concepts dominating the class labels. Our method is effective in obtaining hash codes and image representations, so it is suitable for large-scale datasets. The performance of the proposed algorithms was verified on three databases: MNIST, CIFAR-10 and Caltech-101. The experimental results reveal that the two proposed image hash retrieval algorithms based on pixel-level automatic feature learning achieve higher search accuracy than the other algorithms; moreover, these two algorithms proved more favorable in scalability and generality.
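The core idea can be sketched as follows, under illustrative weights: sigmoid activations of a latent hidden layer are thresholded at 0.5 to yield binary hash codes, and retrieval ranks database items by Hamming distance. A trained network would supply W and b; here they are random placeholders.

```python
import numpy as np

def hash_codes(features, W, b):
    """Binarize a latent hidden layer into compact binary hash codes.

    features: (n, d) image representations; W, b: hidden-layer weights.
    Sigmoid activations are thresholded at 0.5 to give 0/1 bits.
    """
    h = 1.0 / (1.0 + np.exp(-(features @ W + b)))   # latent-layer activations
    return (h > 0.5).astype(np.uint8)

def hamming_rank(query_code, db_codes):
    """Rank database items by Hamming distance to the query code."""
    dists = np.count_nonzero(db_codes != query_code, axis=1)
    return np.argsort(dists)

rng = np.random.default_rng(2)
W = rng.standard_normal((64, 48))                   # 48-bit codes (placeholder)
b = rng.standard_normal(48)
db = rng.standard_normal((100, 64))                 # 100 image features
codes = hash_codes(db, W, b)
order = hamming_rank(codes[0], codes)
```

Because comparisons reduce to XOR-and-popcount on short bit strings, the scheme scales to large databases.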
Modeling real dynamical systems is an important challenge in many areas of science, and extracting the governing equations of a system from its time series is one possible solution. In this paper, we use sparse recovery and dictionary learning to extract governing equations with parametric basis functions, exploiting the assumption that the governing equations are sparse in the space of basis functions. The proposed algorithm is applied to different types of discrete and continuous nonlinear dynamical systems to show its generalization ability. Furthermore, the transition from one dynamical regime to another is an important concept in studying real-world complex systems such as biological and climate systems. The Lyapunov exponent is an early-warning index that can predict bifurcation points in dynamical systems, but its computation is a major challenge in real systems because accurate estimation requires long time series. In this paper, we use the recovered governing equations to generate the long time series needed for Lyapunov exponent calculation. The proposed method can therefore help predict bifurcation points through accurate calculation of Lyapunov exponents.
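The sparsity assumption can be illustrated with a sequentially thresholded least-squares sketch (in the spirit of sparse-regression approaches to governing-equation discovery; not necessarily the authors' exact solver): coefficients of candidate basis functions below a threshold are pruned and the survivors are refit.

```python
import numpy as np

def sparse_dynamics(Theta, dX, lam=0.1, n_iter=10):
    """Sequentially thresholded least squares: keep only the few basis
    functions whose coefficients stay above lam (sparsity assumption)."""
    Xi = np.linalg.lstsq(Theta, dX, rcond=None)[0]
    for _ in range(n_iter):
        Xi[np.abs(Xi) < lam] = 0.0                 # prune small coefficients
        for j in range(dX.shape[1]):
            big = np.abs(Xi[:, j]) >= lam
            if big.any():                          # refit surviving terms
                Xi[big, j] = np.linalg.lstsq(Theta[:, big], dX[:, j],
                                             rcond=None)[0]
    return Xi

# Toy system: dx/dt = -2x, dy/dt = 3xy, with basis library [1, x, y, xy].
rng = np.random.default_rng(3)
x, y = rng.standard_normal(500), rng.standard_normal(500)
Theta = np.column_stack([np.ones(500), x, y, x * y])
dX = np.column_stack([-2.0 * x, 3.0 * x * y])      # noiseless derivatives
Xi = sparse_dynamics(Theta, dX)
```

On this noiseless toy system, the recovered coefficient matrix singles out exactly the active terms of each equation.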
Software defect prediction technology has been widely used to improve the quality of software systems. Most real software defect datasets contain far fewer defective modules than defect-free modules, and such highly class-imbalanced data typically make accurate prediction difficult: the imbalance makes it easy for a prediction model to classify a defective module as defect-free. Since different software modules exhibit similarity, one module can be represented by sparse representation coefficients over a pre-defined dictionary built from historical software defect datasets. In this study, we use dictionary learning to predict software defects. We optimize the classifier parameters and the dictionary atoms iteratively, ensuring that the extracted features (sparse representations) are optimal for the trained classifier. We prove the optimality condition of the elastic net, which is used to solve for the sparse coding coefficients, and the regularity of the elastic net solution. Because misclassifying a defective module generally incurs a much higher cost than misclassifying a defect-free one, we take the different misclassification costs into account, increasing the penalty on misclassified defective modules during dictionary learning so that the classifier is inclined to label a module as defective. Thus, we propose a cost-sensitive software defect prediction method using dictionary learning (CSDL). Experimental results on 10 class-imbalanced NASA datasets show that our method is more effective than several typical state-of-the-art defect prediction methods.
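The elastic-net sparse coding step can be sketched with a plain coordinate-descent solver over a dictionary of historical modules; the cost-sensitive classifier term is omitted here, and all names, data and parameters are illustrative.

```python
import numpy as np

def elastic_net_code(D, x, lam1=0.1, lam2=0.1, n_iter=200):
    """Coordinate-descent elastic net: code module x over dictionary D.

    Minimizes 0.5*||x - D a||^2 + lam1*||a||_1 + 0.5*lam2*||a||^2.
    """
    n_atoms = D.shape[1]
    a = np.zeros(n_atoms)
    col_sq = (D ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(n_atoms):
            r = x - D @ a + D[:, j] * a[j]         # residual without atom j
            rho = D[:, j] @ r
            # soft-threshold by lam1, shrink by the ridge term lam2
            a[j] = np.sign(rho) * max(abs(rho) - lam1, 0.0) / (col_sq[j] + lam2)
    return a

rng = np.random.default_rng(4)
D = rng.standard_normal((30, 50))                  # 50 historical modules
a_true = np.zeros(50)
a_true[[2, 7]] = [1.5, -1.0]
x = D @ a_true                                     # new module as combination
a = elastic_net_code(D, x)
```

The resulting sparse coefficients serve as the module's features for the downstream classifier.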
Clustering techniques offer a systematic approach to organizing the diverse and fast-increasing Web services by assigning relevant services to homogeneous service communities. However, the ever-increasing number of Web services poses key challenges for building large-scale service communities. In this paper, we tackle the scalability issue in service clustering, aiming to accurately and efficiently discover service communities over very large-scale services. A key observation is that service descriptions are usually represented by long but very sparse term vectors, as each service is described by only a limited number of terms. This inspires us to seek a new service representation that is economical to store, efficient to process, and intuitive to interpret, enabling service clustering to scale to a massive number of services. More specifically, a set of anchor services is identified so that each service can be represented as a linear combination of a small number of anchor services. In this way, the large number of services is encoded in a much more compact anchor service space. Although service clustering can then be performed much more efficiently, discovering anchor services from large-scale service descriptions may incur high computational cost. We develop principled optimization strategies for efficient anchor service discovery. Extensive experiments are conducted on real-world service data to assess both the effectiveness and the efficiency of the proposed approach. Results on a dataset with over 3,700 Web services clearly demonstrate the good scalability of the sparse functional representation and the efficiency of the optimization algorithms for anchor service discovery.
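The anchor-based encoding can be sketched as follows, under the assumption (ours, for illustration) that each service's term vector is approximated by a least-squares combination of its k most similar anchor services, yielding a compact sparse code.

```python
import numpy as np

def encode_with_anchors(S, anchors, k=3):
    """Represent each service vector as a combination of its k most
    similar anchor services (a compact stand-in for the full term space)."""
    codes = np.zeros((len(S), len(anchors)))
    for i, s in enumerate(S):
        sims = anchors @ s                          # similarity to each anchor
        idx = np.argsort(sims)[-k:]                 # k most similar anchors
        coef, *_ = np.linalg.lstsq(anchors[idx].T, s, rcond=None)
        codes[i, idx] = coef
    return codes

rng = np.random.default_rng(5)
S = rng.random((40, 300))                           # 40 term vectors, 300 terms
anchors = S[rng.choice(40, size=8, replace=False)]  # hypothetical anchor set
codes = encode_with_anchors(S, anchors)
```

Clustering then operates on the 8-dimensional codes rather than the 300-dimensional term vectors.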
Feature extraction is very important in pattern recognition. Sparse coding is an approach for extracting the independent features of an image; image features extracted by sparse coding have led to better recognition performance than those from traditional PCA-based methods. A new discriminatory sparse coding (DSC) algorithm is proposed in this paper to further improve classification performance. Based on reinforcement learning, DSC encodes the training samples by individual class rather than by individual image as in standard sparse coding. This produces a set of features with large interclass and small intraclass variations, which is well suited to recognition tasks. Experiments are performed on face image feature extraction and recognition. Compared with traditional PCA- and ICA-based methods, DSC shows much better recognition performance.
Olshausen and Field (1996) developed a simple cell receptive field model for natural scene processing in V1, based on unsupervised learning and non-orthogonal basis function optimization of an overcomplete representation of visual space. The model was originally tested with an ensemble of whitened natural scenes, simulating pre-cortical filtering by retinal ganglion cells and the lateral geniculate nucleus, and the basis functions qualitatively resembled the orientation-specific responses of V1 simple cells in the spatial domain. In this study, the quantitative tuning responses of the basis functions in the spectral domain are estimated using a Gaussian model, to determine their goodness-of-fit to the known bandwidths of simple cells in primate V1. Five simulation experiments which examined key features of the model are reported: changing the size of the basis functions; using a complete versus over-complete representation; changing the sparseness factor; using a variable learning rate; and mapping the basis functions with a whitening spatial function. The key finding of this study is that across all image themes, basis function sizes, number of basis functions, sparseness factors and learning rates, the spatial-frequency tuning did not closely resemble that of primate area 17; with a single exception, the model results more closely resembled the unclassified cat neurones of area 19, and not area 17 as predicted.
A new approach for feature extraction using neural response is developed in this paper by combining hierarchical architectures with the sparse coding technique. In the proposed layered model, each layer of the hierarchy uses two components: sparse coding and a pooling operation. Sparse coding is used to compute increasingly complex sparse feature representations, while the pooling operation, by comparing sparse outputs, measures the match between a stored prototype and the input sub-image; only the value of the best match is kept and the others are discarded. The proposed model is implemented and tested on two recognition tasks, namely image recognition and speech recognition (on an isolated-word vocabulary). Experimental results with various parameters demonstrate that the proposed scheme extracts more effective features than other methods.
Traditional satellite recognition usually applies low-level features (e.g., invariant moments) to describe the global information of satellites, so the local properties and latent high-level semantic concepts of satellites are likely to be ignored. Recently, topic models have received more attention in the object recognition field; they generally suppose that local features can be modeled as words and images regarded as documents, aiming to discover the latent topics between words and documents and then use these topics to represent images. However, topic models often use a bag-of-words (BoW) strategy in which each local feature descriptor is represented by only one word. In this paper, we propose a novel method called sparse coding (SC) based probabilistic latent semantic analysis (SC-pLSA) for satellite recognition. Compared with conventional topic models, our method performs sparse coding to explore the potential correlation between a descriptor and multiple words, so that the descriptor can be described by a small set of words. Based on this word-level representation, the probabilistic latent semantic analysis (pLSA) model, with its simplicity and low computational cost, is employed to learn the latent topics. Experimental results on the BUAA-SID 1.0 multi-view satellite dataset validate the effectiveness of the proposed method: especially under the interference of noise, it outperforms the traditional recognition methods, namely Hu invariant moments, BoW, SC, locality-constrained linear coding (LLC), conventional pLSA and latent Dirichlet allocation (LDA), with three different classifiers: the nearest neighbor (NN) classifier, linear SVM classifier, and sparse representation classifier (SRC).
Crowd feature perception is an essential step toward understanding crowd behavior. However, as individuals exhibit not only sociality but also randomness, extracting the sociality of an individual directly remains a great challenge. In this paper, we propose a crowd feature perception algorithm based on a sparse linear model (SLM), which builds a statistical characterization of sociality by assuming a prior distribution for the SLM. First, we calculate the optical flow to extract the motion information of the crowd. Second, we feed the video motion features into sparse coding to generate the SLM; the super-Gaussian prior distributions in SLMs provide the statistical characterization of sociality. In addition, we use the infinite Hidden Markov Model (iHMM) to determine whether a detected event is abnormal. We validate our method on the UMN dataset and a simulated dataset for abnormality detection, and the experiments show that the algorithm produces promising results compared with other state-of-the-art methods.
Neuronal ensemble activity encodes working memory. In this work, we developed a neuronal ensemble sparse coding method that can effectively reduce the dimension of neuronal activity and express neural coding. Multichannel spike trains were recorded in rat prefrontal cortex during a working memory task in a Y-maze. The discrete spike trains were converted into continuous signals by estimating entropy, and the normalized continuous signals were decomposed via a non-negative sparse coding method. The non-negative components were extracted to reconstruct a low-dimensional ensemble without missing any feature components. The results showed that, for well-trained rats, neuronal ensemble activity in the prefrontal cortex changed dynamically during the working memory task, and the ensemble structure was more explicit under non-negative sparse coding. Our results indicate that the neuronal ensemble sparse coding method can effectively reduce the dimension of neuronal activity and is a useful tool for expressing neural coding.
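A minimal sketch of non-negative sparse decomposition, using standard NMF-style multiplicative updates with an ℓ1 penalty on the components; this stands in for the paper's method, whose exact formulation is not given in the abstract, and the data below are synthetic.

```python
import numpy as np

def nn_sparse_decompose(V, n_components, lam=0.1, n_iter=500, seed=0):
    """Non-negative sparse decomposition V ~= W H with an l1 penalty on H,
    via multiplicative updates (NMF-style). V must be non-negative."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, n_components))
    H = rng.random((n_components, m))
    for _ in range(n_iter):
        # lam in the denominator enforces sparsity of the components H
        H *= (W.T @ V) / (W.T @ W @ H + lam + 1e-9)
        W *= (V @ H.T) / (W @ H @ H.T + 1e-9)
    return W, H

rng = np.random.default_rng(6)
V = rng.random((32, 100))             # 32 channels x 100 time bins (non-neg)
W, H = nn_sparse_decompose(V, n_components=5)
```

The rows of H give a 5-dimensional, non-negative representation of the ensemble activity over time.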
Image restoration (IR) attempts to recreate the original (ideal) scene from a degraded observation; its goal is to revert the deterioration that occurs during image acquisition and processing. Blurring and noise are two common types of acquisition degradation, and when the blurring function is unknown the problem is referred to as blind image restoration. Existing image restoration methods do not provide efficient results, owing to time complexity, computational complexity, large scaling factors and limited input. To overcome these problems, an Improved Graph Laplacian Regularization with Sparse Coding framework integrating the Internet of Things (IoT) is developed in this research. The denoised image is separated into a texture layer and a cartoon layer by Morphological Component Analysis (MCA). An improved Graph Laplacian regularization method and Simultaneous Sparse Coding with Gaussian Scale Mixture (SSC-GSM) are then applied to the texture and cartoon layers to produce the restored image. Levin's dataset and a real-world dataset of industrial manufacturing products are analyzed in this research. The proposed Graph Laplacian algorithm and sparse coding model are more effective at recovering sharp texture information in the restored image, achieving better performance than existing methods with a Peak Signal-to-Noise Ratio (PSNR) of 35.82 and a Structural Similarity Index Measure (SSIM) of 0.94.
Human action recognition (HAR) is a challenging problem because of the complexity of, and similarity between, different actions. In recent years, many methods have been proposed for HAR; sparse coding-based approaches have been widely used in this field, and much work has also been based on manifold learning theory. When videos are similar but belong to different classes, their sparse codes may be similar and the actions may be misclassified. In this paper, a multi-modal affine graph regularized sparse coding approach is proposed to address this problem in HAR. First, HOG3D, HOG/HOF and SURF3D descriptors are extracted from the action datasets; then sparse codes are obtained for each descriptor using the proposed method. The dictionary learning method used in this step has more discriminative power than traditional methods. These codes are then scored separately using an SVM classifier, and finally a Naïve Bayes classifier makes the final decision. Experiments on the KTH, Weizmann and UCF Sports action datasets show that the proposed method significantly outperforms several previous methods in human action classification, especially on real-world data.
Recent studies have revealed that melancholic depression, one major subtype of depression, is closely associated with the concentration of some metabolites and biological functions of certain genes and pathways. Meanwhile, recent advances in biotechnologies have allowed us to collect a large amount of genomic data, e.g., metabolites and microarray gene expression. With such a huge amount of information available, one approach that can give us new insights into the understanding of the fundamental biology underlying melancholic depression is to build disease status prediction models using classification or regression methods. However, the existence of strong empirical correlations, e.g., those exhibited by genes sharing the same biological pathway in microarray profiles, tremendously limits the performance of these methods. Furthermore, the occurrence of missing values which are ubiquitous in biomedical applications further complicates the problem. In this paper, we hypothesize that the problem of missing values might in some way benefit from the correlation between the variables and propose a method to learn a compressed set of representative features through an adapted version of sparse coding which is capable of identifying correlated variables and addressing the issue of missing values simultaneously. An efficient algorithm is also developed to solve the proposed formulation. We apply the proposed method on metabolic and microarray profiles collected from a group of subjects consisting of both patients with melancholic depression and healthy controls. Results show that the proposed method can not only produce meaningful clusters of variables but also generate a set of representative features that achieve superior classification performance over those generated by traditional clustering and data imputation techniques. 
In particular, on both datasets, we found that in comparison with the competing algorithms, the representative features learned by the proposed method give rise to significantly improved sensitivity scores, suggesting that the learned features allow highly accurate prediction of disease status in those who are diagnosed with melancholic depression. To the best of our knowledge, this is the first work that applies sparse coding to deal with high feature correlations and missing values, which are common challenges in many biomedical applications. The proposed method can be readily adapted to other biomedical applications involving incomplete and high-dimensional data.
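One way to make sparse coding tolerate missing entries, sketched here as an assumption rather than the authors' exact formulation, is to mask the reconstruction term so that only observed entries contribute to the coding objective (an ISTA-style iteration over synthetic data):

```python
import numpy as np

def masked_sparse_code(D, x, mask, lam=0.05, step=0.1, n_iter=500):
    """ISTA-style sparse coding that scores reconstruction only on
    observed entries (mask == 1), so missing values are ignored."""
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (mask * (D @ a - x))    # gradient on observed part only
        a -= step * grad
        a = np.sign(a) * np.maximum(np.abs(a) - step * lam, 0.0)  # shrink
    return a

rng = np.random.default_rng(7)
D = rng.standard_normal((40, 60))
D /= np.linalg.norm(D, axis=0)               # unit-norm dictionary atoms
a_true = np.zeros(60)
a_true[[5, 20]] = [1.0, 0.8]
x = D @ a_true
mask = (rng.random(40) > 0.2).astype(float)  # ~20% of entries missing
x = np.where(mask == 1, x, 0.0)              # missing entries left unknown
a = masked_sparse_code(D, x, mask)
```

Despite the missing entries, the sparse code is recovered from the observed coordinates alone, which is the behavior the proposed formulation exploits.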