Automated sentiment analysis is attracting increasing attention due to the growing importance of social media and e-commerce review platforms. Deep neural networks outperform traditional lexicon-based and machine learning methods by effectively exploiting contextual word embeddings to generate dense document representations. However, such representations do not fully capture topical semantics or the sentiment polarity of words. To overcome these problems, a novel sentiment analysis model is proposed that utilizes richer document representations built from word-emotion associations and topic models, which is the main computational novelty of this study. The model integrates word embeddings with lexicon-based sentiment and emotion indicators, including negations and emoticons, and its performance is further improved by a topic modeling component combined with a bag-of-words model based on a supervised term weighting scheme. The effectiveness of the proposed model is evaluated on large datasets of Amazon product reviews and hotel reviews. Experimental results show that the proposed document representation is effective for the sentiment analysis of product and hotel reviews, irrespective of their class imbalance, and that the proposed model improves on existing machine learning methods.
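As a rough illustration of the feature fusion this abstract describes, the sketch below combines bag-of-words, topic, and lexicon features before a linear classifier; the toy reviews, the tiny lexicon, and the use of plain TF-IDF in place of the paper's supervised term weighting scheme are assumptions, not the authors' implementation.

```python
# Sketch: fuse bag-of-words, topic, and lexicon-based sentiment features
# for review classification. Toy data and lexicon are placeholders.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.linear_model import LogisticRegression

reviews = ["great product, works perfectly",
           "terrible quality, broke after one day",
           "absolutely love it, highly recommend",
           "awful experience, do not buy"]
labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative

# Bag-of-words block (TF-IDF stands in for the supervised term weighting).
X_bow = TfidfVectorizer().fit_transform(reviews).toarray()

# Topic block: document-topic proportions from LDA over raw counts.
X_counts = CountVectorizer().fit_transform(reviews)
X_topics = LatentDirichletAllocation(n_components=2, random_state=0).fit_transform(X_counts)

# Lexicon block: net count of positive minus negative cue words.
pos, neg = {"great", "love", "recommend", "perfectly"}, {"terrible", "awful", "broke"}
X_lex = np.array([[sum(w in pos for w in r.split()) -
                   sum(w in neg for w in r.split())] for r in reviews])

# Concatenate the three views and train a linear classifier.
X = np.hstack([X_bow, X_topics, X_lex])
clf = LogisticRegression().fit(X, labels)
print(clf.predict(X))
```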
When encountering bugs, developers tend to search the bug repository and the commit repository for references. However, the links between bug reports and commits in version control systems are often missing, and the information that the bug and commit repositories provide is limited. When developers search for a bug, they only retrieve bug reports or commits in isolation, which are loosely connected and difficult to use as references; moreover, many search results are inaccurate. To address these problems, this paper proposes an approach that processes bug and commit information with a topic model and constructs a bug knowledge graph as a service to assist bug search. In addition, because the amount of bug-related information continuously increases, updating the data is time-consuming; the bug knowledge graph can be updated automatically with the LTM topic model (a lifelong topic model). Finally, an experiment was conducted with bug reports from Bugzilla@Mozilla and the corresponding commits from GitHub. The results show that, by constructing the bug knowledge graph as a service, our approach effectively and efficiently helps developers find relevant bugs for reference.
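The paper's LTM model and knowledge-graph service are not specified here, so the following sketch substitutes plain LDA from scikit-learn and a small networkx graph to illustrate the general idea of linking bug reports to commits by topic similarity; the identifiers, texts, and threshold are illustrative only.

```python
# Sketch: link bug reports to commits by topic similarity and store the
# links in a small knowledge graph. Plain LDA stands in for the paper's
# lifelong topic model (LTM); all data below is illustrative.
import networkx as nx
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.metrics.pairwise import cosine_similarity

bugs = {"BUG-1": "crash when opening preferences dialog",
        "BUG-2": "memory leak while rendering large pages"}
commits = {"c1a2b3": "fix null pointer in preferences dialog handler",
           "d4e5f6": "release bitmap buffers after page rendering"}

texts = list(bugs.values()) + list(commits.values())
X = CountVectorizer(stop_words="english").fit_transform(texts)
theta = LatentDirichletAllocation(n_components=2, random_state=0).fit_transform(X)

bug_theta, commit_theta = theta[:len(bugs)], theta[len(bugs):]
sim = cosine_similarity(bug_theta, commit_theta)

# Add an edge whenever a bug's topic mixture is close to a commit's.
graph = nx.Graph()
for i, bug_id in enumerate(bugs):
    for j, sha in enumerate(commits):
        if sim[i, j] > 0.8:  # illustrative threshold
            graph.add_edge(bug_id, sha, weight=float(sim[i, j]))
print(graph.edges(data=True))
```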
This research proposes a framework to improve the efficiency of classifying and matching skill descriptions on resumes with job vacancies, using labor market intelligence over a dataset of resumes harvested from social networks. To carry out the experiments, a Kaggle dataset containing more than 200,000 records from the LinkedIn social network was downloaded, filtered, and pre-processed to generate a topic model for classifying the entire dataset. Machine learning algorithms were then used to run prediction exercises and determine the most efficient match. The model achieves high efficiency when predicting the job position of candidates in information technology (IT) areas. This prediction is made possible by the reduction of categories in these areas produced by the topic model used to match resumes with job positions.
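A minimal sketch of topic-space matching between resumes and job postings, assuming scikit-learn's LDA and cosine similarity; the texts, topic count, and matching rule are illustrative and do not reproduce the framework described above.

```python
# Sketch: represent resumes and job postings in a shared topic space and
# match each resume to the most similar posting. Texts are placeholders.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.metrics.pairwise import cosine_similarity

resumes = ["python developer with machine learning and sql experience",
           "network administrator, cisco routing and firewall management"]
jobs = ["data scientist: python, machine learning, sql",
        "it infrastructure engineer: networking, cisco, security"]

X = CountVectorizer(stop_words="english").fit_transform(resumes + jobs)
theta = LatentDirichletAllocation(n_components=2, random_state=0).fit_transform(X)

resume_topics, job_topics = theta[:len(resumes)], theta[len(resumes):]
best_match = cosine_similarity(resume_topics, job_topics).argmax(axis=1)
for resume, j in zip(resumes, best_match):
    print(f"{resume[:30]}... -> {jobs[j]}")
```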
Traditional satellite recognition usually applies low-level features (e.g., invariant moments) to describe the global information of satellites. As a result, the local properties and the latent high-level semantic concepts of satellites are likely to be ignored. Recently, topic models have received more attention in the object recognition field; they generally assume that local features can be modeled as words and images regarded as documents. Based on this assumption, they aim to discover the latent topics between words and documents and then use the topics to represent the images. However, topic models often rely on the bag-of-words (BoW) strategy, in which each local feature descriptor is represented by only one word. In this paper, we propose a novel method called sparse coding based probabilistic latent semantic analysis (SC-pLSA) for satellite recognition. Compared with conventional topic models, our method performs sparse coding to explore the potential correlation between a descriptor and multiple words, so that the descriptor can be described by a small set of words. Based on this word-level representation, the probabilistic latent semantic analysis (pLSA) model, with its simplicity and low computational cost, is employed to learn the latent topics. Experimental results on the BUAA-SID 1.0 multi-view satellite dataset validate the effectiveness of the proposed method; in particular, under the interference of noise, it outperforms traditional recognition methods: Hu invariant moments, BoW, SC, locality-constrained linear coding (LLC), conventional pLSA, and latent Dirichlet allocation (LDA), with three different classifiers: the nearest neighbor (NN) classifier, the linear SVM classifier, and the sparse representation classifier (SRC).
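A compact sketch of the SC-pLSA pipeline as described: descriptors are sparse-coded against a word dictionary, pooled into image-level word histograms, and a small EM implementation of pLSA learns the latent topics. Random descriptors and dictionary atoms stand in for real satellite image features; pooling absolute codes and all hyperparameters are assumptions.

```python
# Sketch: sparse-code local descriptors against a visual-word dictionary,
# pool the (absolute) codes into per-image word histograms, then learn
# latent topics with a small pLSA EM loop. Random data replaces real
# satellite images and descriptors.
import numpy as np
from sklearn.decomposition import SparseCoder

rng = np.random.default_rng(0)
n_images, n_desc, dim, n_words, n_topics = 6, 20, 16, 32, 4

# Dictionary of visual words (rows are unit-norm atoms).
D = rng.standard_normal((n_words, dim))
D /= np.linalg.norm(D, axis=1, keepdims=True)
coder = SparseCoder(dictionary=D, transform_algorithm="lasso_lars",
                    transform_alpha=0.1)

# Image-word matrix: sum-pool absolute sparse codes of each image's descriptors.
X = np.zeros((n_images, n_words))
for i in range(n_images):
    descriptors = rng.standard_normal((n_desc, dim))
    X[i] = np.abs(coder.transform(descriptors)).sum(axis=0)

def plsa(X, n_topics, n_iter=50, seed=0):
    """Plain pLSA via EM: returns P(z|d) and P(w|z)."""
    rng = np.random.default_rng(seed)
    n_docs, n_words = X.shape
    p_w_z = rng.random((n_topics, n_words)); p_w_z /= p_w_z.sum(1, keepdims=True)
    p_z_d = rng.random((n_docs, n_topics)); p_z_d /= p_z_d.sum(1, keepdims=True)
    for _ in range(n_iter):
        # E-step: responsibilities P(z|d,w), shape (docs, topics, words).
        resp = p_z_d[:, :, None] * p_w_z[None, :, :]
        resp /= resp.sum(axis=1, keepdims=True) + 1e-12
        # M-step: re-estimate both distributions from expected counts.
        counts = X[:, None, :] * resp
        p_w_z = counts.sum(axis=0); p_w_z /= p_w_z.sum(1, keepdims=True) + 1e-12
        p_z_d = counts.sum(axis=2); p_z_d /= p_z_d.sum(1, keepdims=True) + 1e-12
    return p_z_d, p_w_z

p_z_d, _ = plsa(X, n_topics)
print(p_z_d.round(3))  # topic mixtures used as the image representation
```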
There has been an enormous number of publications on cancer research. These unstructured cancer-related articles are of great value for cancer diagnostics, treatment, and prevention. The aim of this study is to introduce a recommendation system that combines text mining (LDA) and semantic computing (GloVe) to understand the meaning of user needs and to increase recommendation accuracy.
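A minimal sketch of how LDA topic similarity and GloVe-style semantic similarity might be mixed to rank articles for a query; the placeholder embedding table, the example articles, and the equal mixing weight are assumptions rather than the study's actual system.

```python
# Sketch: score candidate articles for a user query by combining topic
# similarity (LDA) with semantic similarity from word vectors. The tiny
# embedding table stands in for pre-trained GloVe vectors.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.metrics.pairwise import cosine_similarity

articles = ["targeted therapy for lung cancer patients",
            "immunotherapy outcomes in melanoma treatment",
            "dietary risk factors in colorectal cancer prevention"]
query = "lung cancer treatment options"

# Placeholder for GloVe: word -> vector (real use would load glove.6B.* files).
rng = np.random.default_rng(0)
glove = {w: rng.standard_normal(50)
         for w in set(" ".join(articles + [query]).split())}

def embed(text):
    return np.mean([glove[w] for w in text.split() if w in glove], axis=0)

X = CountVectorizer(stop_words="english").fit_transform(articles + [query])
theta = LatentDirichletAllocation(n_components=2, random_state=0).fit_transform(X)

topic_sim = cosine_similarity(theta[-1:], theta[:-1])[0]
glove_sim = cosine_similarity([embed(query)], [embed(a) for a in articles])[0]
score = 0.5 * topic_sim + 0.5 * glove_sim  # simple linear mixture of the two views
print(articles[int(score.argmax())])
```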
Similarity measurement plays an important role in the classification of short texts. However, traditional text similarity measures fail to achieve high accuracy because of the sparse features of short texts. In this paper, we propose a new method based on varying numbers of hidden topics, which are derived through well-known topic models such as Latent Dirichlet Allocation (LDA). We obtain the related topics and integrate them with the features of the short texts in order to decrease sparseness and improve word co-occurrence. Numerous experiments were conducted on an open data set (a Wikipedia dataset), and the results demonstrate that our proposed method improves classification accuracy by 14.03% with the k-nearest neighbors algorithm (KNN). This indicates that our method outperforms other state-of-the-art methods that do not utilize hidden topics and validates that the method is effective.
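A minimal sketch of appending LDA topic proportions to sparse word features before KNN classification, using scikit-learn; the snippets, labels, and topic count are illustrative only.

```python
# Sketch: augment sparse short-text features with LDA topic proportions
# before KNN classification. The snippets and labels are illustrative.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.neighbors import KNeighborsClassifier

texts = ["stock market rally continues", "team wins championship final",
         "shares fall on earnings miss", "coach praises striker performance"]
labels = ["finance", "sports", "finance", "sports"]

X_words = CountVectorizer().fit_transform(texts).toarray()

# Hidden topics derived from LDA, appended to the word features to
# mitigate sparseness in short texts.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
X_topics = lda.fit_transform(X_words)
X = np.hstack([X_words, X_topics])

knn = KNeighborsClassifier(n_neighbors=1).fit(X, labels)
print(knn.predict(X))
```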