LDA (Latent Dirichlet Allocation), proposed by Blei et al., is a generative probabilistic model of a corpus in which documents are represented as random mixtures over latent topics and each topic is characterized by a distribution over words; it does not, however, model the word-position attributes of the documents in the corpus. In this paper, a Word Position-Related LDA Model is proposed that takes the word-position attributes of every document in the corpus into account, so that each word is also characterized by a distribution over word positions. At the same time, the precision of topic-word interpretability is improved by integrating the word-position distribution with an appropriate word degree, taking into account the different word degrees at different word positions. Finally, a new size-aware word intrusion method is proposed to improve the evaluation of topic-word interpretability. Experimental results on the NIPS corpus show that the Word Position-Related LDA Model improves the precision of topic-word interpretability, with an average improvement of about 9.67%. Comparison across the experimental data also shows that the size-aware word intrusion method interprets the semantic information of topic words more comprehensively and effectively.
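As context, here is a minimal sketch of the baseline, position-agnostic LDA that the paper extends, using scikit-learn with toy documents and illustrative hyperparameters (all assumptions, not the paper's setup); note that the bag-of-words step discards exactly the word-position information the proposed model adds back.

```python
# Baseline LDA topic extraction (a sketch; toy data, assumed parameters).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = ["neural networks learn representations",
        "topic models describe document collections",
        "gradient descent optimizes network weights"]

# Bag-of-words counts: word positions are discarded here, which is the
# limitation the Word Position-Related LDA Model addresses.
counts = CountVectorizer().fit(docs)
X = counts.transform(docs)

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

# Top words per topic (the topic-word distribution).
vocab = counts.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top = topic.argsort()[::-1][:5]
    print(f"topic {k}:", [vocab[i] for i in top])
```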
To improve the segmentation accuracy of plant lesion images, a multi-channel segmentation algorithm for plant disease images is proposed based on linear discriminant analysis (LDA) mapping and K-means clustering. First, six color channels are obtained from the RGB and HSV models, and the six channel values of all pixels are laid out as six columns. One of these channels is treated as the label and the others as sample features; these data are grouped for linear discriminant analysis, and the other five channels are mapped into the eigenvector space corresponding to the three largest eigenvalues. Second, the mapped values are used as the input to K-means, with the points of minimum and maximum pixel value taken as the initial cluster centers, which overcomes the randomness of initial-center selection in K-means. The segmented pixels are divided into background and foreground, so the proposed segmentation method reduces to two-class clustering of background versus foreground. Finally, experimental results show that the segmentation effect of the proposed LDA mapping-based method is better than those of the K-means, ExR, and CIVE methods.
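A hedged sketch of this pipeline as described in the abstract follows: six RGB/HSV channels per pixel, one channel quantized into pseudo-labels for LDA, a 3-D LDA projection, then two-class K-means seeded at extreme pixels. The input file name, bin count, and the projection-norm seeding rule are assumptions for illustration.

```python
import cv2
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.cluster import KMeans

bgr = cv2.imread("leaf.png")                      # hypothetical input image
hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)
pix = np.hstack([bgr.reshape(-1, 3), hsv.reshape(-1, 3)]).astype(float)

# Treat one channel as a (quantized) label, the other five as features.
labels = np.digitize(pix[:, 0], np.linspace(0, 255, 8))
feats = pix[:, 1:]
proj = LinearDiscriminantAnalysis(n_components=3).fit_transform(feats, labels)

# Seed K-means with the two most extreme pixels (by projection norm)
# instead of random initialization, then split background/foreground.
norms = np.linalg.norm(proj, axis=1)
init = proj[[norms.argmin(), norms.argmax()]]
mask = KMeans(n_clusters=2, init=init, n_init=1).fit_predict(proj)
mask = mask.reshape(bgr.shape[:2])
```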
In Finger-Knuckle-Print (FKP) recognition, feature extraction plays a very important role in overall system performance. This paper merges two types of histogram of oriented gradients (HOG) features, extracted from reflectance and illumination images, for FKP-based identification. The Adaptive Single Scale Retinex (ASSR) algorithm is used to extract the illumination and reflectance images from each FKP image. Serial feature fusion is used to form a large feature vector for each user and to extract the distinctive features in the higher-dimensional vector space. Finally, the cosine similarity distance measure is used for classification. The Hong Kong Polytechnic University (PolyU) FKP database is used throughout all tests. Experimental results show that our proposed system achieves better results than other state-of-the-art systems.
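A rough sketch of the matching pipeline is shown below. The Gaussian-blur retinex decomposition is a simple single-scale stand-in for the paper's ASSR algorithm, and the HOG parameters are illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter
from skimage.feature import hog

def decompose(img, sigma=15):
    """Split an image into illumination (smooth) and reflectance parts.
    This is a plain single-scale retinex, not the paper's ASSR."""
    log_img = np.log1p(img.astype(float))
    illumination = gaussian_filter(log_img, sigma)
    reflectance = log_img - illumination
    return illumination, reflectance

def fkp_feature(img):
    """Serial fusion: concatenate HOG of illumination and reflectance."""
    ill, ref = decompose(img)
    h1 = hog(ill, orientations=9, pixels_per_cell=(16, 16))
    h2 = hog(ref, orientations=9, pixels_per_cell=(16, 16))
    return np.concatenate([h1, h2])

def cosine_score(f1, f2):
    """Cosine similarity used for the final nearest-template decision."""
    return f1 @ f2 / (np.linalg.norm(f1) * np.linalg.norm(f2) + 1e-12)
```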
With the rapid development of Internet technology, the network has become an indispensable part of undergraduates' lives, and the correct guidance of public opinion has become an important part of ideological work at universities. Undergraduates are at an important stage in the formation and development of their views and are easily incited by cyber-rumors. It is therefore particularly important to collect data on political public opinion in universities and to position hot topics for early detection of public opinion tendencies, which can also help avert major security incidents. With this in mind, this paper obtains multi-source political public opinion data from the BBS, Tieba, and Weibo of Sun Yat-sen University (SYSU) through a web crawler. We study a text feature extraction method based on Word2Vec and LDA (Latent Dirichlet Allocation), which alleviates the high-dimensional sparsity of the traditional Vector Space Model (VSM) text representation. Meanwhile, building on the classical Single-pass clustering algorithm, this paper studies a Single-pass & HAC clustering algorithm. In addition, a hot-topic measure is defined to calculate the heat value of political public opinion, and a dictionary- and rule-based method is used to improve the accuracy of sentiment tendency analysis. Experimental results demonstrate that topic detection and positioning based on LDA & Word2Vec and the Single-pass & HAC algorithm outperform other methods.
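For illustration, here is a minimal sketch of single-pass clustering over document vectors (for example, concatenated Word2Vec and LDA features); the similarity threshold and centroid-update rule are assumptions, and the HAC refinement stage is not shown.

```python
import numpy as np

def single_pass(doc_vecs, threshold=0.8):
    """Assign each document to the most similar existing cluster,
    or open a new cluster when no centroid is similar enough."""
    centroids, members = [], []
    for v in doc_vecs:
        v = v / (np.linalg.norm(v) + 1e-12)
        if centroids:
            sims = [c @ v for c in centroids]
            best = int(np.argmax(sims))
            if sims[best] >= threshold:
                members[best].append(v)
                # Update the centroid as the renormalized member mean.
                c = np.mean(members[best], axis=0)
                centroids[best] = c / (np.linalg.norm(c) + 1e-12)
                continue
        centroids.append(v)
        members.append([v])
    return centroids, members
```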
Following the Basel II Accord, with the increased focus on operational risk as an aspect distinct from credit and market risk, quantification of operational risk has been a major challenge for banks. This paper analyzes the implications of the advanced measurement approach for estimating operational risk. When modeling the severity of losses realistically, our preliminary tests indicate that classic distributions cannot fit the entire range of operational risk data samples (collected from public information sources) well. We therefore propose a piecewise-defined severity distribution (PSD) that combines a parametric form for ordinary losses with a generalized Pareto distribution (GPD) for large losses, and estimate operational risk by the loss distribution approach (LDA) with Monte Carlo simulation. We compare the operational risk measured with the piecewise-defined severity distribution based LDA (PSD-LDA) against that obtained from the basic indicator approach (BIA), and the ratios of operational risk regulatory capital of some major international banks against those of Chinese commercial banks. The empirical results reveal the rationality and promise of applying the PSD-LDA to Chinese national commercial banks.
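A hedged Monte Carlo sketch of the loss distribution approach with a piecewise severity follows: a lognormal body below a threshold u and a GPD tail above it, with Poisson loss frequency. Every parameter value here is an illustrative assumption, not an estimate from the paper's data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_years, lam = 50_000, 25           # simulated years, Poisson frequency
u, p_tail = 1e6, 0.05               # tail threshold and tail probability
body = stats.lognorm(s=1.2, scale=5e4)        # ordinary-loss body
tail = stats.genpareto(c=0.3, loc=u, scale=4e5)  # GPD for large losses

def draw_severity(n):
    """Piecewise severity: lognormal truncated at u, GPD beyond u."""
    is_tail = rng.random(n) < p_tail
    x = body.ppf(rng.uniform(0, body.cdf(u), n))  # body, truncated at u
    x[is_tail] = tail.rvs(is_tail.sum(), random_state=rng)
    return x

# Annual loss = sum of a Poisson number of severity draws per year.
annual = np.array([draw_severity(k).sum() for k in rng.poisson(lam, n_years)])
var_999 = np.quantile(annual, 0.999)  # regulatory capital at 99.9% VaR
print(f"simulated 99.9% annual-loss VaR: {var_999:,.0f}")
```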
As social media platforms have gained huge momentum in recent years, the amount of information generated on social media sites is growing exponentially, posing a great challenge for information retrieval systems in extracting potential named entities. Researchers have utilized semantic annotation mechanisms to retrieve entities from unstructured documents, but such mechanisms return too many ambiguous entities. In this work, the DBpedia knowledge base is adopted for entity extraction and categorization. To achieve the entity extraction task precisely, a two-step process is proposed: (a) train on the unstructured datasets with Word2Vec and classify the entities into their respective categories; (b) crawl web pages, forums, and other web sources to identify entities that are not present in DBpedia. The evaluation shows results with higher precision and a promising F1 score.
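A loose sketch of step (a) might look as follows: embed entity mentions with Word2Vec and classify them into DBpedia-style categories. The toy corpus, category labels, and hyperparameters are all illustrative assumptions.

```python
from gensim.models import Word2Vec
from sklearn.linear_model import LogisticRegression

sentences = [["berlin", "is", "a", "city"],
             ["einstein", "was", "a", "physicist"],
             ["paris", "is", "a", "city"],
             ["curie", "was", "a", "physicist"]]
w2v = Word2Vec(sentences, vector_size=50, min_count=1, seed=0)

# Train a categorizer on entities with known DBpedia categories...
train = [("berlin", "Place"), ("einstein", "Person"), ("paris", "Place")]
X = [w2v.wv[e] for e, _ in train]
y = [c for _, c in train]
clf = LogisticRegression(max_iter=1000).fit(X, y)

# ...then categorize a mention gathered by the crawler in step (b).
print(clf.predict([w2v.wv["curie"]]))
```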
We propose an analysis of electroencephalogram (EEG) signals recorded while performing a monotonous task and while drinking alcohol, using principal component analysis (PCA) and linear discriminant analysis (LDA) for feature extraction and neural networks (NNs) for classification. The EEG is captured during a monotonous task that can adversely affect the brain and possibly cause stress. Moreover, we investigate the effects of alcohol on the brain by capturing data continuously after consumption of equal amounts of alcohol. We hope that our work will shed more light on the relationship between such activities and the EEG, and clarify whether there is any relation between the tasks and mental stress. EEG signals offer a rare look at brain activity, while monotonous activities are well known to cause irritation that may contribute to mental stress. We apply PCA and LDA to characterize and extract the change in each component, and discriminate using an NN. The experiments show that PCA and LDA are effective methods for EEG signal analysis.
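A minimal sketch of the PCA, LDA, and neural-network chain on epoch features is given below; the synthetic data shapes, component counts, and layer sizes are assumptions standing in for the real EEG recordings.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 64))        # 200 EEG epochs x 64 features (toy)
y = rng.integers(0, 2, 200)           # e.g., pre- vs post-alcohol labels

clf = make_pipeline(
    PCA(n_components=20),             # decorrelate / reduce dimension
    LinearDiscriminantAnalysis(),     # class-discriminative projection
    MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000, random_state=0),
)
clf.fit(X[:150], y[:150])
print("held-out accuracy:", clf.score(X[150:], y[150:]))
```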
Microblogging platforms like Twitter have in recent years become an important source of information for a wide spectrum of users, and consequently a great resource for supporting emergency management. During any crisis, a huge amount of social media text must be sifted within a short span of time to extract meaningful information. Extraction of emergency-specific information, such as topic keywords, landmarks, or geo-locations of sites, from these texts plays a significant role in building an application for emergency management. This paper therefore highlights different aspects of the automatic analysis of tweets to help develop such an application. It focuses on: (1) identification of crisis-related tweets using machine learning; (2) exploration of topic model implementations and their effectiveness on short messages (as short as 140 characters), including an exploratory data analysis of crisis-related short texts collected from Twitter and visualizations that expose the commonalities and differences between topics and between different crisis-related datasets; and (3) a proof of concept for identifying and retrieving geo-locations from tweets and extracting GPS coordinates to plot them approximately on a map.
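As a small sketch of item (1), a supervised classifier can separate crisis-related tweets from the rest; the toy tweets, labels, and model choice (TF-IDF with logistic regression) are illustrative assumptions rather than the paper's exact setup.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

tweets = ["flood waters rising near the bridge, need rescue",
          "earthquake felt downtown, buildings evacuated",
          "great coffee at the new cafe this morning",
          "match day! can't wait for kickoff tonight"]
labels = [1, 1, 0, 0]                 # 1 = crisis-related, 0 = not

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                    LogisticRegression(max_iter=1000))
clf.fit(tweets, labels)
print(clf.predict(["highway closed after the flash flood"]))
```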
Negative online reviews have become essential decision-making information for businesses. This chapter conducts text mining on negative online reviews of e-commerce platforms to accurately identify problems in online platform transactions, uses social network analysis to clarify the correlations between the critical factors in negative reviews, and applies the LDA topic model to mine eight significant themes of negative reviews: platform-rider disputes, education refund difficulties, difficulty in canceling or changing reservations, damage or loss of goods, taxi disputes, payment harassment complaints, platform membership disputes, and slow customer-service response. The chapter is of great significance for improving the quality of products and services, enhancing customer satisfaction, and helping the government regulate e-commerce platforms effectively.
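A compact sketch of the LDA theme-mining step over tokenized negative reviews follows; the topic count of eight mirrors the chapter's setup, while the toy reviews, tokenization, and remaining hyperparameters are assumptions.

```python
from gensim import corpora, models

reviews = [["refund", "denied", "course", "fee"],
           ["driver", "cancelled", "order", "late"],
           ["package", "damaged", "missing", "items"],
           ["customer", "service", "slow", "reply"]]

dictionary = corpora.Dictionary(reviews)
bow = [dictionary.doc2bow(r) for r in reviews]
lda = models.LdaModel(bow, num_topics=8, id2word=dictionary,
                      random_state=0, passes=10)
# Inspect the mined themes via each topic's top words.
for topic_id, words in lda.print_topics(num_topics=8, num_words=4):
    print(topic_id, words)
```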
This paper researches a method for confirming software evolution based on Latent Dirichlet Allocation (LDA). LDA analyzes the interdependencies among words, topics, and documents, and these interdependencies can be expressed as probabilities. In this paper, LDA is adopted to model software evolution: each package in the source code is taken as a document; function (method) names, variable names, and comments are regarded as words; and the probabilities relating the three are computed. Comparing the results with the update reports confirms whether the new version of the software is consistent with its update reports.
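A hedged sketch of the document-construction step is shown below: one "document" per source package, built from identifier and comment tokens, followed by per-package topic mixtures that could be compared across versions. The identifier-splitting rules and example package names are assumptions.

```python
import re
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

def tokens(identifiers):
    """Split camelCase and snake_case identifiers into lowercase words."""
    words = []
    for ident in identifiers:
        parts = re.sub(r"([a-z])([A-Z])", r"\1 \2", ident).replace("_", " ")
        words += parts.lower().split()
    return " ".join(words)

packages = {  # hypothetical package -> identifiers and comments
    "net.io":  ["openSocket", "read_buffer", "connection timeout handler"],
    "ui.view": ["renderFrame", "layout_pass", "redraw widget on resize"],
}
docs = [tokens(ids) for ids in packages.values()]
X = CountVectorizer().fit_transform(docs)
theta = LatentDirichletAllocation(n_components=2,
                                  random_state=0).fit_transform(X)
print(theta)  # per-package topic mixtures, comparable across versions
```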