Please login to be able to save your searches and receive alerts for new content matching your search criteria.
Understanding contents in social networks by inferring high-quality latent topics from short texts is a significant task in social analysis, which is challenging because social network contents are usually extremely short, noisy and full of informal vocabularies. Due to the lack of sufficient word co-occurrence instances, well-known topic modeling methods such as LDA and LSA cannot uncover high-quality topic structures. Existing research works seek to pool short texts from social networks into pseudo documents or utilize the explicit relations among these short texts such as hashtags in tweets to make classic topic modeling methods work. In this paper, we explore this problem by proposing a topic model for noisy short texts with multiple relations called MRTM (Multiple Relational Topic Modeling). MRTM exploits both explicit and implicit relations by introducing a document-attribute distribution and a two-step random sampling strategy. Extensive experiments, compared with the state-of-the-art topic modeling approaches, demonstrate that MRTM can alleviate the word co-occurrence sparsity and uncover high-quality latent topics from noisy short texts.
Microblogging platforms like Twitter, in the recent years, have become one of the important sources of information for a wide spectrum of users. As a result, these platforms have become great resources to provide support for emergency management. During any crisis, it is necessary to sieve through a huge amount of social media texts within a short span of time to extract meaningful information from them. Extraction of emergency-specific information, such as topic keywords or landmarks or geo-locations of sites, from these texts plays a significant role in building an application for emergency management. This paper thus highlights different aspects of automatic analysis of tweets to help in developing such an application. Hence, it focuses on: (1) identification of crisis-related tweets using machine learning, (2) exploration of topic model implementations and looking at its effectiveness on short messages (as short as 140 characters); and performing an exploratory data analysis on short texts related to crises collected from Twitter, and looking at different visualizations to understand the commonality and differences between topics and different crisis-related data, and (3) providing a proof of concept for identifying and retrieving different geo-locations from tweets and extracting the GPS coordinates from this data to approximately plot them in a map.