With the explosion of multimedia data, semantic event detection from videos has become a demanding and challenging topic. In addition, when the data has a skewed distribution, detecting the events of interest also requires addressing the data imbalance problem. The recent proliferation of deep learning has made it an essential part of many Artificial Intelligence (AI) systems. To date, various deep learning architectures have been proposed for numerous applications such as Natural Language Processing (NLP) and image processing. Nonetheless, it is still impractical for a single model to work well across different applications. Hence, this paper proposes a new ensemble deep learning framework that can be utilized in various scenarios and on various datasets. The proposed framework handles the over-fitting issue as well as the information loss caused by single models. Moreover, it alleviates the imbalanced data problem in real-world multimedia data. The framework combines a suite of deep learning feature extractors with an enhanced ensemble algorithm based on performance metrics suited to imbalanced data. The Support Vector Machine (SVM) classifier is used both as the last layer of each deep learning component and as the weak learner in the ensemble module. The framework is evaluated on two large-scale, imbalanced video datasets (namely, disaster and TRECVID). Extensive experimental results demonstrate the effectiveness of the proposed framework. They also show that it outperforms several well-known deep learning methods, as well as conventional features combined with different classifiers.
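The ensemble step described above can be illustrated with a minimal sketch. The abstract does not say which imbalanced-data metric weights the weak learners, so this sketch assumes the F1-score of the minority (positive) class measured on a validation set, and assumes each SVM weak learner emits a decision value whose sign is its vote. The function names `f1_positive`, `learner_weights`, and `ensemble_predict` are hypothetical; this is an illustration of metric-weighted voting, not the authors' algorithm.

```python
from typing import List, Sequence


def f1_positive(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1-score of the positive (minority) class; returns 0.0 when no true positives exist."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def learner_weights(val_labels: Sequence[int],
                    val_preds_per_learner: Sequence[Sequence[int]]) -> List[float]:
    """Weight each weak learner by its validation F1, normalized to sum to 1.

    Assumption: minority-class F1 stands in for the paper's unspecified metric.
    Falls back to uniform weights if every learner scores 0.
    """
    f1s = [f1_positive(val_labels, preds) for preds in val_preds_per_learner]
    total = sum(f1s)
    if total == 0:
        return [1.0 / len(f1s)] * len(f1s)
    return [f / total for f in f1s]


def ensemble_predict(decision_values_per_learner: Sequence[Sequence[float]],
                     weights: Sequence[float]) -> List[int]:
    """Weighted vote: each learner votes +1 if its SVM decision value > 0, else -1."""
    n = len(decision_values_per_learner[0])
    preds = []
    for i in range(n):
        score = sum(w * (1 if dv[i] > 0 else -1)
                    for w, dv in zip(weights, decision_values_per_learner))
        preds.append(1 if score > 0 else 0)
    return preds


if __name__ == "__main__":
    # Toy validation data: 2 positives out of 6 samples (imbalanced).
    val_labels = [1, 0, 0, 0, 1, 0]
    val_preds = [
        [1, 0, 0, 0, 1, 0],  # learner A: perfect on the minority class
        [0, 0, 0, 0, 0, 0],  # learner B: always predicts the majority class
        [1, 1, 0, 0, 0, 0],  # learner C: partially correct
    ]
    w = learner_weights(val_labels, val_preds)
    print(w)
    print(ensemble_predict([[0.8, -0.5], [-0.3, -0.2], [-0.4, 0.9]], w))
```

Weighting by a minority-class metric rather than plain accuracy means a learner that always predicts the majority class (learner B above) receives zero weight, even though its accuracy on the toy validation set is 4/6.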
Over the past decades, we have witnessed an explosion of multimedia data, driven especially by the development of social media websites and the growing popularity of smart devices. As a result, multimedia semantic concept mining and retrieval, whose objective is to extract useful information from large amounts of multimedia data including texts, images, and videos, has become increasingly important. The sheer volume of multimedia data and the semantic gap between low-level features and high-level semantic concepts make this task even more challenging. To address these challenges, correlations among classes can provide important context cues that help bridge the semantic gap. Meanwhile, many real-world datasets do not have uniform class distributions, and the minority instances often represent the concepts of interest, such as frauds in transactions, intrusions in network security, and unusual events in surveillance. Despite extensive research efforts, imbalanced concept retrieval remains one of the most challenging research problems in multimedia data mining. Unlike existing frameworks that model concept correlations among labels, this paper presents a novel concept correlation analysis model that uses the correlation between retrieval scores and labels. Experimental results on the TRECVID benchmark datasets demonstrate that the proposed framework can enhance imbalanced concept mining and retrieval even when the minority class receives trivial scores.
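The idea of correlating retrieval scores with labels can be illustrated with a minimal sketch. The abstract does not give the model's formulation, so this sketch assumes one simple reading: compute the Pearson correlation between each concept's retrieval scores and the target concept's ground-truth labels on training data, then refine the target's scores by adding the correlation-weighted scores of the other concepts. The concept names (`fire`, `smoke`) and the functions `pearson`, `score_label_correlations`, and `refine_scores` are hypothetical; this is not the paper's model.

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def score_label_correlations(scores: Dict[str, List[float]],
                             target_labels: Sequence[int]) -> Dict[str, float]:
    """Correlation between each concept's retrieval scores and the target's 0/1 labels."""
    return {concept: pearson(s, target_labels) for concept, s in scores.items()}


def refine_scores(target: str, scores: Dict[str, List[float]],
                  corr: Dict[str, float]) -> List[float]:
    """Target score plus correlation-weighted scores of every other concept (assumed fusion rule)."""
    n = len(scores[target])
    return [
        scores[target][i] + sum(corr[c] * scores[c][i] for c in scores if c != target)
        for i in range(n)
    ]


if __name__ == "__main__":
    # Toy data: the minority concept "fire" has trivial, uninformative scores,
    # while the related concept "smoke" tracks the fire labels closely.
    scores = {"fire": [0.1, 0.2, 0.1, 0.3], "smoke": [0.9, 0.1, 0.8, 0.2]}
    fire_labels = [1, 0, 1, 0]
    corr = score_label_correlations(scores, fire_labels)
    print(corr)
    print(refine_scores("fire", scores, corr))
```

In this toy case the raw `fire` scores rank a negative sample highest, while the refined scores rank both positive samples above both negatives, which is the kind of gain from trivial minority-class scores that the abstract describes.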