Special Issue: Best Papers from 2016 IEEE International Symposium on Multimedia (ISM 2016), Part 1; Guest Editors: G. Zhang and Phillip C.-Y. Sheu

Automatic Video Event Detection for Imbalance Data Using Enhanced Ensemble Deep Learning

Samira Pouyanfar

School of Computing and Information Sciences, Florida International University, Miami, FL 33199, USA

E-mail Address: spouy001@cs.fiu.edu

and

Shu-Ching Chen

School of Computing and Information Sciences, Florida International University, Miami, FL 33199, USA

E-mail Address: chens@cs.fiu.edu

https://doi.org/10.1142/S1793351X17400050

Abstract

With the explosion of multimedia data, semantic event detection from videos has become a demanding and challenging topic. In addition, when the data has a skewed distribution, detecting interesting events also requires addressing the data imbalance problem. The recent proliferation of deep learning has made it an essential part of many Artificial Intelligence (AI) systems. To date, various deep learning architectures have been proposed for numerous applications such as Natural Language Processing (NLP) and image processing. Nonetheless, it is still impractical for a single model to work well across different applications. Hence, this paper proposes a new ensemble deep learning framework that can be used in various scenarios and datasets. The proposed framework handles the over-fitting issue as well as the information losses caused by single models. Moreover, it alleviates the imbalanced data problem in real-world multimedia data. The framework comprises a suite of deep learning feature extractors integrated with an enhanced ensemble algorithm based on performance metrics for imbalanced data. A Support Vector Machine (SVM) classifier is used as the last layer of each deep learning component and also as the weak learner in the ensemble module. The framework is evaluated on two large-scale, imbalanced video datasets (namely, disaster and TRECVID). The extensive experimental results illustrate the advantage and effectiveness of the proposed framework. They also demonstrate that the proposed framework outperforms several well-known deep learning methods, as well as conventional features integrated with different classifiers.
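The abstract does not spell out how the ensemble weights its weak learners, so the following is only a minimal sketch of one plausible reading: each weak learner is weighted by its F1 score on a validation set (a metric commonly preferred over accuracy for imbalanced data), and the weighted scores are fused into a final decision. The function names `f1_score`, `f1_weights`, and `fuse`, the choice of F1, and the 0.5 threshold are all assumptions made for illustration, not the authors' algorithm.

```python
from typing import List, Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 of the positive (minority) class; 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def f1_weights(y_val: Sequence[int], val_preds: List[Sequence[int]]) -> List[float]:
    """Assumed weighting rule: normalize each weak learner's validation F1."""
    scores = [f1_score(y_val, preds) for preds in val_preds]
    total = sum(scores)
    if total == 0:
        # No learner found any positive instance: fall back to uniform weights.
        return [1.0 / len(scores)] * len(scores)
    return [s / total for s in scores]


def fuse(test_scores: List[Sequence[float]], weights: Sequence[float],
         threshold: float = 0.5) -> List[int]:
    """Weighted sum of per-learner positive-class scores, then thresholding."""
    n = len(test_scores[0])
    fused = [sum(w * s[i] for w, s in zip(weights, test_scores)) for i in range(n)]
    return [1 if v >= threshold else 0 for v in fused]


# Toy validation labels with a rare positive class (2 of 8 instances).
y_val = [1, 0, 0, 0, 0, 0, 1, 0]
val_preds = [
    [1, 0, 0, 0, 0, 0, 1, 0],  # learner A: finds both positives
    [0, 0, 0, 0, 0, 0, 0, 0],  # learner B: 75% accurate, but never predicts positive
    [1, 1, 0, 0, 0, 0, 0, 0],  # learner C: one hit, one false alarm, one miss
]
weights = f1_weights(y_val, val_preds)

# Hypothetical positive-class scores of each learner on two test instances.
test_scores = [[0.9, 0.2], [0.1, 0.1], [0.6, 0.7]]
labels = fuse(test_scores, weights)
```

In this toy example learner B reaches 75% accuracy by always predicting the majority class, yet its F1 is zero, so it receives no weight in the fusion. That is the kind of behavior an imbalance-aware metric is meant to expose.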
