World Scientific

Violence Detection by Pretrained Modules with Different Deep Learning Approaches

    https://doi.org/10.1142/S2196888820500013 · Cited by: 63 (Source: Crossref)

    In this paper, we explore different strategies for assessing the salience of features from different pretrained models in detecting violence in videos. A dataset has been created consisting of violent and non-violent videos in different settings. Three ImageNet models, VGG16, VGG19 and ResNet50, are used to extract features from the frames of the videos. In one experiment, the extracted features are fed into a fully connected network that detects violence at the frame level. In another experiment, we feed the extracted features of 30 frames at a time into a long short-term memory (LSTM) network. Furthermore, we apply attention to the frame features through a spatial transformer network, which also enables transformations such as rotation, translation and scaling. Alongside these models, we design a custom convolutional neural network (CNN) as a feature extractor, and additionally use a pretrained model initially trained on a movie violence dataset. In the end, the features extracted by the pretrained ResNet50 proved the most salient for detecting violence: combined with an LSTM, they achieve an accuracy of 97.06%, better than the other models we experimented with.
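The best-performing pipeline described above runs an LSTM over per-frame ResNet50 feature vectors, 30 frames at a time. A minimal NumPy sketch of that sequence-classification step is shown below; this is not the authors' code, and the hidden size, random initialisation and dummy features are assumptions for illustration only (ResNet50's average-pooled features are 2048-dimensional).

```python
import numpy as np

# Illustrative sketch (not the paper's implementation): classify a clip by
# running an LSTM over per-frame feature vectors, as done with 2048-d
# ResNet50 features for 30 frames at a time. HIDDEN is an assumption.
SEQ_LEN, FEAT_DIM, HIDDEN = 30, 2048, 64
rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Randomly initialised LSTM parameters (gates stacked: input, forget, cell, output)
W = rng.standard_normal((4 * HIDDEN, FEAT_DIM)) * 0.01
U = rng.standard_normal((4 * HIDDEN, HIDDEN)) * 0.01
b = np.zeros(4 * HIDDEN)

def lstm_last_hidden(frames):
    """Run an LSTM over (SEQ_LEN, FEAT_DIM) frame features; return final hidden state."""
    h = np.zeros(HIDDEN)
    c = np.zeros(HIDDEN)
    for x in frames:
        z = W @ x + U @ h + b
        i, f, g, o = np.split(z, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
    return h

# Binary read-out: probability the clip is violent
w_out = rng.standard_normal(HIDDEN) * 0.01
features = rng.standard_normal((SEQ_LEN, FEAT_DIM))  # stand-in for ResNet50 features
prob_violent = sigmoid(w_out @ lstm_last_hidden(features))
print(prob_violent.shape == ())
```

In practice the frame features would come from an ImageNet-pretrained ResNet50 with its classification head removed, and the LSTM and read-out weights would be learned on the violence dataset rather than randomly initialised.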