
A Multimodal Fusion Approach for Human Activity Recognition

https://doi.org/10.1142/S0129065723500028
Cited by: 9 (Source: Crossref)

The problem of human activity recognition (HAR), which has numerous applications, has been attracting increasing effort from the research community. It consists of recognizing human motion and/or behavior within a given image or video sequence, using raw sensor measurements as input. In this paper, a multimodal approach to video-based HAR is proposed. It is based on 3D visual data collected with an RGB + depth camera, resulting in both raw video and 3D skeletal sequences. These data are transformed into six different 2D image representations: four lie in the spectral domain and a fifth is a pseudo-colored image; these five representations are derived from the skeletal data. The sixth representation is a "dynamic" image, an artificially created image that summarizes the RGB data of the whole video sequence in a visually comprehensible way. To classify a given activity video, all six 2D images are first extracted and six trained convolutional neural networks are used to extract visual features. These features are fused into a single feature vector, which is fed into a support vector machine for classification into human activities. For evaluation, a challenging motion activity recognition dataset is used, and single-view, cross-view and cross-subject experiments are performed. Moreover, the proposed approach is compared to three other state-of-the-art methods, demonstrating superior performance in most experiments.
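
Since the fusion step is central to the approach, the following is a minimal sketch of the pipeline the abstract describes: per-modality feature extraction, concatenation into a single fused vector, and SVM classification. The six feature extractors below are stand-in random projections rather than the paper's trained CNNs, and all dimensions, the synthetic dataset, and the default SVM settings are illustrative assumptions, not the authors' configuration.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Stand-ins for the six trained CNN feature extractors (one per 2D
# representation); each maps an image to a fixed-length feature vector.
# Random projections are used purely so the sketch runs end to end.
FEAT_DIM, IMG_SIDE, N_MODALITIES = 64, 32, 6
projections = [rng.normal(size=(IMG_SIDE * IMG_SIDE, FEAT_DIM))
               for _ in range(N_MODALITIES)]
cnn_extractors = [lambda img, W=W: img.reshape(-1) @ W for W in projections]

def fuse_features(images):
    """Extract one feature vector per modality and concatenate them
    into the single fused vector that is fed to the SVM."""
    feats = [f(img) for f, img in zip(cnn_extractors, images)]
    return np.concatenate(feats)

# Synthetic dataset: 20 "videos", each summarized by six 32x32 images
# (the six 2D representations), with one of 3 activity labels each.
videos = [[rng.normal(size=(IMG_SIDE, IMG_SIDE)) for _ in range(N_MODALITIES)]
          for _ in range(20)]
labels = np.arange(20) % 3

X = np.stack([fuse_features(v) for v in videos])
clf = SVC()  # kernel and hyper-parameters are assumptions, not the paper's
clf.fit(X[:15], labels[:15])
print(clf.predict(X[15:]))  # predicted activity classes for held-out videos
```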