Framework for deep learning based model for human activity recognition (HAR) using adapted PSRA6 dataset
Rukhsarbano S. Sheikh, Sudhir Madhav Patil and Maneetkumar R. Dhanvijay
Abstract
Perimeter surveillance is a primary concern for the owners of critical infrastructure sites. Site owners deploy artificial intelligence (AI)-based smart cameras along the perimeter border to monitor suspicious activities and ground-level movements. In recent years, AI has been used increasingly in surveillance systems deployed at critical areas to obtain a live feed of the ground situation, which allows human intrusions to be detected and targets to be classified through human activity recognition (HAR). HAR is important for the timely prevention of any kind of attack or intrusion, and surveillance is the most common application of vision-based HAR research. Deep learning has recently led to many AI applications in surveillance. This paper reports a customised video dataset of perimeter surveillance related activity covering 6 human action classes (PSRA6) that pertain to suspicious human activity. Three simple, built-from-scratch deep learning architectures based on convolutional neural networks (CNNs) are used for the intended HAR: convolution and long short-term memory (CONVLSTM), long-term recurrent convolutional network (LRCN), and a 2-layer CNN. The Python interface for all three architectures is provided by the Keras library. The performance of these architectures is investigated in terms of accuracy, precision, recall and F1 score. This work also presents an effective method for collecting and characterising the adapted PSRA6 dataset. In the performance comparison, the 2-layer CNN architecture outperforms the other two architectures, with an accuracy of 96.77%, a loss of 0.21, and a weighted average precision, recall and F1 score of 97% each. Although all the designed architectures are limited by the available computational power, the 2-layer CNN model performed best.
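The weighted average metrics reported above can be illustrated with a short sketch. It assumes the usual definition, in which each class's precision, recall and F1 score is weighted by that class's share of the true labels. The function name `weighted_prf` and the class labels in the usage note are hypothetical, chosen only for illustration; they are not the PSRA6 classes or the authors' evaluation code.

```python
from collections import Counter


def weighted_prf(y_true, y_pred):
    """Return (accuracy, weighted precision, weighted recall, weighted F1).

    Assumption: each per-class score is weighted by the class support,
    i.e. the number of true samples belonging to that class.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    total = len(y_true)
    support = Counter(y_true)        # true samples per class
    predicted = Counter(y_pred)      # predicted samples per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    precision = recall = f1 = 0.0
    for label in sorted(set(y_true) | set(y_pred)):
        # Per-class scores, with 0.0 used when a denominator is zero.
        p = correct[label] / predicted[label] if predicted[label] else 0.0
        r = correct[label] / support[label] if support[label] else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0

        weight = support[label] / total
        precision += weight * p
        recall += weight * r
        f1 += weight * f

    accuracy = sum(correct.values()) / total
    return accuracy, precision, recall, f1
```

For example, `weighted_prf(["walk", "walk", "run", "run", "crawl", "crawl"], ["walk", "walk", "run", "crawl", "crawl", "crawl"])` returns the four scores for a toy six-sample case. Under this definition the weighted average recall always equals the overall accuracy, which is consistent with the two values reported for the 2-layer CNN agreeing after rounding.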
Keywords
CNN, Deep learning, Keras, Human action recognition, PSRA6, Neural network.
Cite this article
Sheikh RS, Patil SM, Dhanvijay MR. Framework for deep learning based model for human activity recognition (HAR) using adapted PSRA6 dataset. International Journal of Advanced Technology and Engineering Exploration. 2023; 10(98):37-66. DOI:10.19101/IJATEE.2021.876325