SADY: Student Activity Detection Using YOLO-based Deep Learning Approach

Anagha Deshpande, Krishna Warhade


Automating human activity recognition is one of the most appealing and practical research areas in computer vision. In this article, we address the problem of video-based student activity detection. Student Activity Detection using YOLO (SADY) aims to recognize normal and abnormal student activities so that immediate intervention is possible in case of risk or necessity. We created our own classroom dataset of around 220 recordings depicting seven student classroom activities. The YOLOv4 Tiny model was retrained using 5000 labeled keyframes extracted from the training videos. The model was then tested for single and multiple activity detection. We present evaluation results for various values of hyperparameters such as the confidence threshold and the Intersection over Union (IoU) threshold. For the test videos, the model assigns a unique confidence score and action label to each frame by positioning recurrent activity labels. The proposed approach achieved a mean average precision (mAP) of 95% at 45 frames per second (FPS) on the student activity Class Room (CR) dataset and an mAP of 95.18% on the LIRIS dataset. Experimental findings on the recorded Class Room dataset and the publicly accessible LIRIS dataset show that the proposed approach outperforms existing approaches in recognition accuracy and speed. These results suggest that the proposed framework could effectively monitor students' activities in schools, colleges, and universities.
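To make the two hyperparameters named in the abstract concrete, the following is a minimal sketch of how a confidence threshold and an IoU threshold are commonly applied when scoring detections. It is not the authors' implementation: the box format (x_min, y_min, x_max, y_max), the function names, and the placeholder activity labels are assumptions introduced only for illustration.

```python
from typing import List, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]
# Assumed detection format: (box, activity label, confidence score).
Detection = Tuple[Box, str, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_by_confidence(dets: List[Detection], conf_threshold: float) -> List[Detection]:
    """Keep only detections whose confidence meets the threshold."""
    return [d for d in dets if d[2] >= conf_threshold]


def is_true_positive(det: Detection, gt_box: Box, gt_label: str, iou_threshold: float) -> bool:
    """A detection counts as correct if the label matches and the boxes overlap enough."""
    box, label, _ = det
    return label == gt_label and iou(box, gt_box) >= iou_threshold


# Hypothetical detections; "activity_a" and "activity_b" are placeholder labels.
detections = [
    ((0.0, 0.0, 2.0, 2.0), "activity_a", 0.90),
    ((5.0, 5.0, 6.0, 6.0), "activity_b", 0.20),
]
kept = filter_by_confidence(detections, conf_threshold=0.25)
matches = [is_true_positive(d, (0.0, 0.0, 2.0, 2.5), "activity_a", 0.5) for d in kept]
```

Raising the confidence threshold discards more low-scoring detections, while raising the IoU threshold demands tighter localization before a detection counts as correct, which is why results are typically reported for several values of both.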


Human activity recognition; convolutional neural network; Class Room dataset; YOLOv4 Tiny; multi-action detection








Published by INSIGHT - Indonesian Society for Knowledge and Human Development