Local Trajectory Occurrence Patterns for Partial Action and Gesture Recognition

Gustavo Garzon, Fabio Martinez


Action and gesture recognition is essential in computer vision because of its many potential applications. Recent literature reports dramatic advances in recognizing gestures and actions in uncontrolled scenarios with significant appearance and motion variations. Nevertheless, many of these approaches still require manual segmentation of temporal action boundaries and processing of the whole sequence to obtain a prediction. This work introduces a novel motion description that can recognize actions and gestures over partial sequences. The approach starts by representing video sequences as a set of key-point trajectories. These trajectories are then represented hierarchically, from a local and a regional perspective, following a statistical counting process. First, from the local perspective, each trajectory is encoded as a binary occurrence pattern that highlights critical motions according to neighborhood densities. These occurrence patterns are then aggregated into a regional bag-of-words representation of actions. Both representations can be computed for any interval of the video, which enables partial recognition of motion; the regional representation is fed to a support vector machine to obtain a prediction. The proposed approach was evaluated on academic action recognition datasets and on a large gesture dataset used for sign recognition. For partial video sequence recognition, the proposed approach achieves an accuracy of 63% using only 20% of the frames. The resulting description is very compact, with only 400 scalar values, which is ideal for online applications.
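
The pipeline summarized above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify how neighborhood densities are thresholded or how the codebook is built, so the radius, the neighbor threshold, the fixed window length, the Hamming-distance word assignment, and all function names below are assumptions chosen only for illustration.

```python
import numpy as np


def occurrence_patterns(trajs, radius=10.0, min_neighbors=2):
    """Binary occurrence bits per trajectory and frame (assumed rule).

    trajs: array of shape (N, T, 2), N key-point trajectories over T frames.
    A bit is 1 when at least `min_neighbors` other trajectories lie within
    `radius` of the point at that frame. This thresholding rule is an
    assumption; the abstract only mentions neighborhood densities.
    """
    pts = np.transpose(trajs, (1, 0, 2))              # (T, N, 2)
    diff = pts[:, :, None, :] - pts[:, None, :, :]    # (T, N, N, 2)
    dist = np.linalg.norm(diff, axis=-1)              # (T, N, N)
    counts = (dist <= radius).sum(axis=-1) - 1        # exclude the point itself
    return (counts.T >= min_neighbors).astype(np.uint8)  # (N, T)


def bag_of_words(patterns, codebook):
    """Normalized histogram of nearest codewords under Hamming distance.

    patterns: (M, L) binary patterns; codebook: (K, L) binary codewords.
    """
    hamming = (patterns[:, None, :] != codebook[None, :, :]).sum(axis=-1)
    words = hamming.argmin(axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / max(hist.sum(), 1.0)


def partial_descriptor(trajs, codebook, start, stop,
                       radius=10.0, min_neighbors=2):
    """Regional descriptor computed only from frames in [start, stop)."""
    bits = occurrence_patterns(trajs[:, start:stop], radius, min_neighbors)
    L = codebook.shape[1]
    n_win = bits.shape[1] // L                        # non-overlapping windows
    windows = bits[:, :n_win * L].reshape(-1, L)      # (N * n_win, L)
    return bag_of_words(windows, codebook)
```

Under these assumptions, a codebook with 400 words would yield a 400-value histogram, the same size as the descriptor quoted above, although the abstract does not state that the 400 values arise this way. The histogram returned by the hypothetical `partial_descriptor` is what would be passed to a support vector machine, and calling it with a short `[start, stop)` interval corresponds to predicting from a fraction of the frames.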


Action recognition; binary motion patterns; occurrence patterns; motion trajectories.





DOI: http://dx.doi.org/10.18517/ijaseit.11.1.9286


Published by INSIGHT - Indonesian Society for Knowledge and Human Development