Deep Learning-based Video Summarization

Myoungchan Seo, YoungJin Suh, Kyuman Jeong


With the development of communication technology, many kinds of media transmission have become widespread. Among these media, video is the most popular today. However, watching an entire video takes a lot of time. Because of this characteristic of video, many users tend to play video content back at higher speed or even stop watching partway through. Some websites therefore provide a video summary: a set of summary images that captures only the important frames of the video content. Users can shorten their viewing time by watching only the summary. This is especially useful because content such as news reports or speeches can then be delivered and used quickly. Since video summarization is labor-intensive, there is increasing demand for research on techniques to automate it. In this paper, an automated process is proposed to address the time-consuming nature of existing video summarization techniques. The proposed method improves on existing video summarization, which has been performed manually through human labor, by developing artificial intelligence technology that automates video summarization for effective content delivery. In the preprocessing stage, the video is partitioned into information transfer units using optical flow. In the following stage, a CNN (Convolutional Neural Network) is used as the deep learning method for feature extraction. The results show the efficiency of the proposed algorithm, and directions for future work are given at the end.
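The abstract does not say exactly how optical flow is used to find the boundaries between information transfer units. As a minimal sketch, assuming the mean optical-flow magnitude between each pair of consecutive frames has already been computed (for example with OpenCV's dense `cv2.calcOpticalFlowFarneback`, not shown here) and assuming a boundary wherever that magnitude spikes above a threshold, the partitioning step could look like the following. The function names, the `threshold` and `min_length` parameters, and the middle-frame keyframe rule are hypothetical illustrations, not the authors' method; in the paper, CNN features would inform which frames are treated as important.

```python
from typing import List, Tuple


def partition_by_flow(
    flow_magnitudes: List[float],
    threshold: float,
    min_length: int = 1,
) -> List[Tuple[int, int]]:
    """Split a video into segments at large optical-flow spikes (hypothetical rule).

    flow_magnitudes[i] is the assumed mean optical-flow magnitude between
    frame i and frame i + 1, so a video with n frames has n - 1 values.
    Returns half-open frame ranges (start, end).
    """
    num_frames = len(flow_magnitudes) + 1
    segments: List[Tuple[int, int]] = []
    start = 0
    for i, magnitude in enumerate(flow_magnitudes):
        # A spike between frame i and frame i + 1 starts a new segment at
        # frame i + 1, but only if the current segment has min_length frames.
        if magnitude > threshold and (i + 1 - start) >= min_length:
            segments.append((start, i + 1))
            start = i + 1
    # The last segment runs to the end of the video.
    segments.append((start, num_frames))
    return segments


def middle_keyframes(segments: List[Tuple[int, int]]) -> List[int]:
    """Pick the middle frame of each segment as a placeholder keyframe."""
    return [(start + end - 1) // 2 for start, end in segments]


if __name__ == "__main__":
    # Made-up magnitudes for an 8-frame clip with spikes after frames 2 and 5.
    magnitudes = [0.2, 0.3, 5.0, 0.1, 0.2, 4.8, 0.3]
    segs = partition_by_flow(magnitudes, threshold=1.0)
    print(segs)                    # [(0, 3), (3, 6), (6, 8)]
    print(middle_keyframes(segs))  # [1, 4, 6]
```

The `min_length` guard keeps a burst of consecutive high-motion transitions from producing a run of one-frame segments; a real system would more likely tune the threshold per video or combine flow with CNN features.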


Deep learning; video summarization; scene extraction; convolutional neural network; optical flow.

Published by INSIGHT - Indonesian Society for Knowledge and Human Development