Emotion Recognition on Facial Expression and Voice: Analysis and Discussion

Kok-Why Ng, Yixen Lim, Su-Cheng Haw, Yih-Jian Yoong


Emotion plays an important role in our daily lives. An individual's emotional state can affect the performance of a company, the harmony of a family, and the wellness or growth (physical, mental, and spiritual) of a child, among a wide range of other impacts. Existing work on emotion detection from facial expressions differs from work on voice. Facial expression is captured externally from the face, whereas voice is produced internally as air passes through the vocal folds, so models built on the two signals may deviate considerably from each other. This paper studies and analyses a person's emotion through two separate models: one for facial expression and one for voice. The proposed algorithm uses a Convolutional Neural Network (CNN) with two-dimensional convolutional layers for facial expression and one-dimensional convolutional layers for voice. Features are extracted via face detection for the facial expression model and via Mel-spectrogram extraction for the voice model. The network layers are fine-tuned to improve the performance of the CNN models. The trained CNN models can recognize emotions from input videos, which may contain single or multiple emotions from the facial expression and voice perspectives. The videos used in the experiments are free of background music and environmental noise and contain only a person's voice. The proposed algorithm achieved an accuracy of 62.9% through facial expression and 82.3% through voice.
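
The distinction between the two kinds of convolutional layer can be illustrated with a minimal sketch. It is not the authors' implementation: the toy inputs, the kernel values, and the function names (hz_to_mel, conv1d_valid, conv2d_valid) are assumptions made for illustration, and the Hz-to-mel mapping uses the common HTK-style formula, which the abstract does not specify. As in most deep-learning frameworks, the "convolution" here is a sliding dot product without kernel flipping.

```python
import math


def hz_to_mel(freq_hz):
    """Map a frequency in Hz to the mel scale (HTK-style formula, assumed)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def conv1d_valid(signal, kernel):
    """Slide a 1D kernel along a sequence (stride 1, no padding)."""
    n, k = len(signal), len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(n - k + 1)]


def conv2d_valid(image, kernel):
    """Slide a 2D kernel over a 2D grid (stride 1, no padding)."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(w - kw + 1)]
            for r in range(h - kh + 1)]


# Hypothetical toy inputs: one mel band's energy over five time frames,
# and a 3x3 grayscale patch standing in for a detected face region.
mel_band_energy = [0.0, 1.0, 2.0, 3.0, 4.0]
face_patch = [[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]]

# The 1D kernel moves along time only; the 2D kernel moves along both axes.
voice_features = conv1d_valid(mel_band_energy, [1.0, 0.0, -1.0])
face_features = conv2d_valid(face_patch, [[1.0, 0.0], [0.0, -1.0]])

print(voice_features)  # 3 values from a length-5 input and a length-3 kernel
print(face_features)   # 2x2 map from a 3x3 input and a 2x2 kernel
```

A 1D layer suits the voice model because a Mel-spectrogram is naturally scanned along its time axis, whereas a face image has two spatial axes that a 2D kernel covers together.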


Emotion recognition; facial expression; voice; convolutional neural network; Mel-spectrogram




DOI: http://dx.doi.org/10.18517/ijaseit.13.5.19023



Published by INSIGHT - Indonesian Society for Knowledge and Human Development