Human-Robot Interaction Based on Dialog Management Using Sentence Similarity Comparison Method

Dinda Ayu Permatasari, Hanif Fakhrurroja, Carmadi Machbub


Advances in dialogue systems span speech recognition, language understanding, and speech synthesis. A dialogue system lets humans interact with a robot efficiently through spoken language, which is especially valuable for people who need support in daily activities, such as older people. Through Human-Robot Interaction (HRI), a user can command a robot to perform tasks that the user cannot do alone. This study presents a dialogue management system for HRI that compares two sentence-similarity methods, the TF-IDF (Term Frequency-Inverse Document Frequency) Cosine Similarity algorithm and the Jaccard Coefficient, and uses a Finite State Machine (FSM). Dialogue management is responsible for finding the appropriate response when the user says something, in other words, for managing the flow of the conversation that commands the robot. TF-IDF weights the relationship between terms; Cosine Similarity and the Jaccard Coefficient are compared as measures for classifying sentence similarity in the dialogue manager and improving intent recognition, while the FSM sets the sequential flow of the dialogue. We use the Google Cloud Speech API as the speech-to-text engine and a Kinect V2 as the audio sensor. Eight scenarios were created for this system. Speech recognition with Google Speech takes 2.62 seconds on average, a reasonably fast response. The TF-IDF Cosine Similarity method achieves an accuracy of 97.43%, and the Jaccard Coefficient achieves 91.57%. The FSM can therefore be considered an efficient structure for building dialogue management.
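The two similarity measures compared in the abstract can be sketched as follows. This is a minimal illustration of TF-IDF weighting with cosine similarity versus the Jaccard coefficient over token sets; the tokenization, smoothing, and corpus are illustrative assumptions, not the paper's actual implementation.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Compute sparse TF-IDF vectors for a list of tokenized sentences."""
    n = len(docs)
    # Document frequency: number of sentences containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: (c / len(doc)) * math.log(n / df[t])
                        for t, c in tf.items()})
    return vectors

def cosine_similarity(u, v):
    """Cosine of the angle between two sparse TF-IDF vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0

def jaccard_coefficient(a, b):
    """|A ∩ B| / |A ∪ B| over the token sets of two sentences."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if (sa | sb) else 0.0
```

Note the practical difference: the Jaccard coefficient treats every shared token equally, while TF-IDF cosine similarity down-weights tokens that occur in many sentences, which is one plausible reason the two measures can rank candidate responses differently.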
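The FSM used to sequence the dialogue can be sketched as a table-driven state machine. The states ("idle", "confirming"), intents, and replies below are hypothetical placeholders, since the abstract does not specify the paper's eight scenarios; the sketch only shows the structure of transition-driven dialogue management.

```python
# Transition table: (current state, recognized intent) -> (next state, reply).
# States, intents, and replies are illustrative, not the paper's actual ones.
TRANSITIONS = {
    ("idle", "greet"): ("idle", "Hello! What should the robot do?"),
    ("idle", "command"): ("confirming", "Should I execute that command?"),
    ("confirming", "yes"): ("idle", "Executing the command."),
    ("confirming", "no"): ("idle", "Command cancelled."),
}

class DialogueFSM:
    """Minimal finite state machine for managing dialogue flow."""

    def __init__(self, start="idle"):
        self.state = start

    def step(self, intent):
        """Advance on a recognized intent and return the robot's reply.

        Unknown (state, intent) pairs leave the state unchanged, so the
        dialogue can recover from misrecognized utterances.
        """
        key = (self.state, intent)
        if key not in TRANSITIONS:
            return "Sorry, I did not understand that."
        self.state, reply = TRANSITIONS[key]
        return reply
```

For example, a "command" intent moves the machine from "idle" to "confirming", and only a "yes" in that state triggers execution, which is how an FSM constrains the conversation to a valid sequence.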


dialogue manager; TF-IDF; cosine similarity; finite state machine; human-robot interaction; Google Cloud Speech.








Published by INSIGHT - Indonesian Society for Knowledge and Human Development