Classification of Indonesian Population's Level Happiness on Twitter Data Using N-Gram, Naïve Bayes, and Big Data Technology

I Nyoman Krisna Bayu, I Made Agus Dwi Suarjaya, Putu Wira Buana


The level of happiness is one factor that influences social interaction in the community, so the population's happiness level in the current year is an interesting subject of study. Since last year, the world has been facing the COVID-19 pandemic, which has dramatically affected the population's happiness from social, economic, health, education, and tourism perspectives. The various affected sectors produce different levels of emotional happiness in the community, expressed through opinions and issues discussed on social media. In addition, the large number of issues on social media generates a vast and highly complex volume of data. Big Data is a general term for data collections that are too large and complex to manage with traditional data processing methods or techniques. Companies, organizations, researchers, and academics use Big Data to extract and analyze the information they need, and its use can support better decision-making. This study uses Big Data technology to evaluate the Indonesian population's happiness level on Twitter data. Classification is performed using N-Gram, Naïve Bayes, and the Laplacian smoothing technique. Emotion in this research is classified into two classes: happy and unhappy. A total of 4,306,581 tweets were classified; the results show 39.4% happy emotion and 60.6% unhappy emotion.
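The classification approach named in the abstract (N-Gram features, Naïve Bayes, Laplacian smoothing) can be illustrated with a minimal sketch. The abstract does not specify the n-gram order, the preprocessing steps, or the Big Data stack, so everything below is an assumption made for illustration: the tokenizer is a plain whitespace split, unigrams and bigrams are used together, out-of-vocabulary n-grams are skipped at prediction time, and the four training tweets are invented toy data rather than data from the study.

```python
import math
from collections import Counter, defaultdict


def ngrams(text, n=2):
    """Return all word n-grams of order 1..n from a lowercased, whitespace-split text."""
    tokens = text.lower().split()
    grams = []
    for size in range(1, n + 1):
        for i in range(len(tokens) - size + 1):
            grams.append(" ".join(tokens[i:i + size]))
    return grams


class NGramNaiveBayes:
    """Multinomial Naive Bayes over n-gram counts with Laplacian (add-alpha) smoothing."""

    def __init__(self, n=2, alpha=1.0):
        self.n = n                      # highest n-gram order (assumed, not from the paper)
        self.alpha = alpha              # smoothing constant; 1.0 is classic Laplace smoothing
        self.class_docs = Counter()     # number of training tweets per class
        self.feature_counts = defaultdict(Counter)  # n-gram counts per class
        self.vocab = set()              # all n-grams seen in training

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            self.class_docs[label] += 1
            for gram in ngrams(text, self.n):
                self.feature_counts[label][gram] += 1
                self.vocab.add(gram)
        return self

    def log_scores(self, text):
        total_docs = sum(self.class_docs.values())
        vocab_size = len(self.vocab)
        scores = {}
        for label, n_docs in self.class_docs.items():
            counts = self.feature_counts[label]
            total = sum(counts.values())
            score = math.log(n_docs / total_docs)  # log prior
            for gram in ngrams(text, self.n):
                if gram not in self.vocab:
                    continue  # skip n-grams never seen in training
                # Smoothed likelihood: (count + alpha) / (total + alpha * |V|)
                score += math.log(
                    (counts[gram] + self.alpha) / (total + self.alpha * vocab_size)
                )
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Invented toy training data, for illustration only.
train = [
    ("saya senang sekali hari ini", "happy"),
    ("liburan ini sangat menyenangkan", "happy"),
    ("saya sedih dan kecewa", "unhappy"),
    ("hari ini sangat buruk", "unhappy"),
]
model = NGramNaiveBayes(n=2, alpha=1.0).fit(
    [text for text, _ in train], [label for _, label in train]
)
print(model.predict("saya senang"))
print(model.predict("hari ini buruk"))
```

Smoothing matters here because an n-gram that never occurs with one class would otherwise contribute a zero probability and eliminate that class regardless of the other evidence. Applying `predict` to every tweet in a corpus and counting the labels would give class shares of the kind reported in the abstract.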


Happiness level; tweet; big data; n-gram; naïve bayes; laplacian smoothing.








Published by INSIGHT - Indonesian Society for Knowledge and Human Development