Development of Rule-Based Feature Extraction in Multi-label Text Classification

Gugun Mediamer; Adiwijaya; Said Al Faraby

doi:10.18517/ijaseit.9.4.8894

Abstract

Hadith is the second main source of guidance in Islam after the Holy Quran, and was revealed through the Messenger of Allah. Today, a Hadith can be classified into more than one class, such as advice, prohibition, and information, which helps readers filter the appropriate classes for each Hadith of Rasulullah SAW. Text classification research involves many kinds of data, so handling that fits the characteristics of the particular data is required. This study investigates the handling of multi-label data (Hadith Bukhari in Indonesian translation), focusing on feature extraction, feature weighting, and preprocessing methods. It uses rule-based feature extraction combined with several types of preprocessing and three feature-weighting methods: TF-IDF, Word2vec, and Word2vec weighted with TF-IDF. The five preprocessing stages in this research are Case Folding, Tokenization, Punctuation Removal, Stopword Removal, and Stemming. Across the 13 experiments conducted in this study on 2000 hadiths, the best multi-label classification performance was produced by the combination of the proposed rule-based feature extraction, the Word2vec feature-weighting method, and preprocessing without Stemming and Stopword Removal. The Hamming Loss obtained from this combination was 0.0623. The results show that the proposed rule-based feature extraction method performs better than the baseline method.
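Since the abstract reports its result as a Hamming Loss, a minimal sketch of how that metric is usually computed for multi-label predictions may help. It follows the standard definition (the fraction of label slots predicted incorrectly, averaged over all samples and labels). The function name, the label order (advice, prohibition, information), and the toy data are assumptions made for illustration; they are not taken from the paper's dataset or code.

```python
from typing import Sequence


def hamming_loss(
    y_true: Sequence[Sequence[int]],
    y_pred: Sequence[Sequence[int]],
) -> float:
    """Return the fraction of label slots that were predicted incorrectly.

    Each row is one document; each column is one binary label indicator.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same number of rows")

    total_slots = 0
    wrong_slots = 0
    for true_row, pred_row in zip(y_true, y_pred):
        if len(true_row) != len(pred_row):
            raise ValueError("each row must have the same number of labels")
        total_slots += len(true_row)
        wrong_slots += sum(1 for t, p in zip(true_row, pred_row) if t != p)

    if total_slots == 0:
        raise ValueError("cannot compute Hamming Loss on empty input")
    return wrong_slots / total_slots


# Hypothetical toy data, NOT from the paper.
# Assumed column order: (advice, prohibition, information).
y_true = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
]
y_pred = [
    [1, 0, 0],  # misses "information"
    [0, 1, 0],  # all labels correct
    [1, 0, 0],  # misses "prohibition"
    [0, 0, 1],  # all labels correct
]

loss = hamming_loss(y_true, y_pred)
print(f"Hamming Loss: {loss:.4f}")  # 2 wrong slots out of 12
```

Lower values are better: a Hamming Loss of 0 means every label of every document was predicted correctly, so the reported 0.0623 corresponds to roughly 6% of label slots being wrong.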

Keywords

multi-label classification; Bukhari Hadith; feature-weighted; tf-idf; word2vec; hamming loss.

Published by INSIGHT - Indonesian Society for Knowledge and Human Development

International Journal on Advanced Science, Engineering and Information Technology
