Support Vector Machine Algorithm for SMS Spam Classification in the Telecommunication Industry

Nilam Nur Amir Sjarif, Yazriwati Yahya, Suriayati Chuprat, Nurul Huda Firdaus Mohd Azmi

Abstract


In recent years, we have witnessed a dramatic increase in the number of mobile users in the telecommunication industry. This growth, however, has led to a drastic increase in the number of spam SMS messages. Short Message Service (SMS) is one of the most widely used means of communication in telecommunication services. In practice, most users ignore spam because of low SMS rates and the limited number of spam classification tools. In this paper, we propose a Support Vector Machine (SVM) algorithm for SMS spam classification. SVM is considered one of the most effective data mining techniques. The proposed algorithm has been evaluated using a public dataset from the UCI Machine Learning Repository. Its performance is compared with three other data mining techniques: Naïve Bayes, Multinomial Naïve Bayes, and K-Nearest Neighbor (KNN) with K = 1, 3, and 5. Based on measures such as accuracy, processing time, kappa statistic, error, and the number of false positive instances, SVM outperforms the other classifiers and is the most accurate classifier for detecting and labeling spam messages, with an average accuracy of 98.9%. Comparing the error measures overall, the highest error was found for KNN with K = 3 and K = 5, whereas the model with the lowest error is SVM, followed by Multinomial Naïve Bayes. Therefore, the proposed method can serve as a strong baseline for further comparison in SMS spam classification.
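As an illustration of the evaluation measures named above, the following minimal Python sketch computes accuracy, the kappa statistic (Cohen's kappa), and the number of false positive instances from true and predicted labels. The function name `evaluate`, the label strings "ham" and "spam", and the toy label lists are assumptions made for this example; they are not data or results from the paper.

```python
from collections import Counter


def evaluate(y_true, y_pred, positive="spam"):
    """Return accuracy, Cohen's kappa, and the false positive count.

    Hypothetical helper for illustration only; not the authors' code.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")

    n = len(y_true)
    pairs = list(zip(y_true, y_pred))

    # Observed agreement: fraction of messages labeled correctly.
    accuracy = sum(t == p for t, p in pairs) / n

    # Agreement expected by chance, from the marginal label frequencies.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    labels = set(true_counts) | set(pred_counts)
    expected = sum(true_counts[l] * pred_counts[l] for l in labels) / (n * n)

    # Kappa rescales accuracy so that 0 means chance-level agreement.
    kappa = (accuracy - expected) / (1 - expected) if expected < 1 else 1.0

    # False positives: legitimate messages wrongly flagged as spam.
    false_positives = sum(t != positive and p == positive for t, p in pairs)

    return {"accuracy": accuracy, "kappa": kappa, "false_positives": false_positives}


# Toy labels (assumed, for illustration): 6 legitimate and 4 spam messages.
y_true = ["ham"] * 6 + ["spam"] * 4
y_pred = ["ham"] * 5 + ["spam"] + ["spam"] * 3 + ["ham"]
result = evaluate(y_true, y_pred)
print(result)
```

On an imbalanced collection such as SMS spam, kappa is a useful complement to accuracy because a classifier that labels every message as legitimate can still score a high accuracy while its kappa stays at zero.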


Keywords


short message service; spam; classification; data mining; support vector machine.

DOI: http://dx.doi.org/10.18517/ijaseit.10.2.10175

Published by INSIGHT - Indonesian Society for Knowledge and Human Development