An Improved Accuracy of Multiclass Random Forest Classifier with Continuous Attribute Transformation Using Random Percentile Generation

Ronny Susetyoko, Elly Purwantini, Budi Nur Iman, Edi Satriyanto

Abstract


This study aims to improve classification accuracy by transforming continuous attributes into categories, using randomly generated percentile values as the category boundaries. Four algorithms for generating percentile values were compared and selected primarily on the basis of low variability in the generated percentile values and the distribution of the highest revenue expectations; the distribution of classification accuracy on the training and testing data served as a secondary criterion. Random forest (RF) classification models were then built from the selected percentiles under three transformation variations. According to the ANOVA test, the mean accuracy of the algorithm under the three transformation variations did not differ significantly from that of the best model or of the model trained on the original dataset. However, for some variations of the training data, RF classification with continuous attribute transformation outperformed the original dataset model. The continuous attribute transformation proved particularly effective for logistic regression (LR), multilayer perceptron (MLP), and naive Bayes (NB). On the tuition fee dataset, these three methods achieved accuracies of 0.178, 0.204, and 0.318, respectively, before transformation; after attribute transformation, accuracy increased significantly to 0.967, 0.949, and 0.594. On the date fruits dataset, the transformation was effective for MLP, raising accuracy from 0.193 (original attributes) to 0.690 (transformed attributes). Overall, the transformation is effective for LR, MLP, and NB on datasets with mixed continuous and categorical attributes.
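The abstract does not specify the four percentile-generation algorithms, so the following Python sketch assumes the simplest variant: draw a few percentile levels uniformly at random, convert them into numeric thresholds on the training data, and replace each continuous value with the index of the interval it falls into. The function names (`percentile`, `random_percentiles`, `fit_cut_points`, `transform`), the [5, 95] sampling range, and the number of cut points are illustrative assumptions, not the authors' implementation.

```python
import random
from bisect import bisect_right


def percentile(sorted_values, q):
    """Percentile q (0-100) of an already sorted list, using linear interpolation."""
    if not sorted_values:
        raise ValueError("percentile of an empty list is undefined")
    pos = (len(sorted_values) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * frac


def random_percentiles(n_cuts, rng, low=5.0, high=95.0):
    """Draw n_cuts percentile levels uniformly from [low, high] (assumed range) and sort them."""
    return sorted(rng.uniform(low, high) for _ in range(n_cuts))


def fit_cut_points(train_values, percentile_levels):
    """Turn percentile levels into numeric thresholds computed on training data only."""
    s = sorted(train_values)
    cuts = {percentile(s, q) for q in percentile_levels}
    # Tied values can map two levels to the same threshold; keep each threshold once.
    return sorted(cuts)


def transform(values, cut_points):
    """Replace each value by the index of its interval: 0 .. len(cut_points)."""
    return [bisect_right(cut_points, v) for v in values]


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the random boundaries are reproducible
    x_train = [12.0, 15.5, 9.1, 22.3, 18.7, 30.2, 11.4, 25.0]  # toy data, not from the paper
    x_test = [10.0, 20.0, 28.0]
    levels = random_percentiles(3, rng)
    cuts = fit_cut_points(x_train, levels)
    print("percentile levels:", [round(q, 1) for q in levels])
    print("cut points:", [round(c, 2) for c in cuts])
    print("train categories:", transform(x_train, cuts))
    print("test categories:", transform(x_test, cuts))
```

Fitting the thresholds on the training data only and reusing them on the test data keeps test-set information out of the category boundaries. In a full experiment, the resulting categories would be passed to RF, LR, MLP, or NB, and several random draws would be compared before one set of percentiles is kept.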

Keywords


Random Forest; continuous attribute transform; random percentile generation; accuracy; revenue expectation

DOI: http://dx.doi.org/10.18517/ijaseit.13.3.18379

Published by INSIGHT - Indonesian Society for Knowledge and Human Development