Tree-Based Ensemble Methods and Their Applications for Predicting Students’ Academic Performance

Indri Dayanah Ayulani, Agatha Melinda Yunawan, Tamara Prihutaminingsih, Devvi Sarwinda, Gianinna Ardaneswari, Bevina Desjwiandra Handari

Abstract


Students’ academic performance is a key aspect of online learning success. Online learning applications known as Learning Management Systems (LMS) store records of various online learning activities. In this research, students’ academic performance in online course X is predicted so that teachers can identify at-risk students much sooner. The prediction uses three tree-based ensemble methods: Random Forest, XGBoost (Extreme Gradient Boosting), and LightGBM (Light Gradient Boosting Machine). Random Forest is a bagging method, whereas XGBoost and LightGBM are boosting methods. The data are collected from the records in LMS UI, or EMAS (e-Learning Management Systems), and consist of activity data for 232 students (219 passed, 13 failed) in course X. The data are divided according to three proportions (80:20, 70:30, and 60:40) and three periods (the first month, the first two months, and the first three months of the study period). The SMOTE method is applied during pre-processing to handle the class imbalance, and the models are implemented in all categories, both with and without feature selection. The prediction results are compared to determine the best time for predicting students’ academic performance and how well each model predicts the number of unsuccessful students. The results show that students’ academic performance can be predicted at the end of the second month, with best prediction rates of 86.8%, 80%, and 75% for the LightGBM, Random Forest, and XGBoost models, respectively, with feature selection. With a prediction at this point, students who are at risk of failing still have time to improve their academic performance.
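The class imbalance mentioned in the abstract (219 passed against 13 failed) is what SMOTE is meant to address: it creates synthetic minority samples by interpolating between a minority sample and one of its nearest minority neighbours. The sketch below is a simplified, standard-library-only illustration of that idea, not the authors' implementation. The function name `smote_sketch`, the two activity features, and their values are hypothetical, and the base sample is drawn at random rather than visited in turn as in the original algorithm.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by interpolating between neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k minority samples closest to the chosen base point (base excluded).
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        # A random point on the segment between the base and that neighbour.
        gap = rng.random()
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Hypothetical activity features (two made-up columns) for the 13 failed students.
data_rng = random.Random(42)
failed = [(data_rng.uniform(0, 10), data_rng.uniform(0, 5)) for _ in range(13)]

# Oversample the failed class until it matches the 219 students who passed.
n_passed = 219
synthetic = smote_sketch(failed, n_new=n_passed - len(failed))
print(len(failed) + len(synthetic))  # minority class size after oversampling
```

In practice, oversampling of this kind is commonly applied to the training portion only, so that the held-out portion keeps the original class distribution; whether the study did so is not stated in the abstract.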

Keywords


Students’ academic performance; online learning; learning management systems; tree-based ensemble methods; machine learning; Random Forest; XGBoost; LightGBM; feature selection; learning analytics.






DOI: http://dx.doi.org/10.18517/ijaseit.13.3.16880




Published by INSIGHT - Indonesian Society for Knowledge and Human Development