Tree-Based Ensemble Methods and Their Applications for Predicting Students' Academic Performance

Indri Dayanah Ayulani; Agatha Melinda Yunawan; Tamara Prihutaminingsih; Devvi Sarwinda; Gianinna Ardaneswari; Bevina Desjwiandra Handari

doi:10.18517/ijaseit.13.3.16880

Abstract

Students' academic performance is a key aspect of online learning success. Online learning applications known as Learning Management Systems (LMS) store records of various online learning activities. In this research, students' academic performance in online course X is predicted so that teachers can identify at-risk students much sooner. The prediction uses three tree-based ensemble methods: Random Forest, XGBoost (Extreme Gradient Boosting), and LightGBM (Light Gradient Boosting Machine). Random Forest is a bagging method, whereas XGBoost and LightGBM are boosting methods. The data are collected from the records in LMS UI, or EMAS (e-Learning Management Systems), and consist of activity data for 232 students (219 passed, 13 failed) in course X. The data are divided according to three split proportions (80:20, 70:30, and 60:40) and three periods (the first month, the first two months, and the first three months of the study period). During pre-processing, the SMOTE method is applied to handle the class imbalance, and the models are implemented in all categories, with and without feature selection. The prediction results are compared to determine the best time for predicting students' academic performance and how well each model can predict the number of unsuccessful students. The implementation results show that students' academic performance can be predicted at the end of the second month, with best prediction rates of 86.8%, 80%, and 75% for the LightGBM, Random Forest, and XGBoost models, respectively, with feature selection. With a prediction this early, students who are at risk of failing still have time to improve their academic performance.
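To make the imbalance-handling step concrete, the following is a minimal sketch of SMOTE-style oversampling combined with an 80:20 split. It is not the authors' implementation: the two numeric features are synthetic toy values, the helper names `smote` and `stratified_split` are hypothetical, and both the stratified split and the choice to oversample only the training portion are assumptions that the abstract does not state. Only the class counts (219 passed, 13 failed) come from the abstract.

```python
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


def stratified_split(labels, train_frac, seed=0):
    """Return (train_idx, test_idx), splitting each class by train_frac."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        cut = round(train_frac * len(idx))
        train_idx += idx[:cut]
        test_idx += idx[cut:]
    return train_idx, test_idx


# Toy stand-in for LMS activity features (assumption: two numeric features).
data_rng = random.Random(42)
X = [[data_rng.gauss(5, 1), data_rng.gauss(5, 1)] for _ in range(219)]   # passed
X += [[data_rng.gauss(2, 1), data_rng.gauss(2, 1)] for _ in range(13)]   # failed
y = [1] * 219 + [0] * 13

# 80:20 split, one of the three proportions named in the abstract.
train_idx, test_idx = stratified_split(y, train_frac=0.8)
X_train = [X[i] for i in train_idx]
y_train = [y[i] for i in train_idx]

# Oversample the failed class until both classes are the same size.
failed = [x for x, label in zip(X_train, y_train) if label == 0]
n_passed = y_train.count(1)
new_failed = smote(failed, n_new=n_passed - len(failed))

X_bal = X_train + new_failed
y_bal = y_train + [0] * len(new_failed)
print("train size:", len(X_train), "test size:", len(test_idx))
print("balanced class counts:", y_bal.count(1), y_bal.count(0))
```

Oversampling only the training portion keeps the held-out students untouched, so the evaluation still reflects the original 219:13 imbalance. The balanced set could then be passed to any of the three classifiers named in the abstract.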

Keywords

Students' academic performance; online learning; learning management systems; tree-based ensemble methods; machine learning; Random Forest; XGBoost; LightGBM; feature selection; learning analytics.

Published by INSIGHT - Indonesian Society for Knowledge and Human Development

International Journal on Advanced Science, Engineering and Information Technology
