Loading…

Machine learning model (RG-DMML) and ensemble algorithm for prediction of students’ retention and graduation in education

Automated prediction of students' retention and graduation in education using advanced analytical methods such as artificial intelligence (AI), has recently attracted the attention of educators, both in theory and in practice. Whereas invaluable insights and theories for measuring and testing t...

Full description

Saved in:
Bibliographic Details
Published in:Computers and education. Artificial intelligence 2024-06, Vol.6, p.100205, Article 100205
Main Authors: Okoye, Kingsley, Nganji, Julius T., Escamilla, Jose, Hosseini, Samira
Format: Article
Language:English
Subjects:
Citations: Items that this one cites
Items that cite this one
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Automated prediction of students' retention and graduation in education using advanced analytical methods such as artificial intelligence (AI), has recently attracted the attention of educators, both in theory and in practice. Whereas invaluable insights and theories for measuring and testing the topic have been proposed, most of the existing methods do not technically highlight the non-trivial factors behind the renowned challenges and attrition. To this effect, by making use of two categories of data collected in a higher education setting about students (i) retention (n = 52262) and (ii) graduation (n = 53639); this study proposes a machine learning model - RG-DMML (retention and graduation data mining and machine learning) and ensemble algorithm for prediction of students' retention and graduation status in education. This was done by training and testing key features that are technically deemed suitable for measuring the constructs (retention and graduation), such as (i) the Average grade of the previous high school, and (ii) the Entry/admission score. The proposed model (RG-DMML) is designed based on the cross industry standard process for data mining (CRISP-DM) methodology, implemented using supervised machine learning technique such as K-Nearest Neighbor (KNN), and validated using the k-fold cross-validation method. The results show that the executed model and algorithm based on the Bagging method and 10-fold cross-validation are efficient and effective for predicting the student's retention and graduation status, with Precision (retention = 0.909, graduation = 0.822), Recall (retention = 1.000, graduation = 0.957), Accuracy (retention = 0.909, graduation = 0.817), F1-Score (retention = 0.952, graduation = 0.885) showing significant high accuracy levels or performance rate, and low Error-rate (retention = 0.090, graduation = 0.182), respectively. In addition, by considering the individual features selected through the Wrapper method in predicting the outputs, the proposed model proved more effective for predicting the students' retention status in comparison to the graduation data. The implications of the models' output and factors that impact the effective prediction or identification of at-risk students, e.g., for timely intervention, counselling, decision-making, and sustainable educational practice are empirically discussed in the study. •A machine learning model (RG-DMML) was proposed to predict students' retention and graduation status in edu
ISSN:2666-920X
2666-920X
DOI:10.1016/j.caeai.2024.100205