Developing a data mining classification model to predict the academic performance of students in Public Basic Schools in Ghana using socio-economic variables. A case study of selected public basic schools in the Ablekuma West Constituency.
Date
2014-07-13
Abstract
The use of education-related data is often beneficial in data mining applications, and it has proven
useful both to decision-making processes and to the promotion of social goals. Most developing
nations are concentrating on ways to use Information Systems as platforms to champion their national
development agenda in all areas of their economy, including education. Despite the high percentage
of trained teachers in the public basic schools, results from the West African Examinations Council
(WAEC) indicate that public basic schools perform worse in the Basic Education Certificate
Examination (BECE) than their private counterparts. This thesis focuses on using socio-economic
variables to develop a data mining classification model that can be used to identify students
from poor socio-economic backgrounds and help improve their performance before they write the
BECE. The population for this study comprised 800 junior high school students, from which a
convenience sample of 200 students was used. The CRISP-DM (Cross-Industry Standard Process for
Data Mining) is used as the framework for guiding the project because of its non-proprietary and
neutral background. Three popular algorithms are discussed: Naïve Bayes, ID3 and C4.5. C4.5 is
chosen as the preferred algorithm because of its level of accuracy on unseen data. The C4.5
algorithm is used to analyze the training set and build a classifier, which is then used to classify
both the training and test examples. Following standard machine learning practice, the training data
are used to build the hypothesis, and its accuracy in predicting the categorization of unseen
examples is tested on the test data. This testing process is supported by a ROC graph, which
presents the relationship between sensitivity and specificity and is used to assess the model's
optimality by determining the best threshold for the classifier. Sensitivity, specificity and
accuracy are used to measure the correctness of the model; they are calculated from the true and
false positives and negatives (false positives and false negatives being Type I and Type II errors,
respectively). The model achieved an accuracy of 74%, a recall (R) of 73%, a specificity of 75% and
a precision of 80%. This study has demonstrated the practicality and feasibility of classifying
student academic performance based on the selected socio-economic variables.
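The evaluation measures named in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's code: the function names, the example counts and scores are hypothetical, and the use of Youden's J statistic (sensitivity + specificity - 1) to pick the "best threshold" is an assumption, since the abstract does not say which criterion was applied.

```python
from typing import Dict, List, Tuple


def confusion_counts(y_true: List[int], y_pred: List[int]) -> Tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN) for binary labels, where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # Type I errors
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # Type II errors
    return tp, fp, tn, fn


def metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Accuracy, sensitivity (recall), specificity and precision from the four counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
    }


def best_threshold(y_true: List[int], scores: List[float]) -> Tuple[float, float]:
    """Scan candidate thresholds (one ROC point each) and return the one that
    maximises Youden's J. The choice of J is an assumption, not the thesis's method."""
    best_thr, best_j = 0.0, float("-inf")
    for thr in sorted(set(scores)):
        y_pred = [1 if s >= thr else 0 for s in scores]
        tp, fp, tn, fn = confusion_counts(y_true, y_pred)
        sens = tp / (tp + fn) if (tp + fn) else 0.0
        spec = tn / (tn + fp) if (tn + fp) else 0.0
        j = sens + spec - 1
        if j > best_j:
            best_thr, best_j = thr, j
    return best_thr, best_j


# Hypothetical counts, chosen only to exercise the formulas.
example = metrics(tp=8, fp=2, tn=6, fn=4)

# Hypothetical classifier scores for six students (1 = positive class).
thr, j = best_threshold([1, 1, 1, 0, 0, 0], [0.9, 0.8, 0.7, 0.6, 0.3, 0.2])
```

Each threshold corresponds to one point on the ROC graph; moving it trades sensitivity against specificity, which is why a single operating point has to be chosen before figures such as those reported above can be stated.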
Description
A thesis submitted to the Department of Computer Science,
Kwame Nkrumah University of Science and Technology
in partial fulfillment of the requirements for the degree of
Master of Science, 2014