Developing a data mining classification model to predict the academic performance of students in Public Basic Schools in Ghana using socio-economic variables: a case study of selected public basic schools in the Ablekuma West Constituency

The use of education-related data is often beneficial in data mining applications, and it has proven useful both to decision-making processes and to the promotion of social goals. Most developing nations are concentrating on ways to use Information Systems as platforms to champion their national development agenda in all areas of their economy, including education. Despite the high percentage of trained teachers in public basic schools, results from the West African Examinations Council (WAEC) indicate that public basic schools fare worse in the Basic Education Certificate Examination (BECE) than their private counterparts. This thesis uses socio-economic variables to develop a data mining classification model that can identify students from poor socio-economic backgrounds, so that their performance can be improved before they write the BECE. The population for this study comprised 800 junior high school students, from which a convenience sample of 200 students was drawn. CRISP-DM (Cross-Industry Standard Process for Data Mining) is used as the framework guiding the project because it is non-proprietary and neutral with respect to industry and tools. Three popular algorithms are discussed: Naïve Bayes, ID3 and C4.5. C4.5 is chosen as the preferred algorithm because of its accuracy on unseen data. It is used to analyze the training set and build a classifier, which is then used to classify both the training and the test examples. Following standard machine learning practice, the classifier is learned from the training data, and the accuracy of the resulting hypothesis in predicting the class of unseen examples is measured on the test data. This evaluation is supplemented by a Receiver Operating Characteristic (ROC) graph to aid visualization.
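The abstract names C4.5 as the chosen algorithm but does not show how it grows a tree. Its main difference from ID3 is the split criterion: ID3 ranks attributes by information gain, whereas C4.5 divides that gain by the split information to obtain the gain ratio, which penalises attributes with many distinct values. The Python sketch below computes only that criterion for categorical attributes; it omits the rest of C4.5 (its restriction to attributes with at least average gain, thresholds for continuous attributes, missing-value handling and pruning). The function names, attribute names and records are hypothetical and are not the thesis's variables or data.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(rows, attribute, target):
    """Gain ratio (the C4.5 split criterion) of one categorical attribute.

    rows: list of dicts mapping attribute names to categorical values.
    """
    n = len(rows)
    # Group the class labels by the value the attribute takes.
    partitions = {}
    for r in rows:
        partitions.setdefault(r[attribute], []).append(r[target])
    # Information gain (ID3's criterion): parent entropy minus weighted child entropy.
    remainder = sum(len(p) / n * entropy(p) for p in partitions.values())
    info_gain = entropy([r[target] for r in rows]) - remainder
    # Split information grows with the number of distinct values, so dividing by it
    # penalises attributes that fragment the data into many small groups.
    split_info = -sum(len(p) / n * math.log2(len(p) / n) for p in partitions.values())
    return info_gain / split_info if split_info > 0 else 0.0


def best_split(rows, attributes, target):
    """Attribute with the highest gain ratio (the root split in this sketch)."""
    return max(attributes, key=lambda a: gain_ratio(rows, a, target))


if __name__ == "__main__":
    # Hypothetical socio-economic records, not data from the thesis.
    rows = [
        {"household_income": "low", "parent_education": "none", "performance": "fail"},
        {"household_income": "low", "parent_education": "basic", "performance": "fail"},
        {"household_income": "low", "parent_education": "tertiary", "performance": "pass"},
        {"household_income": "high", "parent_education": "basic", "performance": "pass"},
        {"household_income": "high", "parent_education": "tertiary", "performance": "pass"},
        {"household_income": "high", "parent_education": "none", "performance": "fail"},
    ]
    print(best_split(rows, ["household_income", "parent_education"], "performance"))
    # parent_education
```

In this toy data, parent_education separates passes from fails almost completely (gain ratio about 0.42), while household_income barely does (about 0.08), so a C4.5-style tree would split on it first and then recurse on each branch.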
The ROC graph plots sensitivity against 1 − specificity (the false positive rate) and is used to judge the model's optimality by determining the best threshold for the classifier. Sensitivity, specificity and accuracy are used to measure the correctness of the model; they are calculated from the counts of true and false positives and negatives, where false positives and false negatives correspond to Type I and Type II errors respectively. The model achieved an accuracy of 74%, a recall (sensitivity) of 73%, a specificity of 75% and a precision of 80%. This study has demonstrated the practicality and feasibility of classifying student academic performance based on the selected socio-economic variables.
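The evaluation measures above follow directly from the confusion matrix: accuracy = (TP + TN)/(TP + FP + TN + FN), sensitivity (recall) = TP/(TP + FN), specificity = TN/(TN + FP) and precision = TP/(TP + FP). The Python sketch below computes them and sweeps the decision threshold over a classifier's scores to produce ROC points (false positive rate against true positive rate). The abstract does not say how the best threshold was chosen; this sketch assumes Youden's J statistic (sensitivity + specificity - 1), which is one common criterion. The helper names, scores and labels are hypothetical, and which performance category counts as "positive" is likewise an assumption.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Counts of true/false positives and negatives at one decision threshold."""
    tp: int
    fp: int  # false positives (Type I errors)
    tn: int
    fn: int  # false negatives (Type II errors)


def metrics(c: Confusion) -> dict:
    """Accuracy, sensitivity (recall), specificity and precision.

    Assumes both classes occur and at least one example is predicted
    positive, so no denominator is zero.
    """
    return {
        "accuracy": (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn),
        "sensitivity": c.tp / (c.tp + c.fn),  # true positive rate, a.k.a. recall
        "specificity": c.tn / (c.tn + c.fp),  # true negative rate
        "precision": c.tp / (c.tp + c.fp),
    }


def confusion_at(scores, labels, threshold) -> Confusion:
    """Predict positive (1) when score >= threshold and tally the outcomes."""
    tp = fp = tn = fn = 0
    for s, y in zip(scores, labels):
        predicted = 1 if s >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1:
            fp += 1
        elif y == 0:
            tn += 1
        else:
            fn += 1
    return Confusion(tp, fp, tn, fn)


def roc_points(scores, labels):
    """(threshold, FPR, TPR) for every distinct score, highest threshold first.

    FPR = 1 - specificity and TPR = sensitivity are the two ROC axes.
    """
    points = []
    for t in sorted(set(scores), reverse=True):
        m = metrics(confusion_at(scores, labels, t))
        points.append((t, 1 - m["specificity"], m["sensitivity"]))
    return points


def best_threshold(scores, labels):
    """Threshold maximising Youden's J = TPR - FPR = sensitivity + specificity - 1."""
    return max(roc_points(scores, labels), key=lambda p: p[2] - p[1])[0]


if __name__ == "__main__":
    # Hypothetical classifier scores and true labels (1 = positive class), not thesis data.
    scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
    labels = [1, 1, 0, 1, 1, 0, 0, 0]
    t = best_threshold(scores, labels)
    print(t)  # 0.55
    print(metrics(confusion_at(scores, labels, t)))
    # {'accuracy': 0.875, 'sensitivity': 1.0, 'specificity': 0.75, 'precision': 0.8}
```

A useful consistency check: accuracy equals (P × sensitivity + N × specificity)/(P + N), where P and N are the numbers of actual positives and negatives, so it always lies between sensitivity and specificity. The reported accuracy of 74% sits between the reported recall (73%) and specificity (75%), as it must.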
A thesis submitted to the Department of Computer Science, Kwame Nkrumah University of Science and Technology in partial fulfillment of the requirements for the degree of Master of Science, 2014