Predicting Students’ Academic Performance in the University Using Meta Decision Tree Classifiers

Student performance prediction is an area of concern for educational institutions. At the University level learning system, the method or rule adopted to identify the candidates who pass or fail differs depending on various factors such as the course, the department of study and so on. Predicting the result of a student in a course is an issue that has recently been addressed using machine learning techniques. The focus of this work is to find a way to predict a student’s academic performance in the University using the machine learning approach. This is done by using the previous records of the student rather than applying course dependent formulae to predict the student’s final grade. In this work, meta decision tree classifier techniques based on four representative learning algorithms, namely Adaboost, Bagging, Dagging and Grading are used to construct different decision trees. REPTree is used as the decision tree method for meta learning. These four meta learning methods have been compared separately with respect to the training and test sets. Adaboost is found to be the best meta decision classifier for predicting the student’s result based on the marks obtained in the semester.


Introduction
Universities operate in highly dynamic and competitive environments. A massive volume of data about students is available in digital form. However, converting this voluminous data effectively into knowledge for decision making is a major problem. Predicting a student's performance is one such challenging issue faced by the educational sector (Asogbon et al., 2016). In recent years, many research works have focused on data mining techniques in higher education institutions to enhance the method of learning. The existing work in predicting students' performance includes analysing students' enrolment data to prevent dropouts, to predict student detention at an early stage and to analyse the quality and usage of learning materials. Developing an automated system for this will help educators to monitor their students' achievements (Buldu and Ucgun, 2010; Delen, 2010; Marquez-Vera et al., 2016) and the students to enhance their learning skills. The automated system will also help the administrative staff to improve the institutions' performance. Thus, the application of data mining techniques can be focused on particular applications of an automated system (Amrieh et al., 2016; Chen and Bai, 2010; Hien and Haddawy, 2007; Nespereira et al., 2015).
An ensemble of different machine learning algorithms is an effective method for achieving a high level of predictive accuracy. However, such improvements depend on the diversity of the ensemble members: if every member of the ensemble behaves in nearly the same way, little is gained by combining their predictions. Decision trees are well suited to ensemble methods as they are fast and stable (Buldu and Ucgun, 2010). The main purpose of this work is to compare the performance of various meta decision tree algorithms in predicting the performance of students on both training sets and test sets. The remainder of this paper is structured as follows. Section 2 presents the background. Section 3 describes the data models used. Section 4 presents the methods adopted and Section 5 discusses the results, followed by Section 6 with the conclusions and future research possibilities.

Background
Predicting a student's performance has been studied previously in educational data mining research in the context of student attrition. Wolff et al. (2014) explored the effectiveness of predictive modelling methods for identifying students who will benefit most from tutor interventions in distance learning, where the students and the tutor do not meet face to face. The methods analysed for distance learning included decision-tree classification, support vector machine, general unary hypotheses automaton, Bayesian networks and linear and logistic regression. Romero et al. (2013) investigated how the accuracy of prediction was affected by factors like selection of instances and attributes, the usage of classification algorithms and the date when the data was gathered. A new Moodle module was developed for gathering forum indicators. Using this module, different experiments were carried out using real data from 114 university students in a first-year course in computer science. The results achieved proved its effectiveness both in terms of final prediction at the end of the course and early prediction before the end of the course.
Marquez-Vera et al. (2016) proposed a technique and a classification algorithm to construct a prediction model for student dropout as early as possible. The data set used for their research was obtained from 419 high school students in Mexico. Several experiments were carried out to predict dropout at different stages of the course. Several well-known classification algorithms, both classical and designed for imbalanced data, were compared with their proposed algorithm to identify the best predictor of dropout. The results obtained in their research work showed that the algorithm devised by them was effective enough to predict student dropout within the first 4-6 weeks of the course. The algorithm can be used as an early warning system. Ramesh et al. (2013) adopted an experimental methodology to generate a database constructed from primary and secondary sources. The results obtained by this work reveal that parents' occupations play a major role in predicting the students' grade, whereas the type of school did not influence the students' results. Such findings can help institutions to identify the weak students at risk and concentrate on providing additional training to them. Zhang et al. (2015) aimed to improve the academic level at the undergraduate stage and achieve better graduation thesis grades. They researched the hidden relations between courses and graduation thesis grades and employed the support vector machine to construct a model for predicting the graduation thesis grades of undergraduates. Some other prediction models (Neural Network, Decision Tree and Naïve Bayes) were also built for the same task, but the results showed that the Support Vector Machine (SVM) performed better in this case study. Strecht et al. (2015) addressed the problem of predicting the success or failure of a student in a course or a program using data mining techniques.
They evaluated some of the most popular classification and regression algorithms on this problem. They addressed two problems in particular: prediction of approval/failure and prediction of grade. The algorithms with the best overall results in classification were decision trees and SVM, while in regression they were SVM, Random Forest and AdaBoost (Illanas Vila et al., 2013). Arsad et al. (2013) used an Artificial Neural Network (ANN) model to predict the academic performance of engineering students pursuing a bachelor's degree. The study takes the Grade Point (GP) scored by the students in fundamental subjects as inputs without considering their demographic background, while it takes the Cumulative Grade Point Average (CGPA) as output. Schalk et al. (2011) built a machine-learning-based predictive system to determine which students were at risk of failing introductory courses in mathematics and physics. The system used the Random Forest technique to model data coming from previous years of SAT. While their results were good, the method designed was neither meant to be maintained over time nor to make progressive predictions based on incremental information.
In summary, various studies have applied data mining techniques to educational problems. However, very few studies have shed light on students' behaviour during the learning process and its impact on the students' academic success. This proposed research will focus on the impact of the academic system on the students' performance. The performance of the student's predictive model was evaluated by a set of classifiers, namely ANN, Naïve Bayesian and Decision tree algorithms. In addition, we applied ensemble methods to improve the performance of these classifiers. The extracted knowledge will help schools to enhance students' academic performance and help administrators to improve learning systems. This work also concentrates, in particular, on predicting whether the student will pass or fail at the end of the degree, in order to differentiate between the strong and weak students. The current research study differs from other works by limiting the variables used in predicting performance to marks only; no demographic or socioeconomic data were used. It takes one training set to build a model and another test set to evaluate it, thus allowing for some measurement of how well the findings can be generalized.

Methodology
In this study, we introduce a performance model for students using ensemble methods. An ensemble method is a learning approach that combines multiple models to solve a problem. In contrast to traditional learning approaches, which train a single model on the data, ensemble methods train a set of models and then combine them by taking a vote on their results. The predictions made by ensembles are usually more accurate than predictions made by a single model. Figure 1 shows the methodology design of this research work.
The following are the steps involved in the methodology design:
• Collect data and identify the features for the datasets
• For a dataset, develop the following prediction models using the respective training dataset: Bagging, Boosting, Dagging and Grading
• Predict the marks for all the models mentioned above
• Compare the prediction results with the actual results
REPTree is used as the base prediction method for all the ensemble methods employed. The REPTree algorithm builds a decision tree using information gain computed from entropy and then applies the "reduced error pruning" method, which decreases the complexity of the decision tree model and reduces the error arising from variance.
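The entropy-based information gain that REPTree uses to choose a split can be illustrated with a minimal sketch. The marks, the pass/fail labels and the function names below are hypothetical and are not taken from the dataset or from Weka; the sketch only shows how a candidate threshold on one subject's marks is scored.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(marks, labels, threshold):
    """Entropy reduction obtained by splitting on 'mark < threshold'."""
    left = [y for x, y in zip(marks, labels) if x < threshold]
    right = [y for x, y in zip(marks, labels) if x >= threshold]
    if not left or not right:
        return 0.0  # a split that leaves one side empty gains nothing
    weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
    return entropy(labels) - weighted

# Hypothetical marks in one subject and the corresponding results.
marks = [35, 42, 48, 55, 61, 70, 78, 85]
results = ["fail", "fail", "fail", "pass", "pass", "pass", "pass", "pass"]

gain_at_50 = information_gain(marks, results, 50)  # separates the classes perfectly
gain_at_65 = information_gain(marks, results, 65)  # leaves a mixed left branch
```

Under these assumptions the threshold of 50 yields the larger gain, so a tree learner of this kind would prefer it as the split point.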

Boosting
In Boosting, a random subset of training samples d1 is selected without replacement from the training set D to train a weak learner C1. A second random training subset d2 is then selected without replacement from the training set, and 50 percent of the samples that were previously misclassified are added to it to train a weak learner C2. The training samples d3 on which C1 and C2 disagree are then found in the training set D and used to train a third weak learner C3. Finally, all the weak learners are combined via majority voting (Petkovic et al., 2012).
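The three-learner procedure described above can be sketched as follows. This is an illustrative sketch only: the weak learner is a hypothetical one-threshold rule on a single mark, the data are synthetic, and the code follows the textual description rather than the AdaBoost variant named in the abstract, which reweights training samples in each round.

```python
import random

def train_stump(samples):
    """Fit the rule 'pass if mark >= t', choosing t to minimise training errors."""
    best_t, best_err = None, len(samples) + 1
    for t in sorted({x for x, _ in samples}):
        err = sum((x >= t) != y for x, y in samples)
        if err < best_err:
            best_t, best_err = t, err
    return lambda x: x >= best_t

def boost_three(data, subset_size, rng):
    """Train C1, C2 and C3 as described in the text and combine them by vote."""
    d1 = rng.sample(data, subset_size)            # random subset, no replacement
    c1 = train_stump(d1)
    missed = [(x, y) for x, y in data if c1(x) != y]
    # Second subset plus half of the samples that C1 misclassified.
    d2 = rng.sample(data, subset_size) + missed[: len(missed) // 2]
    c2 = train_stump(d2)
    # Samples on which C1 and C2 disagree; fall back to C1 if there are none.
    d3 = [(x, y) for x, y in data if c1(x) != c2(x)]
    c3 = train_stump(d3) if d3 else c1
    return lambda x: (c1(x) + c2(x) + c3(x)) >= 2  # majority of three votes

# Synthetic data: (mark, passed), with a pass mark of 50 assumed for illustration.
data = [(m, m >= 50) for m in range(20, 100, 4)]
model = boost_three(data, subset_size=8, rng=random.Random(0))
train_accuracy = sum(model(x) == y for x, y in data) / len(data)
```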

Bagging
Bagging stands for Bootstrap AGGregatING. The steps involved in bagging are as follows:
• Generate n different bootstrap training samples
• Train the algorithm on each bootstrapped sample separately
• Average the predictions at the end (for classification, take a majority vote)
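The bagging steps above can be sketched in a few lines. The base learner here is a hypothetical one-threshold rule standing in for REPTree, and the data are synthetic; only the bootstrap-then-vote structure is the point of the example.

```python
import random
from collections import Counter

def train_stump(samples):
    """Fit the rule 'pass if mark >= t', choosing t to minimise training errors."""
    best_t, best_err = None, len(samples) + 1
    for t in sorted({x for x, _ in samples}):
        err = sum((x >= t) != y for x, y in samples)
        if err < best_err:
            best_t, best_err = t, err
    return lambda x: x >= best_t

def bagging(data, n_models, rng):
    """Train one model per bootstrap sample and predict by majority vote."""
    models = []
    for _ in range(n_models):
        # Bootstrap sample: same size as the data, drawn with replacement.
        boot = [rng.choice(data) for _ in data]
        models.append(train_stump(boot))

    def predict(x):
        votes = Counter(m(x) for m in models)
        return votes.most_common(1)[0][0]

    return predict

# Synthetic data: (mark, passed), with a pass mark of 50 assumed for illustration.
data = [(m, m >= 50) for m in range(20, 100, 4)]
predict = bagging(data, n_models=11, rng=random.Random(0))
```

An odd number of models is used so that a two-class vote cannot end in a tie.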

Dagging
The Dagging meta classifier creates a number of disjoint, stratified folds out of the data and feeds each chunk of data to a copy of the supplied base classifier. Predictions are made by majority voting, since all the generated base classifiers are placed into the Vote meta classifier. Dagging is useful for base classifiers whose time behaviour is quadratic or worse in the number of training instances (Sorour et al., 2015).
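A minimal sketch of the idea, assuming a hypothetical one-threshold base learner and synthetic data: the training set is dealt into disjoint folds so that each fold keeps roughly the original class proportions, one model is trained per fold, and the models vote.

```python
from collections import defaultdict

def train_stump(samples):
    """Fit the rule 'pass if mark >= t', choosing t to minimise training errors."""
    best_t, best_err = None, len(samples) + 1
    for t in sorted({x for x, _ in samples}):
        err = sum((x >= t) != y for x, y in samples)
        if err < best_err:
            best_t, best_err = t, err
    return lambda x: x >= best_t

def stratified_disjoint_folds(data, n_folds):
    """Deal samples class by class into folds, round-robin, so folds never overlap."""
    by_class = defaultdict(list)
    for x, y in data:
        by_class[y].append((x, y))
    folds = [[] for _ in range(n_folds)]
    i = 0
    for items in by_class.values():
        for item in items:
            folds[i % n_folds].append(item)
            i += 1
    return folds

def dagging(data, n_folds):
    """Train one base model per disjoint fold and combine them by majority vote."""
    models = [train_stump(fold) for fold in stratified_disjoint_folds(data, n_folds)]
    return lambda x: sum(m(x) for m in models) * 2 > len(models)

# Synthetic data: (mark, passed), with a pass mark of 50 assumed for illustration.
data = [(m, m >= 50) for m in range(20, 100, 4)]
folds = stratified_disjoint_folds(data, 5)
predict = dagging(data, 5)
```

Because each model sees only one fold, training cost per model falls with the number of folds, which is why the approach suits base learners that scale poorly with the number of instances.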

Grading
The underlying idea of grading is to predict, for each of the original learning algorithms, whether its prediction for a particular example is correct or not. Therefore, one classifier is trained for each original learning algorithm on the training set of original examples. In this training set, the original examples carry class labels that encode whether the prediction of the learner was correct for that particular example (Yoo and Kim, 2014).
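The construction of the graded training set can be sketched as follows. The two base rules, the nearest-neighbour grader and the data are all hypothetical choices made for illustration. For brevity the graded labels are derived from predictions on the training data itself, whereas cross-validated predictions would normally be used to avoid optimistic grades.

```python
def make_graded_set(base, data):
    """Relabel each example with whether the base learner predicted it correctly."""
    return [(x, base(x) == y) for x, y in data]

def nearest_neighbour_grader(graded):
    """Grader that copies the 'correct / incorrect' label of the closest mark."""
    def grade(x):
        return min(graded, key=lambda item: abs(item[0] - x))[1]
    return grade

def grading_predict(x, bases, graders):
    """Vote among the base learners whose grader expects them to be correct."""
    trusted = [b(x) for b, g in zip(bases, graders) if g(x)]
    votes = trusted or [b(x) for b in bases]  # if nobody is trusted, use everyone
    return sum(votes) * 2 >= len(votes)       # ties are resolved as 'pass'

# Synthetic data: (mark, passed), with a pass mark of 50 assumed for illustration.
data = [(m, m >= 50) for m in range(20, 100, 4)]

# Two deliberately imperfect base learners (hypothetical rules).
too_strict = lambda x: x >= 60
too_lenient = lambda x: x >= 40
bases = [too_strict, too_lenient]

graded_sets = [make_graded_set(b, data) for b in bases]
graders = [nearest_neighbour_grader(g) for g in graded_sets]

# Marks on which the strict rule is graded as incorrect.
strict_errors = [x for x, ok in graded_sets[0] if not ok]
```

In this toy setting each base rule is wrong in a different range of marks, so the graders route each query to the rule that is expected to be right there.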

Results
In this study, we used the marks obtained by computer science and engineering students in two semesters. Each semester has five subjects. We collected the marks of 401 undergraduate students who were enrolled in the academic year 2014-15 and used them as the training set. The data contains variables related to the students' university examination marks in the various subjects that were taught in the first and second semesters. The description of the dataset used is shown in Table 1. For the test set, the marks obtained by students in the third semester were used. Table 2 shows the attribute description of the dataset used. Figure 2 shows the tree model generated from the training dataset using REPTree.
The performance of the Meta classifiers Bagging, Boosting, Dagging and Grading was examined using the Weka knowledge flow environment. Figure 3 shows the sample tree generated during an iteration for the Bagging REPTree classifier. The attribute S3 is identified as the root node. Figure 5 depicts the model tree obtained for Dagging REPTree. Figure 6 shows the model generated by the Grading meta classifier.
As mentioned earlier, the Meta decision tree classifier using the REPTree method was developed using the Weka knowledge flow environment. The default parameter settings of the tool were used for all the algorithms. A sample knowledge flow layout for the REPTree classifier is shown in Fig. 7.
The performance of a classifier depends on the characteristics of the data to be classified. The chosen algorithms were evaluated in terms of accuracy, precision, recall and F-score. The performance measures obtained for the individual REPTree classifier on the training set and the test set are shown in Table 3.
The performance of the meta decision tree classifiers is measured in terms of precision, recall and F-score. The results of the F-score for the meta decision tree classifiers employed are shown in Table 6. The F-score is a weighted combination of precision and recall. Thus, the F-scores of the meta decision tree classifiers follow a pattern similar to that observed for precision and recall. The F-score values also show that Bagging and Boosting of REPTree classifiers perform better than Dagging and Grading of REPTree classifiers for both the training set and the test set.
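For reference, precision, recall and the F-score can be computed from actual and predicted labels as in the sketch below. The labels are made up for illustration and are not results from this study. The balanced F-score (the harmonic mean of precision and recall) is assumed, and the values are computed for a single positive class, whereas a tool may report an average over classes.

```python
def precision_recall_f1(actual, predicted, positive="pass"):
    """Precision, recall and balanced F-score for one positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)  # true positives
    fp = sum(a != positive and p == positive for a, p in pairs)  # false positives
    fn = sum(a == positive and p != positive for a, p in pairs)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical actual and predicted results for eight students.
actual = ["pass", "pass", "pass", "pass", "fail", "fail", "fail", "pass"]
predicted = ["pass", "pass", "pass", "fail", "pass", "fail", "fail", "pass"]

precision, recall, f1 = precision_recall_f1(actual, predicted)
```

Here four passes are predicted correctly, one pass is missed and one fail is predicted as a pass, so precision and recall are both 4/5 and the F-score is also 0.8.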
The accuracy of the Meta decision tree classifiers is calculated and compared using the Weka experimenter. The results show that Bagging achieves higher accuracy than the other meta decision tree classifiers (Fig. 8). The Boosting REPTree classifier ranks next. Similar results are noted for the test set.
The results obtained in this research work are compared with existing work and the results are tabulated in Table 7. The results show that the ensemble method proves to be better than other individual methods employed earlier in predicting students' results in educational institutions.

Conclusion
Predicting students' performance is an effective way to aid educators and learners in improving their teaching and learning processes. Better inferences could be drawn with the classification approach, resulting in better prediction of whether a student will pass or fail in a course. Further analysis is necessary to better understand and improve these results. In addition to the problems studied in this work, it would be interesting to predict an interval for a grade. This method will help educational institutions to monitor the performance of students in an effective and systematic way. Lastly, this model can help educators understand learners, identify weak learners, improve learning processes and bring down academic failure rates. It can also help administrators to improve the learning system outcomes.
In our future work, we will focus more on analysing the effect of behavioural features on the students' performance model. This may yield a more realistic predictive model. Some optimization could also be achieved using a parameter selection method such as feature selection. In conclusion, the meta-analysis on predicting a student's performance has inspired us to conduct further research that can be applied in various educational institutions.