A Descriptive Analysis of Students Learning Skills Using Bloom’s Revised Taxonomy

Corresponding Author: Umamakeswari Arumugam, Department of Computer Science Engineering, School of Computing, SASTRA Deemed to be University, Thirumalaisamudram, Thanjavur, Tamil Nadu, India, 613 401
Email: umamakeswari.arumugam@gmail.com

Abstract: Academic committees worldwide advise technical institutions to follow the Revised Bloom’s Taxonomy (RBT), a framework that helps to develop learning objectives. The model classifies educational objectives into a hierarchy across the cognitive, sensory (psychomotor) and affective domains, which helps students not only to develop their thinking abilities but also to identify the skills they lack. Analysis of students' RBT skills through data mining techniques is valuable and is yet to be explored. This paper employs predictive and descriptive data mining techniques to analyze the RBT level of each student. The methodology uses a classifier to assign questions to one of six RBT levels (remembering, understanding, applying, analyzing, evaluating and creating) and then clusters students with respect to the overall RBT level and the lacking RBT skill of each student. The experimentation is carried out with university students. The results show that the proposed classifier achieves 98% accuracy in classifying the RBT levels of input questions. The results also show that the proposed work creates precise and meaningful clusters for the overall RBT level and the lacking RBT skill of each student, with precision values of 0.83 and 0.79 respectively, which could help the instructor to design different pedagogical approaches to improve students' learning.


Introduction
The objective of the Revised Bloom's Taxonomy is to create achievable learning goals and a corresponding plan to meet them (Anderson et al., 2005). Instructors state the learning goals in behavioral terms. Well-defined goals enable educators to prepare lesson plans effectively, with appropriate content and sources according to the categories in the cognitive domain (Borg, 2003). Instructors can also design valid strategies and assessment tools to check whether the categories are met in line with the level of the objectives. However, these tools do not contain any procedure for bridging the gap between students' actual performance and the cognitive domain, and it is difficult for instructors to analyze the performance of each student in large groups. Moreover, there is no software tool available for describing the post facto effect with Bloom's learning domains. Bloom's Taxonomy verbs can be extended to determine the level of a student's interest, attitude and expertise in a subject (Adams, 2015).
Questions based on Bloom's Taxonomy verbs help instructors to analyze the level of each student under the six categories of the cognitive domain: remembering, understanding, applying, analyzing, evaluating and creating. These levels help to bridge the gap between education and employability. Mapping the RBT level helps instructors to understand whether a student simply memorizes the subject, interprets problems in their own words, applies the learned concepts in new environments, distinguishes between facts and inferences, makes judgments about ideas or materials, or builds new structures or patterns based on the knowledge acquired. Not all students are good at all categories of the cognitive domain. There should be some measure to identify the RBT level and the lacking skill of each student.
The scope of this work is to conduct predictive and descriptive analyses of students' learning skills based on Revised Bloom's Taxonomy levels with the following objectives:
• To build an SVM-based cognitive domain prediction model using RBT questions as training data
• To use the model for predicting the RBT levels of a given set of questions
• To synthesize students' marks based on each RBT level
• To cluster students with similar cognitive domains to analyze the lacking cognitive domain
• To analyze the post facto effect to improve students' end semester exams
The paper analyzes the cognitive domain of students in higher education with respect to Bloom's Taxonomy. The impact of the proposed work would be reflected in the students' performance. The expected outcome of the paper is a software tool that analyzes the cognitive domain of students studying in the same class. The software depicts the strength of the students within the six categories of remembering, understanding, applying, analyzing, evaluating and creating. The software also analyzes the lacking cognitive domains of the students within the same categories by considering the continuous internal assessment marks of each subject. For instance, if a student is able to create new concepts in one subject, but can only score marks in the remembering and understanding categories of questions in another subject, the system will identify the discrepancy in the student's performance between the two subjects.
The paper shows that clustering is helpful for forming homogeneous clusters of learners. The clusters allow the educator to design effective teaching strategies that cater to the needs of students, especially those lacking RBT domain skills. This study is also intended to address the difficulties of students at the create level, that is, the difficulties in building new solutions from the learned concepts, as this level kindles research in students. Moreover, the analysis of Bloom's Taxonomy based student performance is yet to be explored nationwide.
The remaining sections of the paper are organized as follows: Section 2 reviews the existing literature on data mining with the Revised Bloom's Taxonomy, Section 3 explains the proposed methodology, Section 4 discusses the experimentation and results and finally Section 5 concludes with the social impacts and research findings of the work.

Review of Literature
Jayakodi et al. (2016) have automated the categorization of exam questions into RBT learning levels. The authors have used natural language processing to tokenize the words used in the questions and have compared the extracted verbs with Anderson's Revised Bloom's Taxonomy using a WordNet similarity comparison and a cosine similarity module for generating a rule set. The authors have claimed that the proposed rule set achieves 70% accuracy in predicting the RBT level of questions. The authors have also suggested that evaluators redesign the question papers based on the output of the classifier.
Suhaimi et al. (2016) have analyzed the effects of classifying written exam questions into the six cognitive levels of Bloom's Taxonomy. The authors have tried to prove that classifying written exam questions into the correct cognitive level can generate a good set of exam questions. The authors have combined the K-means clustering algorithm with a probability-based classification approach to classify the class labels in Bloom's Taxonomy and have obtained 50% accuracy. The authors have stated that some keywords may contribute to more than one class label, resulting in a vague measurement that may lead to poor accuracy.
Sukajaya et al. (2015) have proposed a Bloom's Taxonomy based serious game technology as an approach for measuring the cognitive domain of learners. The authors have maximized the opportunities to analyze the potential of all students based on their profiles by collecting learners' data that are theoretically more representative for identifying a specific learner. The authors have conducted the analysis using Naïve Bayes and J48 with two types of procedure, namely raw data and data transformed into average values with consideration of the cognitive domain of Bloom's Taxonomy, and have obtained the highest prediction accuracy, 92.31%, with the Naïve Bayes algorithm. The authors have claimed that integrating Bloom's Taxonomy into serious game technology has potential for measuring learners' information in learning.
van Konsky et al. (2018) have explored the automatic identification of verbs and other parts of speech that affect the semantic meaning of Bloom's Taxonomy cognitive domains. The authors have classified Bloom's levels using a table lookup and a machine learning approach. The authors have analyzed parts of speech in a training corpus of 13,189 learning outcomes at an Australian university. The authors have stated that their proposed approach assists human learning and teaching designers in writing learning outcomes and also plays a vital role in automating Bloom's Taxonomy classification. The authors have claimed that their verb table lookup and machine learning approach has obtained 71% prediction accuracy in identifying the same cognitive levels.
Gottipati and Shankararaman (2018) have presented an automated method to discover the impact of curriculum design and its competency using Bloom's Taxonomy and the Dreyfus model. The authors have developed a curriculum analytics tool that generates a competency score for the entire curriculum with respect to cognitive and progression levels. The authors have tested the proposed method on 14 courses and the corresponding 578 competencies in an information system management curriculum program. They have claimed that their method performs an in-depth analysis of the curriculum by discovering the cognition and progression statistics and has obtained 74.69% accuracy in predicting the competency of the curriculum.
Deshmuk et al. (2018) have implemented a student performance evaluation model, based on a fuzzy inference system, for identifying the strengths and weaknesses of students in a network analysis course studied by third semester Electronics and Communication Engineering students. The authors have fuzzified the five inputs (identify, understand, apply, analyze and design/create) for the system to learn Bloom's Taxonomy cognitive domains. The rules are then applied to the questions to evaluate the critical thinking of students. The evaluated results are expressed in linguistic variables and compared with classical aggregate scores, and the authors have claimed that the proposed model is flexible in assigning importance to each criterion by modifying the fuzzy rules.
Though the implications of data mining techniques in RBT analysis are enormous, the literature limits its contributions to predicting RBT levels. Descriptive data mining analysis can make a clear difference in understanding the level of students against RBT standards. Thus, this work tries not only to improve the prediction accuracy of RBT levels but also to describe the learning ability of students.

Methodology
The component diagram of the proposed methodology in Fig. 1 shows the major functionalities of the proposed system. The system builds a knowledge prediction model for classifying cognitive levels based on Bloom's Taxonomy. The model takes as input Bloom's Taxonomy based questions that cover all six RBT levels. The model parses each question by tokenizing individual words, removes the stop words and lemmatizes the remaining words. The model also uses stemming algorithms to reduce the most significant terms used in the question to their root words. The questions are finally converted into a dataset, where the significant terms of the question are the independent variables and the Bloom's Taxonomy level of the question is the dependent or class variable. The model uses a support vector machine (SVM), as it is a widely used classification algorithm that often achieves high prediction accuracy relative to other classifiers. The system applies the SVM classifier to predict the RBT level of the questions prepared by the staff for internal and external assessment tests. Finally, the system analyzes the performance of the students to map their cognitive level. To accomplish this, the marks obtained by the students for all subjects in the same RBT level are averaged and normalized on a logarithmic scale to bring them to a common form. The normalized marks are then given as input to the K-means algorithm for the descriptive analysis. The step by step process of the proposed work is shown in Algorithm 1.

Algorithm 1
…
2. predict RBTLevel using SVM Classifier
3. for each student Si, i = 1, 2, 3, …, s in the database do
4. …
…
11. set aggregatemarks[x] += Mp
12. set number of clusters k as 3
13. choose the best set of centroids to cluster low, medium and high levels in overall RBT
14. for each cluster Ci, i = 1, 2, 3, …, k do
15.     cluster students aggregate marks to the nearest centroids
16.     if there is no change in the cluster group or cluster centroids
17.         continue
18. interpret results
19. set number of clusters k as 6
20. choose the best set of centroids to cluster lacking skill of students in each level
21. repeat steps 13 to 17

The proposed work employs an SVM multiclass classifier with the One-against-All approach. The method creates k SVM classification models, where k represents the number of RBT levels (Mustaqeem et al., 2018; Wang et al., 2019; Negri, 2018).
The basic notion of the paper is to check whether clustering techniques in data mining are useful for a descriptive analysis of students' performance based on RBT levels. This analysis is also intended to identify the lacking skills of students with respect to remember, understand, apply, analyze, evaluate and create. The analysis is carried out by generating two clustering outputs using the K-means algorithm, where the former gives the overall analysis of students' performance versus RBT and the latter denotes the lacking RBT skills of the students. K-means is one of the most widely used algorithms for descriptive analysis (Dubey et al., 2018). In this study, the initial cluster centroids are supplied manually to the K-means algorithm to obtain the desired clusters of students. The distance between instances and cluster centroids is computed using the Euclidean distance measure and each instance is assigned to the cluster with the minimum distance (Yu et al., 2018; Shyr-Shen et al., 2018; Sardar and Ansari, 2018). The process is repeated until the cluster centroids no longer change or no new cluster formation can further reduce the root mean squared error. Equ. 4 denotes the formulation of the K-means algorithm (Singhal and Shukla, 2018), where:
||Ix - Cy|| = the distance between each instance and the cluster centroid
n = the number of instances in the x-th cluster
m = the number of cluster centres
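The clustering loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`kmeans`, `within_cluster_ss`), the six-dimensional score vectors and the initial centroid values are all hypothetical. It supplies the initial centroids manually, assigns each instance to the nearest centroid by Euclidean distance and stops when the centroids no longer change.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def kmeans(points, initial_centroids, max_iter=100):
    """K-means with manually supplied initial centroids (hypothetical helper)."""
    centroids = [list(c) for c in initial_centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each instance goes to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: euclidean(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid
        if new_centroids == centroids:  # no change in centroids: converged
            break
        centroids = new_centroids
    return labels, centroids


def within_cluster_ss(points, labels, centroids):
    """Sum of squared distances of instances to their assigned centroid."""
    return sum(euclidean(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))


# Hypothetical normalized marks per RBT level (remember ... create) for six students.
students = [
    [0.2, 0.3, 0.1, 0.2, 0.1, 0.2],
    [0.3, 0.2, 0.2, 0.1, 0.2, 0.1],
    [0.5, 0.6, 0.5, 0.4, 0.5, 0.5],
    [0.6, 0.5, 0.5, 0.6, 0.4, 0.5],
    [0.9, 0.8, 0.9, 0.9, 0.8, 0.9],
    [0.8, 0.9, 0.8, 0.9, 0.9, 0.8],
]
# Hypothetical initial centroids for low, medium and high overall RBT.
initial = [[0.2] * 6, [0.5] * 6, [0.9] * 6]

labels, centroids = kmeans(students, initial)
objective = within_cluster_ss(students, labels, centroids)
```

Supplying the centroids rather than drawing them at random makes the cluster indices interpretable in advance (index 0 is the low group in this sketch), which is the property the study relies on when it forces the centroids.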

Experimentation and Result Discussions
The experimentation of the proposed work is split into two major segments: predicting the RBT level of questions and performing a descriptive analysis of students. Predicting the RBT level of questions starts with the construction of a knowledge prediction model with a support vector machine classifier. The model takes as input questions obtained from the corpus BCLsDataSet (Lashari et al., 2012). The corpus contains six hundred questions, 100 for each Bloom's Taxonomy level, namely remember, understand, evaluate, analyze, apply and create. The model uses NLP methods available in R to perform bag of words, part of speech and n-gram operations for stop word removal and root word extraction on the questions. The selected terms, with the RBT level as class label, are then assembled into a dataset and passed to the SVM classifier. The radial basis function is used as the kernel function for the SVM classifier, as it is a popular function in kernelized algorithms. Table 1 shows the 6×6 confusion matrix of the SVM classifier, with the left axis showing the true class and the top showing the class assigned to an item with that true class. Table 2 gives the prediction accuracy and kappa statistic of the SVM classifier for the test set of BCLsDataSet. Accuracy denotes the percentage of correctly classified instances out of all instances. The SVM algorithm correctly predicts the RBT levels of questions with up to 98% accuracy. Another accuracy indicator is the kappa score, which compares the classification results with the agreement expected by chance. A kappa score close to 1 indicates that the classified results closely match the ground truth. The kappa score of the SVM classifier for the experimental dataset is 0.972, which is close to 1; the classifier can therefore be used as a standard classifier for predicting the RBT levels of questions. The performance of the SVM classifier is compared with the algorithms presented in the literature and shown in Fig. 2.
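As an illustration of how the two indicators relate to a confusion matrix, the sketch below computes accuracy and Cohen's kappa in plain Python. The 3×3 matrix is hypothetical and is not taken from Table 1; the function name `accuracy_and_kappa` is likewise an assumption made for this example.

```python
def accuracy_and_kappa(matrix):
    """Return (accuracy, Cohen's kappa) for a square confusion matrix.

    Rows are true classes and columns are predicted classes.
    """
    total = sum(sum(row) for row in matrix)
    # Observed agreement: the share of instances on the diagonal.
    observed = sum(matrix[i][i] for i in range(len(matrix))) / total
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    # Agreement expected by chance, from the row and column marginals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical confusion matrix for three classes with ten items each.
example = [
    [9, 1, 0],
    [0, 10, 0],
    [0, 0, 10],
]
acc, kappa = accuracy_and_kappa(example)
```

Kappa is lower than accuracy because it discounts the correct predictions that a chance assignment with the same marginals would already produce.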
The second segment of the experimentation is conducted with 56 undergraduate students studying Computer Science Engineering at SASTRA Deemed to be University, Thanjavur, Tamil Nadu, India. The participants are asked to take an online exam in Java programming with sixty questions, each of which represents one of the six Bloom's Taxonomy levels. There are ten questions in each RBT category. Identification of the Bloom's Taxonomy level of a question has been automated through software and the questions are presented to the students in random order. The duration of the test is set to thirty minutes. The scores of the students have been retrieved in CSV file format for the analysis. Table 3 gives the set of initial centroids that are given as input to the K-means clustering algorithm for the first experimental analysis. The initial centroids for the first experimental analysis are chosen so that the algorithm clusters poor, medium and good level students based on overall RBT. Table 4 gives the cluster centroids of the K-means algorithm after the last iteration, followed by the cluster plots, Silhouette distance and the graphical representation of the cluster centroids in Fig. 3-5 respectively.
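One way to turn the retrieved scores into one vector per student is sketched below. The CSV layout (columns `student`, `level`, `mark`, `max_mark`) and the function name `level_scores` are assumptions made for illustration, since the actual file format is not described; the sketch only shows the aggregation of marks per RBT level and leaves out the logarithmic normalization.

```python
import csv
import io
from collections import defaultdict

LEVELS = ["remember", "understand", "apply", "analyze", "evaluate", "create"]


def level_scores(csv_text):
    """Return {student: [fraction of marks earned per RBT level]}."""
    earned = defaultdict(lambda: defaultdict(float))
    possible = defaultdict(lambda: defaultdict(float))
    for row in csv.DictReader(io.StringIO(csv_text)):
        earned[row["student"]][row["level"]] += float(row["mark"])
        possible[row["student"]][row["level"]] += float(row["max_mark"])
    return {
        student: [
            earned[student][lv] / possible[student][lv] if possible[student][lv] else 0.0
            for lv in LEVELS
        ]
        for student in earned
    }


# Hypothetical extract: one row per student and question.
SAMPLE_CSV = "\n".join([
    "student,level,mark,max_mark",
    "S01,remember,1,1",
    "S01,remember,0,1",
    "S01,create,1,1",
    "S02,remember,1,1",
    "S02,create,0,1",
])

scores = level_scores(SAMPLE_CSV)
```

Each resulting six-element vector, ordered from remember to create, is the kind of instance that a clustering algorithm such as K-means can group.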
The K-means algorithm creates three clusters with sizes 4, 14 and 38. The results show that 4 students scored low marks in all RBT levels, 14 students scored medium marks in overall RBT and 38 students have better skills in all RBT levels. Table 5 gives the new set of initial centroids that are given as input to the K-means clustering algorithm for the second experimental analysis. The initial centroids are now chosen to create clusters that identify the weaker students in each RBT level. Hence, the number of clusters is six. Table 6 gives the cluster centroids generated during the last iteration of the K-means algorithm, followed by the cluster plots, Silhouette distance and the graphical representation of the cluster centroids in Fig. 6-8 respectively. The numbers of elements in the K-means clusters are 9, 8, 4, 5, 21 and 9 respectively. The cluster results show that 9 students are lacking in remembering, 8 in understanding, 4 in applying, 5 in analyzing, 21 in evaluating and 9 in creating.
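The silhouette values reported with the cluster plots measure how well each instance fits its own cluster compared with the nearest other cluster. A minimal sketch of the standard silhouette coefficient follows; the one-dimensional data and the function name `silhouette_scores` are hypothetical, and the sketch assumes at least two clusters.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def silhouette_scores(points, labels):
    """Silhouette coefficient of every instance (requires two or more clusters)."""
    scores = []
    for i, p in enumerate(points):
        own = [
            euclidean(p, q)
            for j, q in enumerate(points)
            if labels[j] == labels[i] and j != i
        ]
        if not own:  # a singleton cluster is scored 0 by convention
            scores.append(0.0)
            continue
        a = sum(own) / len(own)  # mean distance within the own cluster
        b = min(  # mean distance to the nearest other cluster
            sum(euclidean(p, q) for q, lab in zip(points, labels) if lab == other)
            / labels.count(other)
            for other in set(labels)
            if other != labels[i]
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return scores


# Hypothetical example: two well-separated groups on a line.
pts = [[0.0], [1.0], [10.0], [11.0]]
labs = [0, 0, 1, 1]
sil = silhouette_scores(pts, labs)
mean_sil = sum(sil) / len(sil)
```

Values near 1 indicate compact, well-separated clusters, values near 0 indicate instances on a cluster boundary and negative values suggest a likely misassignment.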

Conclusion
RBT helps instructors to perform formative and summative assessments of students so as to map the objectives and motivation of a course in an academic curriculum. RBT is a powerful mechanism that assists students to learn at higher levels. However, the real challenge for instructors is to make a descriptive analysis of students' performance through RBT levels. Little or no effort has been made to group students in order to find the number of students with poor, medium and good levels across all RBT levels and the lacking skills of each student. This work is carried out to analyze students' learning ability through predictive and descriptive analyses in data mining. The predictive analysis employs an SVM classifier to obtain the RBT level of questions and the descriptive analysis uses the K-means clustering algorithm to group students based on learning ability. The SVM classifier correctly classifies the RBT level of questions with an accuracy of 98% and is found to be the best classifier for obtaining the RBT levels of questions. The K-means clustering groups poor, medium and good level students, and the lacking skill of each student, with precision values of 0.83 and 0.79 respectively. The cluster results help the instructor to use different teaching methodologies and follow-ups to ensure better learning by students.