A Combined Approach to Improve Supervised E-Learning using Multi-Sensor Student Engagement Analysis

Abstract: E-learning provides an important means of education that can reach large numbers of learners irrespective of their location. E-learning systems and platforms have evolved over the years, but E-learning methodologies still lag in matching the benefits of teacher-student interaction in a classroom. The absence of human supervision is a persistent concern, as a student cannot be monitored for loss of interest or lack of engagement during the e-learning session. Given this problem, this research was carried out in two phases: first, to identify a solution that can feed the emotional and mental state of the student into a feedback system and, second, to use that feedback to change the content according to the learner’s level of engagement or interest. The findings presented in this study relate to the first phase of the research. A novel methodology combining three types of measurement was used to assess the interest or engagement of the student during an E-learning session. These measurements were facial recognition based engagement analysis, Electro Dermal Activity (EDA) data and pulse rate information. Facial recognition was carried out to infer the interest level from the student’s facial expressions and was used as a reference to find correlation with EDA and pulse rate. A single timeline was used for all three modes of measurement. Statistical correlation results showed that all three modes of measurement exhibit significant correlation with one another and can thus be used together to ascertain the engagement or interest of the student in an E-learning session. These findings will help in improving the efficacy of E-learning environments by altering the content structure and visual presentation according to the learner’s learning curve.


Introduction
With the advent of E-learning technology, around 3 million students are currently taking E-learning classes in higher education institutions in the United States (Symonds, 2001). Still, E-learning poses many challenges, one being the unknown set of parameters that define the overall construct of the online environment for students. Learning online is not the same as learning in a real-world classroom, face to face with the teacher. 'The Internet is a mask of sorts. It hides the color of our skin, the shape and size of our body, its beauty and its blemishes, our age, our accents, our incomes and our fashion sense' (Wong, 2000). The engagement of a student in an e-learning session remains uncertain and ambiguous. E-learning can be, in Mezirow's terms, 'the ultimate disorienting dilemma' (Campbell-Gibson, 2000), in which it becomes quite difficult to recreate the real-world classroom environment and human supervision, even in approximate form. There has been recent research into some parts of this phenomenon, much of it concerning the effectiveness of machine learning and adaptive methods in mimicking teacher supervision in online learning. Russell (1997) indicated in his study that students' performance varies with the mode of teaching and learning. He argues that directing technology towards individual differences in learning styles facilitates some individuals but will affect the learning curve of other students. Other work in this area notes that research has focused on technology, resource efficiency, content delivery and pedagogy, with little exploration of the student experience and its implications (Cookson, 2000). Research into student emotions associated with e-learning has recently been reported in the literature. Kort et al. (2001), for example, are attempting to develop a model of emotion depicting various phases of learning.
These authors have identified a range of emotional states and their objective was to devise a computer-based intelligent system which can augment the student's emotional position in relation to learning.
Electro dermal activity provides insight into emotional responses and is conventionally measured via Galvanic Skin Response (GSR) probes. Extensive academic research on EDA measurement has brought products such as smart EDA sensors to market. These sensors provide valuable information about the emotional state of the user by measuring the electro dermal activity on the user's skin. The Q Sensor is one such sensor that has shown potential in providing useful information about the emotional state of the user. EDA information can be useful in a variety of situations where emotions play an important role, especially in e-learning, where direct supervision by an instructor is not available to monitor the student's attitude and engagement in class. Because of the information embedded in EDA data, recording this data from students can offer an important advantage in interactive e-learning systems. Interactive e-learning systems constantly require feedback from the learner in order to adapt the content and delivery to the learner's requirements. EDA data can therefore act as an important tool for measuring the engagement of the learner.
Similarly, facial recognition holds significant potential for interactive digital entertainment, education and training. An effective analysis of face-based emotional responses, i.e., facial expression together with pulse rate, in interactive learning environments could enable educators to create a customized expression index that can be used dynamically to track changing levels of engagement, interest and emotional state.
Emotional responses measured via EDA, the facial expression index and pulse rate are directly triggered by changes in affect; biofeedback data such as heart rate and facial expression responses can therefore be used to infer affective changes.
The purpose of this research was to develop a combined methodology employing the pulse rate, facial expression based engagement and EDA data of a student in an unsupervised e-learning environment. The goal is to assess whether these three measurements together can provide a better estimate of a student's engagement in an e-learning environment.

Background
Numerous studies have been carried out on E-learning in which the importance of learner engagement has at least been emphasized. Martinez (2001) carried out research into online learning and developed a model which 'recognizes a dominant influence of emotions, intentions and social factors on how individuals learn differently'. Schaller et al. (2002) showed that students experienced both confusion and bewilderment when they went through an e-learning session. Wegerif (1998) submitted that students of the Open University exhibited feelings of fear and alienation during both easy and difficult sessions of e-learning. Ng (2001) discovered that some students studying online faced difficulty in communicating electronically, as they were not accustomed to unsupervised e-learning. Hara and Kling (2000) investigated in detail students' negative emotions associated with studying an e-learning course. In their study, isolation was not reported as the main issue; however, they did point out the frustration students experienced with the technical aspects of learning online.
In the past few years, a number of wireless wearable adrenergic sensors came into being that changed the landscape. The BioHarness BT (Zephyr Technologies, Annapolis, MD) that is worn as a strap around the thorax, measuring pulse and breathing function, is one among many wireless options for performing adrenergic measurements in user studies. Cholinergic sensing lagged behind in this respect, until the appearance of the Q Sensor in 2010 (Affectiva, Waltham, MA). The Q Sensor was the first wireless wearable Electrodermal Activity (EDA) sensor, designed to measure perspiratory responses on the palm or the wrist.
What we find lacking in the literature is a rigorous validation of the Q Sensor as representative of this new generation of wireless and wearable EDA sensors. Poh et al. (2010) did validate the Q Sensor against GSR, but only for the wrist placement and using an array of stimuli that are not considered baseline in sympathetic studies. A universal arousal stimulus, widely perceived to invoke a threshold response in the majority of healthy subjects, is startle. With recent advances in biosensor technology, small, wireless and non-intrusive sensors are becoming available as commercial products.
In recent years, e-learning has progressed from Intelligent Tutoring Systems using Computer Aided Instruction to Smart Classrooms and Mobile e-learning. Currently, e-learning is heavily learner-centered, with emphasis on personalized and pervasive learning technologies. Also known as ambient or ubiquitous learning, pervasive learning is a type of learning that is available anytime, anywhere. A viable e-learning platform should not only deliver good learning outcomes, but should also facilitate and engage learners according to their learning potential and capabilities.
The influence of emotions in engaged e-learning is an open research area which requires much attention.
Recently, numerous studies (Jonassen, 2004; Mager, 1997; Merrill, 1987) have been published discussing and explaining the importance of emotion to numerous learning behaviors, especially in e-learning environments. Research has shown that the complex set of parameters surrounding the online learning environment and the learner can reveal the importance of learners' emotional states and their link to effective learning. Previous research (Merrill, 1987) has demonstrated that a slightly positive mood, which makes a person feel a little better, induces a style of thinking oriented towards greater creativity and a notable improvement in problem solving, as well as a clearer understanding of decision making. These previous findings underscore the important effects of emotions on learning. The main logic behind this relationship is that the human brain is not just an information processing system with cognitive abilities, but a system in which cognitive and affective functions are intricately integrated.
Several new e-learning technologies have been adopted in recent years by e-learning specialists to improve the effectiveness of engaged e-learning. Nowadays, learners are often used to e-learning on the web, synchronously and asynchronously, in a distributed setting (Kear, 2004). More ubiquitous and personalized learning environments are now becoming a necessity (Muntean and Muntean, 2007). Recent developments in input devices (such as webcams and microphones) can be harnessed to facilitate the learner's active interaction in engaged e-learning. Such devices can record image and sound data for technical analysis in order to initiate improved interactions with a learner using an e-learning application in real time.
Moreover, in this way useful user data can be collected without interfering with the learning process, as questionnaires often do. This is due to the unobtrusive and continuous nature of data gathering with digital devices. Existing methods of recording feedback can hamper learning as well as create issues in the delivery of e-learning. Software developed for engaged e-learning systems uses the learner's emotional and physiological attributes, captured through various analog and digital sensors. Emotion recognition (Jiao and Wang, 2010) has a couple of limitations that restrict its application context and degrade its accuracy. The application context is bounded by the fact that current emotion recognition software can only handle a small set of expressions, and capturing faces from various views requires multiple visual sensors (Ashwin et al., 2015). Moreover, such software requires post-processing to analyze visual data, and facial features are difficult to resolve on various timescales. In addition, emotion recognition accuracy is degraded when only limited past data is available (Zhang et al., 2007; Wang and Niu, 2012). This problem is compounded by the uniqueness of feature sets, which vary from person to person.
Real-time online emotion recognition is an active area of research. It is imperative to increase the accuracy of facial emotion recognition software in order to address learner engagement issues by employing feedback based on emotional states. Such research will enhance both the learner's and the instructor's awareness of a learner's behavior. Hence, automated emotion detection may compensate for the missing human supervision required for interactive teaching in an e-learning environment.

Methodology
Skin conductance measurements can be of two types, i.e., phasic and tonic, which can roughly be thought of as "the quickly changing peaks" vs. "the smooth underlying slowly-changing levels".
Phasic: Phasic changes are transduced as "peaks" in the skin conductance, or simply abrupt increases in the skin conductance. These peaks are reported in the literature as Skin Conductance Responses (SCRs). Phasic skin conductance measurements are usually related to short-term events and occur due to discrete environmental stimuli such as sight, smell or sound, or due to cognitive processes that may lead to events related to anticipation, decision making, etc.
Tonic: Tonic changes in the skin conductance level usually span tens of seconds or minutes. Tonic skin conductance is reported in the literature as the skin conductance level that is sensed in the absence of any particular external stimulus or discrete environmental event and is generally referred to as the Skin Conductance Level (SCL). The tonic skin conductance level can take time to change, and it varies slowly in an individual depending upon his or her psychological state, skin dryness and hydration.
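As a rough illustration of this distinction, the sketch below separates a skin conductance trace into a slowly varying tonic estimate and a phasic residual. It uses a centered moving average as the tonic baseline, which is a simplifying assumption made only for illustration and not the convex-optimization fit used in this study; the function name `split_tonic_phasic` and the window length are hypothetical.

```python
def split_tonic_phasic(conductance, window=51):
    """Split a skin conductance trace into tonic and phasic parts.

    The tonic level is estimated with a centered moving average
    (an illustrative assumption); the phasic part is the residual.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    n = len(conductance)
    tonic = []
    for i in range(n):
        # Shrink the window near the edges so every sample gets a baseline.
        lo = max(0, i - half)
        hi = min(n, i + half + 1)
        segment = conductance[lo:hi]
        tonic.append(sum(segment) / len(segment))
    phasic = [c - t for c, t in zip(conductance, tonic)]
    return tonic, phasic


# Synthetic trace: flat level of 2.0 with one brief peak (an SCR-like event).
trace = [2.0] * 100
trace[50] = 3.0
tonic, phasic = split_tonic_phasic(trace, window=21)
```

In this synthetic example the phasic residual is largest at the sample carrying the peak, while the tonic estimate stays close to the flat level. A longer window gives a smoother baseline at the cost of slower tracking of genuine level changes.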
The purpose of analyzing EDA, expression and pulse rate sensor data is to accurately capture the features of physiological data that are relevant to the individual and/or to the occurrences of particular behavior. Sensor data collected in the real world, however, often contain outlier data, noise, or invalid data (e.g., because of movements of a sensor wearer or disconnection of the sensor). Identifying these instances and removing them from the data is an important step to improve the quality of physiological sensor data.
Furthermore, finding an appropriate parameter setting for the analysis algorithm that can best capture the features of the physiological data of each individual is crucial, because each person often has different physiological responses.
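The data-cleaning step described above might be sketched as follows. The study does not state which rule was used to identify invalid or outlying samples, so the median-absolute-deviation rule, the threshold of 5 and the function name `remove_artifacts` are all assumptions chosen for illustration.

```python
import math
import statistics


def remove_artifacts(samples, max_deviations=5.0):
    """Drop invalid readings and gross outliers from a list of samples.

    Invalid readings are None or NaN (e.g. a disconnected sensor).
    Outliers are flagged with the median absolute deviation (MAD);
    both the rule and the threshold are illustrative assumptions.
    """
    valid = [s for s in samples if s is not None and not math.isnan(s)]
    if not valid:
        return []
    median = statistics.median(valid)
    mad = statistics.median([abs(s - median) for s in valid])
    if mad == 0:
        # No spread to measure against; keep all valid samples.
        return valid
    return [s for s in valid if abs(s - median) / mad <= max_deviations]


# Made-up readings with a dropout, a NaN and one motion-like spike.
readings = [1.0, 1.1, 0.9, None, 1.05, 25.0, float("nan"), 0.95]
cleaned = remove_artifacts(readings)
```

The threshold plays the role of the per-individual parameter setting mentioned above: a participant with naturally large responses may need a more permissive value than one with a flat trace.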
At the beginning of the experiment, the participants were asked to complete an informed consent form and a questionnaire. The questionnaire was used to record the participants' emotional states and their personality traits. It was developed using psychometric questions with the aim of understanding the mental state of the participating students prior to the experiment. Twenty students were selected randomly from a computer course to participate in this study. The aim was to simultaneously record facial expressions, EDA data and pulse rate during an e-learning session.
The e-learning session consisted of a presentation on Linear Algebra. All slides and their text were timed to allow for correlation with the EDA data and pulse rate measurements. During the presentation, more than 1500 snapshots of the participants' facial expressions were taken. Around 200 of them were eliminated, while the others were classified into expression categories. For all subjects, the facial expression index model produced the expected results: the facial data was decomposed into three signals, namely the prior event frame, the current event frame and the post event frame.
The features extracted from each frame showed a strong significant statistical continuity based on the emotional event.
The students were asked to relax, and Q Sensor measurements were noted until the recordings became stable. The data was recorded for the whole 12 min session, and the events in the presentation were synchronized to occur serially, one every 10 sec. This method of recording data allowed us to interpret the EDA data and the expression index with reference to the e-learning material, i.e., the presentation. The last three slides were used to present questions related to the content, and one minute was given for writing each solution. The purpose of presenting questions was to investigate whether stress levels induce any significant change in the Electro Dermal Activity data.
The Q Sensor, facial and pulse rate readings were synchronized using a single time base to allow for event correlation, with a measurement taken every 0.1 sec.

A presentation-based instructional lesson was given to the students in animation/storyboard format.
The most difficult data to analyze was the EDA data. This was addressed using a technique based on convex optimization, a method that has been applied in a number of applications (Boyd and Vandenberghe, 2004). This approach allowed the EDA data to be fitted efficiently, and its solution provides an accurate representation of the tonic and phasic changes in the emotional rate of change of the student.
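A set $K \subseteq \mathbb{R}^n$ is convex if, for any $x, y \in K$ and any $\theta \in [0, 1]$,

$$\theta x + (1 - \theta) y \in K.$$

A function $f$ defined on a convex domain is convex if, for any two points $x$ and $y$ in its domain and any $\theta \in [0, 1]$,

$$f(\theta x + (1 - \theta) y) \le \theta f(x) + (1 - \theta) f(y).$$

This inequality states that, for any two points $x$ and $y$ in the function domain, the line segment between $(x, f(x))$ and $(y, f(y))$ lies on or above the graph of the function. Equivalently, a function is convex if its epigraph is a convex set (Boyd and Vandenberghe, 2004).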
For a standard optimization problem

$$\text{minimize } f_0(x) \quad \text{subject to } f_i(x) \le 0, \quad i = 1, \dots, m,$$

the aim is to minimize the given objective function $f_0(x)$, which represents the cost of choosing $x$, while simultaneously satisfying the constraints $f_i(x) \le 0$. An optimization problem in which both the objective and the constraint functions are convex is a convex optimization problem. In the context of mathematical optimization, the most important consequence of convexity is that necessary conditions for local optimality are also sufficient for global optimality. Moreover, important categories of convex optimization problems can be solved efficiently (this is rarely the case for general non-convex problems).
The optimized data was clustered into three sets of time periods along the time scale. The time periods were selected after subclass convex optimization and were those found, graphically, to exhibit the most significant linear correlation trends.
A subclass of convex optimization can be defined using least-squares problems, where the aim is to minimize a quadratic objective function.
This special subclass of convex optimization problems normally arises in regression analysis, parameter estimation and data fitting methods (Poh et al., 2010).
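For concreteness, the least-squares subclass can be written in the following generic form. The symbols $A$ (a $k \times n$ data matrix with rows $a_i^{T}$), $b$ (a vector of $k$ observations) and $x$ (the parameter vector) are placeholders introduced here for illustration and are not taken from this study.

```latex
\min_{x \in \mathbb{R}^n} \; f_0(x)
  = \lVert A x - b \rVert_2^2
  = \sum_{i=1}^{k} \left( a_i^{T} x - b_i \right)^2

\nabla f_0(x) = 2 A^{T} (A x - b) = 0
  \;\Longrightarrow\;
  A^{T} A \, x = A^{T} b

x^{\star} = \left( A^{T} A \right)^{-1} A^{T} b
  \quad \text{(when } A \text{ has full column rank)}
```

Because $f_0$ is a convex quadratic, any point where the gradient vanishes is a global minimizer, which is why the problem reduces to solving the linear normal equations in the second line.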
Bivariate correlation analysis was then carried out for three cases, i.e., EDA vs. heart rate, EDA vs. engagement and heart rate vs. engagement. The Pearson correlation coefficient and significance values were calculated using SPSS for all three cases. Two hypotheses were tested for each case: the null hypothesis H0, that there is no linear correlation between the two variables in the population, and the alternative hypothesis H1, that a correlation exists. The statistical correlation parameters estimated by the bivariate correlation analysis were used to test H0 against H1.
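The calculation behind this test can be sketched in a few lines. The study itself used SPSS; the code below is only an illustrative equivalent, the function names are hypothetical, and the input values are made up rather than taken from the experiment. The t statistic is compared against a t distribution with n - 2 degrees of freedom to obtain the p-value, a lookup that is not reproduced here.

```python
import math


def pearson_r(x, y):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two lists of equal length with at least 3 points")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def t_statistic(r, n):
    """t statistic for H0 (no linear correlation), n - 2 degrees of freedom.

    Undefined when r is exactly +1 or -1 (division by zero).
    """
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Synthetic, made-up values for illustration only (not data from this study).
eda = [0.2, 0.4, 0.5, 0.7, 0.9, 1.0]
engagement = [0.1, 0.5, 0.4, 0.8, 0.7, 1.1]
r = pearson_r(eda, engagement)
t = t_statistic(r, len(eda))
```

A large absolute value of t relative to the critical value for n - 2 degrees of freedom leads to rejection of H0 in favor of H1.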

A presentation, 'Solving Linear Equations with One Variable', was used in this experimental setting.
For analysis purposes, averaging was carried out over 10 sec using 100 samples to obtain a realistic EDA value with deterministic attributes. Matlab was used to acquire EDA samples every 10 msec, and all samples recorded in each 10 sec interval were averaged. The main reason for choosing this time period was that the presentation content was delivered in an automated manner, in which every line on each slide was displayed for 5-10 sec and every slide lasted between 2 and 5 min.
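The windowed averaging described above can be sketched as follows. The example assumes 100 samples per 10 sec window, as stated in the text, which corresponds to one sample every 0.1 sec; the function name `window_means` and the synthetic data are illustrative and not part of the original Matlab code.

```python
def window_means(samples, sample_period_s, window_s=10.0):
    """Average consecutive samples over fixed, non-overlapping windows.

    `sample_period_s` is the time between samples; a trailing partial
    window is discarded so every mean covers the same duration.
    """
    per_window = round(window_s / sample_period_s)
    if per_window < 1:
        raise ValueError("window must span at least one sample")
    means = []
    for start in range(0, len(samples) - per_window + 1, per_window):
        window = samples[start:start + per_window]
        means.append(sum(window) / per_window)
    return means


# 25 s of synthetic data sampled every 0.1 s: 100 samples per 10 s window.
eda = [1.0] * 100 + [3.0] * 100 + [5.0] * 50
averaged = window_means(eda, sample_period_s=0.1)
```

Discarding the trailing partial window is a design choice made here so that each averaged value represents the same amount of presentation time.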
Overall, the presentation on linear equations consisted of 8 slides.
The 1st slide was formulated to provide a basic introduction to linear equations with one variable. The 2nd slide discussed how a single variable linear equation can be solved and how the solution can be confirmed.
The purpose of these two slides was to provide an introductory primer to the students. As this presentation was very basic and simple for an undergraduate student, probability was distributed in three dimensions. The 3rd and 4th slides were designed to provide information on the additive and multiplicative properties of single variable linear equations, with one relevant example each.
The 5th slide of the presentation consisted of an example which used both addition and multiplication properties for solving single variable linear equations.
The 6th slide consisted of a test question for which 5 min were allocated.
The 7th slide was another test question with a slightly increased difficulty level by introducing a fraction based single variable linear equation. Again 5 min were allocated for solving this problem.
The last (8th) slide consisted of another question, with a further increased difficulty level. The students were allotted 2 min to write the solution to the problem.
The aim of this experiment was to investigate the usability aspects of the Q Sensor, particularly for mild stressors. The investigation was formulated to answer two fundamental inquiries. First, how effective is the Q Sensor in measuring the engagement of the user? Second, do the Q Sensor measurements correlate with the facial expression based engagement and the heart rate based analysis? Twenty students were selected, and two students at a time were subjected to a learning session. Each session was around 25 min long, and the whole experiment was conducted over 2 days in 5 sessions.
The content of the presentation was delivered in storyboard style. Every slide took around 2-5 min and included questions at the end. The difficulty level of the slides was intentionally varied to measure the responses of the students. This was done with the aim of triggering stress events and observing how the EDA data varies accordingly. All the students were kept blind to the nature of the content. Facial expression, heart rate and Q Sensor recordings were carried out at definite intervals. About 10 sec was kept as the standard interval, and all recorded data was averaged over a span of 10 sec to ensure a common time base. The results collected during this time were analyzed after normalization and averaging. Cumulative averaging was carried out on a single time base to allow correlation characteristics to be identified for all three types of measurement. The results are shown in Fig. 1.
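The normalization and cumulative averaging steps might look like the sketch below. The text does not specify which normalization was applied, so min-max scaling to the range [0, 1] is an assumption made here so that signals with different units share a common scale; the function names and the heart-rate values are hypothetical.

```python
def min_max_normalize(values):
    """Rescale values to the range [0, 1] (one possible normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant signal carries no variation; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def cumulative_average(values):
    """Return the running mean: element k is the mean of values[0..k]."""
    averages = []
    total = 0.0
    for count, value in enumerate(values, start=1):
        total += value
        averages.append(total / count)
    return averages


# Made-up heart-rate window means (beats per minute), for illustration only.
heart_rate = [70.0, 80.0, 90.0, 80.0]
normalized = min_max_normalize(heart_rate)   # [0.0, 0.5, 1.0, 0.5]
running = cumulative_average(normalized)     # [0.0, 0.25, 0.5, 0.5]
```

Applying the same two steps to the EDA, engagement and heart rate series yields three dimensionless curves on one time base, which is the form needed for the pairwise correlation analysis.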
Three sets of bivariate correlations were calculated, i.e., EDA vs. heart rate, EDA vs. engagement and heart rate vs. engagement. The Pearson correlation coefficient and significance values were calculated using SPSS for all of these cases and are presented in Tables 3 and 4 respectively.
The Pearson coefficients calculated for EDA, engagement and heart rate indicated a linear correlation between the two variables in each of the three cases. However, a significance test has to be carried out to determine whether there is evidence that a linear correlation is present between the EDA, engagement and heart rate data over a single time base.
The p-values for EDA vs. heart rate, EDA vs. engagement and heart rate vs. engagement were found to be 0.018, 0.011 and 0.009 respectively. Similarly, the correlation coefficients for the three cases were 0.832, 0.871 and 0.903 respectively. Since all three p-values are below the conventional 0.05 significance level and the coefficients are large and positive, H0 is rejected and the second hypothesis, H1, holds true in all three cases.
The three experiments performed were selected to capture a variety of EDA responses to classic stimuli. Furthermore, the cognitive experiment consisted of three separate tasks (the three slides at the end of the presentation comprising questions) of differing complexity (a mental arithmetic task, a mental memory task and a mental logical task), while the emotional experiment also caused a defensive response prior to providing emotional stimuli during the presentation. It is well known that stress can induce an increase in both the tonic and phasic components of EDA. Our measured EDA recordings showed changes in skin conductance as the stress index increased or decreased. As expected, skin conductance increased during physical strain induced by emotional strain.

Discussion
All data files were analyzed using custom software written in MATLAB (The MathWorks, Inc.) based on convex optimization (Boyd and Vandenberghe, 2004).
The research method included a statistical correlation between the optimized data from the three variables recorded over a single time base i.e., EDA, Heart rate and Engagement.
The raw EDA signals were filtered with a 2048-point low-pass filter to reduce artifacts and electrical noise from the sensors. Pearson's correlation coefficients and the corresponding p-values were calculated for the filtered recordings from the different sites and systems as a measure of similarity between signals.
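A low-pass FIR filter of this kind could be sketched as shown below. The text gives only the filter length (2048 points), so the windowed-sinc design, the Hamming window and the cutoff value are assumptions, and the function names are hypothetical. A short 31-tap filter is used in the example to keep it fast; the same functions accept `num_taps=2048`.

```python
import math


def lowpass_taps(num_taps, cutoff):
    """Design a Hamming-windowed sinc low-pass FIR filter.

    `cutoff` is the cutoff frequency as a fraction of the sampling rate
    (0 < cutoff < 0.5). Taps are scaled to sum to 1 (unity gain at DC).
    """
    if num_taps < 2 or not 0.0 < cutoff < 0.5:
        raise ValueError("need num_taps >= 2 and 0 < cutoff < 0.5")
    center = (num_taps - 1) / 2.0
    taps = []
    for n in range(num_taps):
        x = n - center
        # Ideal low-pass impulse response (sinc), with the x == 0 limit.
        if x == 0:
            ideal = 2.0 * cutoff
        else:
            ideal = math.sin(2.0 * math.pi * cutoff * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]


def fir_filter(signal, taps):
    """Apply the FIR filter, returning only fully overlapped output samples."""
    span = len(taps)
    return [
        sum(taps[k] * signal[i + span - 1 - k] for k in range(span))
        for i in range(len(signal) - span + 1)
    ]


# Synthetic trace: constant level of 1.0 plus a fast alternating disturbance.
noisy = [1.0 + (0.5 if i % 2 == 0 else -0.5) for i in range(200)]
taps = lowpass_taps(num_taps=31, cutoff=0.1)
smoothed = fir_filter(noisy, taps)
```

The output is shorter than the input by `num_taps - 1` samples because only positions where the filter fully overlaps the signal are returned, which avoids edge artifacts at the start and end of a recording.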
To filter noise or detect features within physiological signals, it is often effective to encode expert knowledge into a model such as a machine learning classifier. However, training such a model can require much effort on the part of the researcher; this often takes the form of manually labeling the portions of the signal needed to represent the concept being trained. Active learning is a method for decreasing human effort by using a classifier that can intelligently select the most relevant data and refine it in an iterative process.
The following important events, which showed high correlation, were noted, as shown in Fig. 1.
1. The interval from 9:00:00 to 9:02:30 showed two events of increasing heart rate, which correlated with both the EDA data and the facial expression based engagement. This was attributed to the students' anxiety and subsequent loss of interest as the 1st slide was displayed. The slide consisted of very basic information on linear equations and made the students lose interest. The EDA data and engagement showed a negative trend at this event, while the heart rate increased.
2. The next correlation events were recorded at the 5th and 8th min, as these were the instants when the addition and multiplication properties of single variable linear equations were discussed. These events are attributed to the fact that the undergraduate students were already well aware of these properties, so a negative trend in the EDA data and engagement was observed while the heart rate increased. This showed that the heart rate increased in both cases, i.e., when a student is more engaged and when a student is less engaged.
3. The 3rd significant correlation event was recorded in the 10th min, during a slide showing the method for solving a single variable linear equation. This solution required attention and interest to understand, and as the students had been told prior to the presentation about the end questions (while being kept blind to the content), they knew that this slide was a premise for the end questions. The heart rate increased again in this event, as did the facial expression based engagement and the EDA data, due to the increased interest of the students.
Based on the correlation events from the biophysical signals, the results in this study explored how multi-sensor measurements can be used to understand emotional feedback, in this case the student's level of engagement, which can in turn be used to improve e-learning experiences even in the absence of human supervision.

Conclusion
In this study, we presented one of the first combined emotion analysis studies in which students' electro dermal activity, facial expressions and pulse rate were simultaneously measured in an e-learning environment comprising 20 students.
Heart rate variation and electro dermal activity were both affected by the autonomic nervous system's response to psychological and emotional activity. Moreover the facial expressions based index showed a high correlation with the EDA and pulse data rate of change.
Three different modes of measurements were taken over a single time base and data was refined using convex optimization to allow for a better classification of data for later correlation.
The results showed that all three modes of measurement were highly correlated and can be used to define the engagement of the student. A student's engagement in the class is the prime indicator of his or her interest in the class. Therefore, even in the absence of direct supervision, these three sensors can provide adequate data for the e-learning system to react adaptively.
If a student is bored or not engaged, the e-learning program can automatically increase the volume, present a short piece of interesting information or take any other action to get the student engaged.
This research can provide a strong basis for the implementation of an interactive and smart e-learning system that can supervise students based on the data provided by electro dermal activity, the facial expression index and pulse rate, given that these modes of measurement showed a strong correlation and can provide accurate information on whether a student is engaged, with minimal probability of error. The results confirmed the statistically significant correlation between these sensors and their efficacy in depicting the emotional state of a student undergoing emotional changes during an e-learning session.

Funding Information
The authors have no support or funding to report.

Author's Contributions
The main contribution of this research was to ascertain the correlation between facial expression, heart pulse and EDA data in interactive learning environments. The aim was to analyze the efficacy of using all three techniques to understand and approximate the engagement of a student during an E-learning lesson.

Ethics
The Author confirms that this work is original and has not been published elsewhere.