Discrete Wavelet Transform Based Classification of Human Emotions Using Electroencephalogram Signals

Problem statement: The aim of this study was to report human emotion assessment using the Electroencephalogram (EEG). Approach: An audio-visual induction based protocol was designed for inducing five different emotions (happy, surprise, fear, disgust and neutral) in 20 subjects in the age group of 19~39 years. EEG signals were recorded from 64 channels placed over the entire scalp according to the International 10-10 system. We first applied a spatial filtering technique to remove noise and artifacts from the EEG signals. Three wavelet functions ("db8", "sym8" and "coif5") were used to decompose the EEG signal into five different frequency bands, namely delta, theta, alpha, beta and gamma. A set of new statistical features related to energy was extracted from the EEG frequency bands to construct the feature vector for classifying the emotions. Two simple linear classifiers (K Nearest Neighbor (KNN) and Linear Discriminant Analysis (LDA)) were used for mapping the feature vector into the corresponding emotions. Furthermore, we compared the efficacy of emotion classification with a reduced set of channels (24 channels) for evaluating the reliability of the emotion recognition system. Results: In this study, 62 channels outperformed 24 channels, giving a maximum average classification accuracy of 79.65% using KNN and 78.52% using LDA. Conclusion: In this study we presented an approach to discrete emotion recognition based on the processing of EEG signals. The preliminary results presented in this study address the classifiability of human emotions using the original and a reduced set of EEG channels. The results presented in this study indicate that statistical features extracted from time-frequency analysis (wavelet transform) work well in the context of discrete emotion classification.


INTRODUCTION
The retrieval of information related to human emotion from brain signals is one of the key steps towards emotional intelligence. Our emotional state plays an important role in how we experience and interact with our environment. Emotion directly affects our decision making, perception, cognition, creativity, attention, reasoning and memory (Levis, 1995). Manifestations of emotional states are normally straightforward for humans to detect and understand, as they are reflected in both voice and body language (Adler and Rodman, 2003; Pease and Pease, 2004). In recent years, research efforts in Human Computer Interaction (HCI) have focused on empowering computers to understand human emotions. Most of these efforts have been dedicated to the design of user-friendly and ergonomic systems by means of innovative interfaces such as voice, vision and gestures. Many works have been reported on emotion recognition using facial expressions and speech modalities (Daabaj, 2002; Chen and Huang, 2000; Massaro, 2000; Hongo et al., 2000). In these conventional methods, the speech and facial expressions of a subject are purposefully expressed and can therefore be more easily concealed from other subjects (Takahashi, 2004). Moreover, such methods cannot be used for people who suffer from severe motor disabilities, amyotrophic lateral sclerosis or paralysis, or for people with introverted characters.
Various methodologies based on a number of sensors, such as peripheral signals (Electrocardiogram (ECG), Electromyogram (EMG), Skin Conductance Resistance (SCR), Skin Temperature (ST), Heart Rate (HR) and Respiration Rate (RR)), Electroencephalogram (EEG) signals and the fusion of peripheral and EEG signals, have been documented by experts around the world (Wagner et al., 2005; Wang and Guan, 2005; Takahashi and Tsukaguchi, 2004; Rong et al., 2008). We believe that physiological signal based human emotion recognition is a more natural means of assessing human emotions, in that the emotional status is inherently reflected in the activity of various physiological signals. Furthermore, the physiological response of an individual subject will not be concealed by the physiological responses of other subjects. Compared to all these physiological signals, EEG plays a major role in detecting emotion directly from the brain at higher temporal and spatial resolution. Furthermore, brain activity is naturally expected to precede muscular and vascular activity. Several approaches have been reported by different researchers for finding the correlation between emotional changes and EEG signals (Takahashi and Tsukaguchi, 2004; Chanel et al., 2006). More details on automatic emotion recognition using physiological signals and EEG, as well as a more complete list of references, can be found in (Murugappan et al., 2009a).
In this study, we have used audio-visual stimuli for evoking five discrete emotions. A set of statistical features related to energy are derived using wavelet transform over five different frequency bands. Three wavelet functions namely: "db8", "sym8" and "coif5" are used to decompose the EEG signal into five different frequency ranges. These statistical features are classified using two simple linear classifiers called KNN and LDA.

EEG data acquisition:
This section describes the acquisition of EEG signals for the emotion assessment experiments. Emotions can be induced in one of the following ways: (a) visual (images/pictures) (Wang and Guan, 2005) (b) audio-visual (film clips/video clips) (Takahashi and Tsukaguchi, 2004) (c) recalling of past emotional events (Chanel et al., 2006) (d) audio (songs/sounds) (Jonghwa and Elisabeth, 2008). Most researchers use visual stimuli for evoking emotions. In our previous study, we used both visual and audio-visual stimuli for evoking discrete emotions. The result of that study confirms that audio-visual stimuli are superior to visual stimuli in evoking emotions (Murugappan et al., 2009b). The main advantage of this method resides in the strong correlation between the induced emotional states and the physiological responses. Hence, we designed an audio-visual induction based protocol for eliciting the discrete emotions in the present study. The structural overview of the emotion recognition system using EEG signals is shown in Fig. 1. A pilot panel study was conducted on 25 university students to select 5 video clips (trials) for each emotion from 115 emotional video clips, including clips from the international standard emotional clips (www.stanford.edu). All the video clips are short in duration and have dynamic emotional content. The selection of video clips is based on the self assessment questionnaires mentioned in (Murugappan et al., 2008). The subjects who took part in this panel study did not take part in the data collection experiment. The audio-visual stimulus protocol for one of the trials of our experiment is shown in Fig. 2. The order of the emotional video clips is changed in a random manner for the other trials. X1-X5 denote the time periods of the selected video clips. The duration of the video clips varies from one clip to another.
Between each emotional stimulus (video clip), a blank screen is shown for 10 sec to bring the subject back to their normal state and to let them experience a calm mind. Three females and seventeen males in the age group of 21-39 years were employed as subjects in our experiment. Once the consent forms were filled in, the subjects were given a simple introduction to the research work and the various stages of the experiment. Hz. There are in total 62 active electrodes plus one electrode for ground (Oz) and one for reference (AFz). In addition, we recorded the eye blink rate using two Electrooculogram (EOG) electrodes (EOG L and EOG R), which are placed above the right and left eyes of the subjects. The set of electrodes used for this emotion recognition study is shown in Fig. 3. All the electrodes are placed over the entire scalp using the International standard 10-10 system (Bocker et al., 1994). The reduced set of 24 channels is highlighted in Fig. 3. The impedance of the electrodes is kept below 5 kΩ. Between each emotional video clip, in the self assessment section, the subjects were asked to report the emotions they had experienced (Murugappan et al., 2009b). Finally, 5 trials for the disgust, happy and surprise emotions and 4 trials for the fear and neutral emotions are considered for further analysis. All the signals are collected without much discomfort to the subjects.

Preprocessing:
The recorded EEG signals are usually contaminated with noise (due to power line fluctuations and external interference) and artifacts (ocular (Electrooculogram), muscular (Electromyogram), vascular (Electrocardiogram) and glossokinetic artifacts). The complete removal of artifacts will also remove some of the useful information in the EEG signals. This is one of the reasons why considerable experience is required to interpret EEGs clinically (Jung, 2000; Gott et al., 1984).

Fig. 4: Proposed procedure for emotion recognition
A couple of methods are available in the literature to avoid artifacts in EEG recordings. However, removing artifacts entirely is impossible in the existing data acquisition process. The research methodology of emotion recognition using EEG is shown in Fig. 4. In this study, we used the Surface Laplacian (SL) filter for removing noise and artifacts. The SL filter is used to emphasize the electrical activity that is spatially close to a recording electrode, filtering out activity that might have an origin outside the skull. In addition, it attenuates the EEG activity that is common to all involved channels in order to improve the spatial resolution of the recorded signal. The neural activities generated by the brain, however, contain various spatial frequencies. Potentially useful information from the middle frequencies may be filtered out by the analytical Laplacian filters. Hence, the signal "pattern" derived from SL filters is similar to the "spatial distribution of source in the head".
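The spatial filtering idea described above can be sketched in a few lines. This is only a minimal illustration, assuming the common form of the filter in which the mean of the N neighbouring electrodes is subtracted from each channel; the function name `surface_laplacian`, the toy channel values and the neighbour map are hypothetical and are not taken from the study.

```python
def surface_laplacian(signals, neighbors):
    """Subtract from each channel the mean of its neighbouring channels.

    signals   : dict mapping channel name -> list of samples (equal lengths)
    neighbors : dict mapping channel name -> list of neighbouring channel names
                (hypothetical map; the real one depends on the electrode layout)
    """
    filtered = {}
    for name, x in signals.items():
        near = neighbors.get(name, [])
        if not near:
            # No neighbours listed: leave the channel unchanged.
            filtered[name] = list(x)
            continue
        n = len(near)
        filtered[name] = [
            x[t] - sum(signals[ch][t] for ch in near) / n
            for t in range(len(x))
        ]
    return filtered


# Toy data (assumed values, three samples per channel).
signals = {
    "Cz": [4.0, 6.0, 8.0],
    "C3": [1.0, 2.0, 3.0],
    "C4": [3.0, 4.0, 5.0],
}
neighbors = {"Cz": ["C3", "C4"]}

sl_out = surface_laplacian(signals, neighbors)
print(sl_out["Cz"])  # [2.0, 3.0, 4.0]
```

Activity shared by Cz and its neighbours is cancelled, while activity local to Cz is retained, which is the behaviour the paragraph above attributes to the SL filter.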
The mathematical model of the Surface Laplacian filter is given as:

$$X_{new}(t) = X(t) - \frac{1}{N}\sum_{i=1}^{N} X_i(t) \qquad (1)$$

Where:
X_new = Filtered signal
X(t) = Raw signal
X_i(t) = Raw signal of the i-th neighborhood electrode
N = Number of neighborhood electrodes

Feature extraction: EEG signals are often quantified based on their frequency domain characteristics. Typically the spectrum is estimated using the Fast Fourier Transform (FFT). A fundamental requirement of FFT based spectral analysis is that the signal be stationary. However, EEG signals cannot be considered stationary even over short durations, since they can exhibit considerable short-term nonstationarities (Mallat, 1989). In EEG based emotion recognition research, the non-parametric method of feature extraction based on multi-resolution analysis with the Wavelet Transform (WT) is quite new. The joint time-frequency resolution obtained by the WT makes it a good candidate for the extraction of details as well as approximations of the signal, which cannot be obtained either by the Fast Fourier Transform (FFT) or by the Short Time Fourier Transform (STFT) (Mallat, 1989; Merzagora et al., 2006). The non-stationary nature of EEG signals makes it appropriate to expand them onto basis functions created by expanding, contracting and shifting a single prototype function (ψ_{a,b}, the mother wavelet), specifically selected for the signal under consideration. The mother wavelet function ψ_{a,b}(t) is given as:

$$\psi_{a,b}(t) = \frac{1}{\sqrt{a}}\,\psi\!\left(\frac{t-b}{a}\right) \qquad (2)$$

where, a, b ∈ R, a>0 and R is the wavelet space. Parameters 'a' and 'b' are the scaling factor and the shifting factor respectively. The only limitation on choosing a prototype function as the mother wavelet is that it must satisfy the admissibility condition (Eq. 3):

$$C_{\psi} = \int_{-\infty}^{\infty} \frac{|\Psi(\omega)|^{2}}{|\omega|}\,d\omega < \infty \qquad (3)$$

where, Ψ(ω) is the Fourier transform of ψ_{a,b}(t).
The time-frequency representation is obtained by repeatedly filtering the signal with a pair of filters that cut the frequency domain in the middle. Specifically, the discrete wavelet transform decomposes the signal into Approximation Coefficients (AC) and Detailed Coefficients (DC). The approximation coefficients can subsequently be divided into new approximation and detailed coefficients. This process can be carried out iteratively, producing a set of approximation coefficients and detail coefficients at different levels of decomposition (Merzagora et al., 2006).
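The iterative splitting into approximation and detail coefficients can be illustrated with a small filter-bank cascade. The sketch below is not the decomposition used in the study: it swaps in the Haar wavelet (the shortest possible filter pair) instead of "db8", "sym8" or "coif5" purely to keep the example short, and the names `haar_step` and `wavedec`, the toy signal and the number of levels are assumptions. An even-length input is assumed at every level.

```python
import math


def haar_step(x):
    """One level of the Haar DWT: returns (approximation, detail) coefficients."""
    s = 1.0 / math.sqrt(2.0)
    starts = range(0, len(x) - 1, 2)
    approx = [(x[i] + x[i + 1]) * s for i in starts]  # low-pass + downsample
    detail = [(x[i] - x[i + 1]) * s for i in starts]  # high-pass + downsample
    return approx, detail


def wavedec(x, levels):
    """Multi-level decomposition; returns [A_L, D_L, ..., D_1]."""
    details = []
    approx = list(x)
    for _ in range(levels):
        # Only the approximation is split again at the next level.
        approx, d = haar_step(approx)
        details.append(d)
    return [approx] + details[::-1]


# Toy signal (assumed values), decomposed to two levels.
signal = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
coeffs = wavedec(signal, 2)
for name, c in zip(["A2", "D2", "D1"], coeffs):
    print(name, [round(v, 3) for v in c])
```

Each level halves the frequency range covered by the approximation, so the detail coefficients at successive levels cover successively lower frequency bands. Because the Haar transform is orthonormal, the total energy of the coefficients equals the energy of the input signal.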
In this study, three different wavelet functions, "db8", "sym8" and "coif5", are used for decomposing the EEG signals into five different frequency bands (delta, theta, alpha, beta and gamma). These wavelet functions are chosen due to their near optimal time-frequency localization properties. Moreover, the waveforms of these wavelets are similar to the waveforms to be detected in the EEG signal. Therefore, the extraction of EEG signal features is more likely to be successful (Charles and Zlatko, 1997). In Table 1 In order to analyze the characteristic nature of different EEG patterns, we proposed an energy based feature called Recoursing Energy Efficiency (REE) in our earlier study (Hongo et al., 2000). In that study, we used Fuzzy C Means (FCM) and Fuzzy K-Means (FKM) for clustering the human emotions. The equations for deriving REE for the five frequency bands are given in Eq. 4-9. In the present study, we used the same feature and two modified forms of REE, namely Logarithmic REE (LREE) and Absolute Logarithmic REE (ALREE), for classifying emotions using two linear classifiers. The equations for calculating LREE and ALREE for the theta band are given in Eq. 10 and 11; the equations for the remaining frequency bands can be derived similarly. These features are derived from the five frequency bands of the EEG.
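A rough sketch of how such energy features might be computed is given here. The equations themselves are not reproduced in this text, so the definitions used in the code are assumptions: band energy is taken as the sum of squared wavelet coefficients, REE as the ratio of a band's energy to the total energy over the five bands, LREE as the base-10 logarithm of that ratio and ALREE as its absolute value. The function names and the toy coefficients are hypothetical.

```python
import math


def band_energy(coeffs):
    """Energy of one band: sum of squared wavelet coefficients (assumed definition)."""
    return sum(c * c for c in coeffs)


def energy_features(bands):
    """bands: dict mapping band name -> list of wavelet coefficients.

    Returns REE, LREE and ALREE per band under the assumed definitions:
    REE = E_band / E_total, LREE = log10(REE), ALREE = |LREE|.
    Every band is assumed to have non-zero energy.
    """
    energies = {name: band_energy(c) for name, c in bands.items()}
    total = sum(energies.values())
    features = {}
    for name, e in energies.items():
        ree = e / total
        lree = math.log10(ree)
        features[name] = {"REE": ree, "LREE": lree, "ALREE": abs(lree)}
    return features


# Toy coefficients (assumed values) for the five bands.
bands = {
    "delta": [1.0, 3.0],
    "theta": [2.0, 4.0],
    "alpha": [1.0, 2.0, 5.0],
    "beta": [5.0, 2.0, 1.0],
    "gamma": [3.0, 1.0],
}
feats = energy_features(bands)
for name, f in feats.items():
    print(name, {k: round(v, 4) for k, v in f.items()})
```

Under these assumptions the REE values of the five bands sum to one, and since each REE lies between 0 and 1 the LREE values are non-positive, so ALREE is simply the LREE with its sign removed.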
Classification: In this study, we have employed two simple classifiers, Linear Discriminant Analysis (LDA) and K Nearest Neighbor (KNN), for classifying the discrete emotions. Classification accuracy, representing the percentage of correctly classified instances, has been adopted to quantify the performance of KNN and LDA.
K nearest neighbor: KNN is a simple and intuitive classification method used by many researchers, typically for classifying signals and images. This classifier makes a decision by comparing a new unlabeled sample (testing data) with the baseline data (training data). In general, for a given unlabeled time series X, the KNN rule finds the K "closest" (neighborhood) labeled time series in the training data set and assigns X to the class that appears most frequently in the neighborhood of K time series. There are two main schemes or decision rules in the KNN algorithm: the similarity voting scheme and the majority voting scheme (Wanpracha et al., 2007).
In our study, we used majority voting for classifying the unlabeled data. This means that a class (category) gets one vote for each instance of that class in the set of K neighborhood samples. The new data sample is then assigned to the class with the highest number of votes. Majority voting is more commonly used because it is less sensitive to outliers. However, in KNN, we need to specify the number "K" of closest neighbors used for emotion classification. In this experiment, we tried different "K" values ranging from 1-6.
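The majority voting rule can be sketched as follows. This is a minimal illustration rather than the implementation used in the study: Euclidean distance is an assumption (the distance measure is not stated above), and the function name `knn_predict` and the two-dimensional toy feature vectors are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k):
    """Classify `query` by majority vote among its k nearest training samples."""
    # Sort training samples by Euclidean distance to the query (assumed metric).
    dists = sorted(
        (math.dist(x, query), label) for x, label in zip(train_x, train_y)
    )
    # Each of the k nearest samples casts one vote for its own class.
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


# Toy feature vectors (assumed values) with their emotion labels.
train_x = [(0.1, 0.2), (0.2, 0.1), (0.9, 0.8), (0.8, 0.9), (0.85, 0.75)]
train_y = ["happy", "happy", "fear", "fear", "fear"]

print(knn_predict(train_x, train_y, (0.15, 0.15), k=3))  # happy
print(knn_predict(train_x, train_y, (0.8, 0.8), k=3))    # fear

# Trying several values of K, as done in the experiment (K = 1..6 here
# is capped at 5 by the size of the toy training set).
for k in range(1, 6):
    print(k, knn_predict(train_x, train_y, (0.15, 0.15), k))
```

In this sketch, when two classes receive the same number of votes, the class whose vote was counted first (the one with the nearer neighbor) wins; other tie-breaking rules are equally possible.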
Linear Discriminant Analysis (LDA): Of these two classifiers, LDA provides extremely fast evaluation of unknown inputs, performed by calculating the distances between a new sample and the mean of the training data samples in each class, weighted by their covariance matrices. LDA is a very simple but elegant approach to classifying various emotions. Linear discriminant analysis tries to find optimal hyperplanes to separate the five classes (here, disgust, happy, surprise, fear and neutral emotions). Besides the training and testing samples, LDA does not require any external parameters for classifying the discrete emotions.
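For reference, the decision rule of LDA can be written compactly. The form below is the standard textbook version with a single covariance matrix pooled over all classes; whether the study used exactly this variant is not stated, so the formula is given as an assumption. Here x is a feature vector, μ_k and π_k are the mean and prior probability of class k estimated from the training data, and Σ is the pooled covariance matrix.

```latex
% Linear discriminant score of class k (pooled covariance, textbook form)
\delta_k(x) = x^{\top}\Sigma^{-1}\mu_k
              - \tfrac{1}{2}\,\mu_k^{\top}\Sigma^{-1}\mu_k
              + \ln \pi_k

% Predicted emotion: the class with the largest score
\hat{y}(x) = \arg\max_{k \in \{1,\dots,5\}} \delta_k(x)
```

Because each score is linear in x, the boundary between any two classes is a hyperplane, and all quantities are estimated from the training samples, which is why no external parameter such as K has to be chosen.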

RESULTS AND DISCUSSION
Across all twenty subjects, we sampled and preprocessed a total of 460 EEG epochs from the five discrete emotions. The number of data points in each epoch depends on the duration of the video clips, which varies from one clip to another in our experiment. Most of the stimuli for inducing the "Neutral", "Fear" and "Disgust" emotions were collected from the Stanford University database. The next stage is to train the KNN classifier with the best value of K, while the LDA classifier works directly for classifying the emotions. The classification ability of a statistical feature set is measured through the classification accuracy, averaged five times over a 5 fold cross-validation. The basic stages of 5 fold cross-validation are: (a) the total number of samples is divided into 5 disjoint sets (b) 4 sets are used for training and 1 set is used for testing (c) stage (b) is repeated five times, and each time the data set is permuted differently. One of the limitations of this area of research is the lack of an international standard database. Therefore, researchers report their results on their own data sets. The accuracy of classification depends on the number of electrodes used, the stimuli used for inducing discrete emotions, the method of feature extraction, the number of subjects (male/female) who participated, the number of emotions and the statistical features used for classifying emotions.
In order to develop a reliable and efficient emotion recognition system with a smaller number of electrodes, we compared the classification accuracy of the original set of channels with a reduced set of channels used by another researcher (Daabaj, 2002). This reduced set of channels was obtained on the basis of localizing the frequency range of emotion over different areas of the brain through cognitive analysis. From Table 2 and 3, we found that KNN gives a higher average classification accuracy than LDA on the two different channel sets. The maximum classification accuracies of 79.65 and 74.52% are obtained using the LREE feature on 62 channels and 24 channels respectively. On both channel combinations, LREE performs better than the other features (REE and ALREE). Table 4 and 5 show the classification rate of subsets of emotions for the two different sets of channels using KNN and LDA respectively. From Table 4, we found that the 62 channel EEG gives the maximum individual classification rate on four emotions (happy, surprise, fear and neutral). Similarly, the 24 channel EEG gives the maximum classification accuracy on the disgust and neutral emotions. All the programming was done in the MATLAB environment on a desktop computer with an AMD Athlon dual core 2 GHz processor and 2 GB of random access memory.
Table 2: KNN based classification of emotions using two different channel combinations
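The repeated 5 fold cross-validation procedure can be sketched as below. This is only an illustration under stated assumptions: the helper names `k_fold_indices` and `cross_validate`, the shuffling by seed, the stand-in 1-nearest-neighbor classifier and the one-dimensional toy data are all hypothetical and are not taken from the study.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Split sample indices into k disjoint folds; return (train, test) pairs."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)      # a different seed permutes differently
    folds = [idx[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        splits.append((train, test))
    return splits


def cross_validate(classify, xs, ys, k=5, repeats=5):
    """Mean accuracy over `repeats` runs of k fold cross-validation."""
    accuracies = []
    for r in range(repeats):
        for train, test in k_fold_indices(len(xs), k, seed=r):
            tx = [xs[i] for i in train]
            ty = [ys[i] for i in train]
            correct = sum(classify(tx, ty, xs[i]) == ys[i] for i in test)
            accuracies.append(correct / len(test))
    return sum(accuracies) / len(accuracies)


def nearest_neighbour(train_x, train_y, query):
    """Stand-in classifier: label of the single closest training sample."""
    i = min(range(len(train_x)), key=lambda j: abs(train_x[j] - query))
    return train_y[i]


# Toy data (assumed values): two well separated one-dimensional classes.
xs = [float(v) for v in range(10)] + [float(v) for v in range(100, 110)]
ys = ["a"] * 10 + ["b"] * 10
mean_acc = cross_validate(nearest_neighbour, xs, ys, k=5, repeats=5)
print(mean_acc)

# With 460 epochs, each fold holds 92 test samples and 368 training samples.
splits_460 = k_fold_indices(460, k=5)
print([len(test) for _, test in splits_460])
```

Each sample is used for testing exactly once per repetition, so the averaged accuracy uses all of the data while keeping training and testing samples disjoint within every fold.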

CONCLUSION
In this study we presented an approach to discrete emotion recognition based on the processing of EEG signals. The preliminary results presented in this study address the classifiability of human emotions using the original and a reduced set of EEG channels. EEG signals were acquired in five different emotional states and two pattern classification methods were adopted. The newly proposed feature, Logarithmic Recoursing Energy Efficiency (LREE), performs better than the other features. Therefore the extracted features successfully capture the emotional changes of the subject through their EEG signals regardless of the user's cultural background, race and age. In addition, the results show a significant relationship between EEG signals and the emotional states experienced by the subjects during their interaction with audio-visual content. Most researchers have used multiple physiological signals for developing emotion recognition systems. In this study, we have concentrated on developing a unimodal system using EEG signals for assessing human emotions. The results presented in this study indicate that the statistical features extracted from time-frequency analysis (wavelet transform) work well in the context of discrete emotion classification. This study is ongoing and will involve different classification algorithms in order to track the emotional status of brain activation in an audio-visual stimulus environment.