Surface Electromyography-Based Facial Expression Recognition in Bi-Polar Configuration

Problem statement: Facial expression recognition has improved considerably in recent years and has become a significant topic in diagnostic and medical fields, particularly in assistive technology and rehabilitation. Despite their usefulness, image- and video-based methods face practical problems such as peripheral conditions, lighting, contrast and the quality of videos and images. Approach: The Facial Action Coding System (FACS) and several other image- or video-based methods have been applied previously. This study proposed two methods for recognizing 8 different facial expressions, namely natural (rest), happiness in three conditions, anger, rage, gesturing 'a' as in the word 'apple' and gesturing 'no' by pulling up the eyebrows, based on three-channel surface EMG (SEMG) in bi-polar configuration. Raw signals were processed sequentially in three main steps (filtration, feature extraction and active feature selection). The processed data were fed into Support Vector Machine (SVM) and Fuzzy C-Means (FCM) classifiers to be classified into 8 facial expression groups. Results: Recognition ratios of 91.8% and 80.4% were achieved for FCM and SVM, respectively. Conclusion: The results confirmed sufficient accuracy and power in this field of study, and FCM showed better ability and performance than SVM. It is expected that, in the near future, new approaches based on the frequency bandwidth of each facial gesture will provide better results.


INTRODUCTION
Facial expressions result from motions or positions of the facial muscles when they are stimulated by an internal or external stimulus. They can be a visible manifestation of a person's state, reflecting cognitive activity, personality and psychopathology (Donato et al., 1999). The face is the most visible and expressive of all communication channels for emotions and gestures, and it plays a communicative role in interpersonal relations. It is also one of the richest sources of information in the human body, since EMG, EOG and EEG signals can be extracted from it. Beyond this, facial expressions can play an important role wherever humans interact with machines: automatic recognition of facial expressions and gestures may form a major part of natural human-machine interfaces (Wei and Hu, 2010). Researchers have introduced various methods in this field. The first and foremost of them, the Facial Action Coding System (FACS), was introduced in 1978 by P. Ekman. A new version of FACS, with substantial revisions, was published in 2002 (Ekman et al., 2002); it is a common standard for systematically categorizing the physical expression of emotions. Later, systems based on image and video processing were introduced and applied (Cohen et al., 2003; Abboud and Davoine, 2004; Ryan et al., 2010; Liejun et al., 2009). These systems are popular and easy to use, with very low costs and acceptable recognition rates (Buenaposada et al., 2008). However, images and videos are very sensitive: they depend directly on quality, contrast, resolution, low light levels and the user's movements. Moreover, image processing is computationally heavy and therefore time-consuming, which is a drawback in real-time control systems where speed is vital.
Facial expression recognition using facial EMG signals has been considered recently (Wei and Hu, 2010; Huang et al., 2004; Firoozabadi et al., 2008; Hamedi et al., 2011). The EMG signal is a biomedical signal that measures the electrical currents generated in muscles during contraction and thus represents neuromuscular activity. It is therefore a complicated signal, controlled by the nervous system and dependent on the anatomical and physiological properties of the muscles (Kale and Dudul, 2009). EMG has many applications; for instance, it is used clinically for the diagnosis of neurological and neuromuscular problems. Laboratory research applications include biomechanics, motor control, neuromuscular physiology, movement disorders, postural control and physical therapy (Kale and Dudul, 2009).
Bi-polar SEMG recording through bio-sensors is an advanced method for facial gesture/expression recognition (Huang et al., 2004; Firoozabadi et al., 2008). Bio-sensors have a number of advantages over other expression/gesture recognition methods, since they can be made robust against a number of environmental conditions that other methods have difficulty overcoming (Haag et al., 2004).
On the other hand, SEMG processing has always been considered a significant issue in control system design, especially for real-time systems. Feature extraction and classification are the most important parts of the process, and most of the challenges are concentrated there. Feature extraction techniques for bio-signals are divided into Time Domain (e.g., MAV, NMAV, MMAV, MAVSLP, SSI, VAR, RMS, WL, ZC, SSC and WAMP), Frequency Domain (e.g., AR, MDF, FR, AC, MMF, MMNF and MNF), Time-Frequency Domain (e.g., STFT, WT and BTFD) and Evaluation (PE) features, and they have many applications in the SEMG field (Firoozabadi et al., 2008; Rezazadeh et al., 2010; Tkach et al., 2010; Khokhar et al., 2010; Huang et al., 2005; Kiguchi et al., 2003; Fukuda et al., 2003; Vuskovic and Du, 2002; Liejun et al., 2009; Hamedi et al., 2011). Root Mean Square (RMS) and Bilinear Time-Frequency Distribution (BTFD) features are chosen in this study. RMS is used to feed the classifiers because it is computationally simple and is based on signal amplitude. In addition, it contains enough information to represent the EMG signal and is simple enough for fast training and running of the classifier. On the other hand, EMG signals have highly complex time-frequency characteristics that cannot be adequately analyzed with classical methods such as the Fourier transform. Nevertheless, frequency-domain analysis is mostly used to study muscle fatigue and to infer changes in motor unit recruitment. The class of bilinear (or quadratic) time-frequency distributions is used for time-frequency signal analysis and suits situations where the frequency composition of the EMG may change over time. BTFD features are therefore elicited to investigate the frequency bandwidth of each gesture. Classification methods are likewise categorized into different techniques based on their abilities.
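To make the time-domain family concrete, the following Python sketch computes five of the listed features (MAV, RMS, WL, ZC and SSC) for a single analysis window. It is an illustration only: the study itself feeds only RMS to the classifiers, the SSC rule shown is one common variant, and the dead-band `threshold` for ZC and SSC is an assumed parameter for which the text gives no value.

```python
import numpy as np


def time_domain_features(x: np.ndarray, threshold: float = 0.0) -> dict:
    """Common SEMG time-domain features for one window x (1-D array)."""
    x = np.asarray(x, dtype=float)
    dx = np.diff(x)
    mav = np.mean(np.abs(x))               # Mean Absolute Value
    rms = np.sqrt(np.mean(x ** 2))         # Root Mean Square
    wl = np.sum(np.abs(dx))                # Waveform Length
    # Zero Crossings: sign change between consecutive samples whose
    # difference also reaches the dead-band threshold (assumed parameter).
    zc = int(np.sum((x[:-1] * x[1:] < 0) & (np.abs(dx) >= threshold)))
    # Slope Sign Changes (one common variant): the middle sample is a local
    # extremum and at least one neighbouring slope reaches the threshold.
    left = x[1:-1] - x[:-2]
    right = x[1:-1] - x[2:]
    ssc = int(np.sum((left * right > 0)
                     & ((np.abs(left) >= threshold) | (np.abs(right) >= threshold))))
    return {"MAV": mav, "RMS": rms, "WL": wl, "ZC": zc, "SSC": ssc}
```

Calling `time_domain_features(window)` on each window of each channel yields one feature vector per window; a small positive `threshold` is often used so that low-level noise is not counted as crossings.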
Classifiers must be able to cope with such varying patterns while preventing overfitting, and a suitable classifier must be fast enough to meet real-time constraints. Many methods applied to EMG classification are based on pattern recognition, such as Neural Network, Fuzzy, Neuro-Fuzzy, Probabilistic and Online training methods. Many studies emphasize the success of neural networks in myoelectric classification: MLP has been used to classify time-domain features (Lamounier et al., 2003), LDA performed better with time-scale features (Chu et al., 2007) and Lamounier et al. (2002) applied RBF networks for their purpose. There have been many efforts using fuzzy approaches (Ajiboye and Weir, 2005), and the Evidence Accumulation (EA) method was applied by Gazzoni et al. (2004). Neuro-Fuzzy approaches have also been used for EMG classification, especially in machine control (Kiguchi et al., 2003; Ahsan et al., 2010). Then, FCNN, simplified fuzzy ARTMAP and FMMNN were introduced by  and Han et al. (2004) respectively. GMM, N-GMM, HMM and LLGMN are probabilistic classifiers that have been presented and applied in myoelectric classification (Huang et al., 2005; Fukuda et al., 2003). Finally, much experience is available with online training, in which a classifier is trained continuously on new patterns during operation, which keeps the accuracy rate stable (Kato et al., 2006).
Most EMG classification methods applied in this field suffer from a large number of electrodes, sensitivity to electrode displacement, low recognition rates and perceivable delay in real-time control. In this study, Fuzzy C-Means and Support Vector Machine classifiers are considered owing to their capability and simplicity, and because they have been widely used in signal processing and classification. Fuzzy C-Means (FCM) is a clustering method that allows one data point to belong to two or more clusters. The method was first developed by Dunn in 1973 and improved by Bezdek in 1981. It has since been used by many researchers for data classification and clustering (Abd Elaal et al., 2010). It has proved useful in biomedical signal processing, for example in EMG applications (Ajiboye and Weir, 2005), and specifically in facial EMG classification (Rezazadeh et al., 2010; Hamedi et al., 2011). One of the most useful properties of fuzzy logic systems is that contradictions in the data can be tolerated. Moreover, trainable fuzzy systems make it possible to discover patterns in data that are not easily detected by other methods, as can also be done with neural networks. Fuzzy clustering has been shown to be advantageous over crisp clustering in that a vector need not be totally committed to a given class at each iteration. The technique is more flexible because it extends zero-one membership to membership in the interval [0, 1]. Another advantage is that the resulting problem may be easier to solve computationally: a non-fuzzy model often requires an exhaustive search in a huge space (because some key variables can only take the values 0 and 1), whereas in a fuzzy model all variables are continuous, so derivatives can be computed to find the right search direction (Alata et al., 2008).
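The membership idea above can be illustrated with a minimal NumPy sketch of the standard FCM iteration: initialise a fuzzy partition, compute membership-weighted centres, update memberships from relative distances and stop once the membership matrix changes by less than a tolerance. The fuzzifier `m = 2`, the tolerance, the iteration cap and the random initialisation are illustrative assumptions, not settings reported in this study.

```python
import numpy as np


def fuzzy_c_means(X, c, m=2.0, eps=1e-5, max_iter=300, seed=0):
    """Standard FCM. X: (n, p) data. Returns centres V (c, p) and memberships U (c, n)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    U = rng.random((c, n))
    U /= U.sum(axis=0, keepdims=True)                 # each column sums to 1
    V = None
    for _ in range(max_iter):
        W = U ** m
        V = (W @ X) / W.sum(axis=1, keepdims=True)    # membership-weighted centres
        # Distances ||x_k - v_i|| with shape (c, n); a floor avoids division by zero.
        D = np.linalg.norm(X[None, :, :] - V[:, None, :], axis=2)
        D = np.fmax(D, 1e-12)
        # u_ik = 1 / sum_j (d_ik / d_jk)^(2 / (m - 1))
        ratio = (D[:, None, :] / D[None, :, :]) ** (2.0 / (m - 1.0))
        U_new = 1.0 / ratio.sum(axis=1)
        converged = np.linalg.norm(U_new - U) < eps   # stopping rule on U
        U = U_new
        if converged:
            break
    return V, U
```

Each column of `U` sums to 1, so a point lying between two clusters receives intermediate memberships instead of a forced 0/1 assignment; a crisp label, when needed, is `U.argmax(axis=0)`.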
Support Vector Machines (SVMs), in turn, are a set of related supervised learning methods that analyze data and recognize patterns, and they are used for statistical classification and regression analysis. The method was first developed by Vapnik and Chervonenkis in 1965 and then extended for classification and regression in 1992. It has recently been used in many pattern recognition applications, including myoelectric signal classification. SVM-based facial expression recognition has recently been applied to image sequences and video (Wei and Hu, 2010) as well as to EMG biosignals (Firoozabadi et al., 2008). An SVM constructs an optimal separating hyperplane in a high-dimensional feature space into which the training data are mapped by a non-linear kernel function; the non-linear kernel significantly increases its learning and generalization power (Firoozabadi et al., 2008). Since SVM is a binary classification method, decomposition schemes such as one-against-all or one-against-one must be used for multiclass classification. Literature reviews report that this technique often gives better results than ANNs and other statistical models.
In the rest of this study, RMS features are extracted and eight facial gestures are classified by FCM and SVM in order to compare their recognition ratios.

General block diagram:
The general flow chart of the proposed method is depicted in Fig. 1. Subject preparation and electrode placement are the first two steps of this project, followed by system setup and data acquisition and then the signal recording protocol. The EMG signals recorded by the surface electrodes are amplified and filtered prior to processing, and noise and artifacts are removed from the raw signal using a 48-52 Hz notch filter. For processing, the acquired signal is then windowed so that the active parts of the signal can be distinguished. RMS features are extracted from each window (256 msec) to feed the classifiers, and BTFD features are elicited to investigate the frequency bandwidth. Prior to classification, all RMS features are compared with a threshold line to collect the active features. Finally, the active RMS features are classified and clustered into eight separate facial gesture groups by the FCM and SVM classifiers.

Table 1 (excerpt): Facial gestures and their effective recording channels
Rage, clenching the molar teeth (R): channels 2, 3
No. 7, gesturing 'No' by pulling up the eyebrows (N): channel 1
No. 8, opening the mouth as in saying 'a' in 'apple' (O): channels 2, 3

Subject preparation and electrode placement:
Proper skin preparation is important for obtaining a good signal and avoiding artifacts. Before electrode placement, the areas selected for signal recording must be cleaned of any dust, sweat or fat layers to reduce the effect of motion artifacts. To acquire a clearer signal, conductive electrode paste or cream is applied to the center of the electrodes (grey area only) before they are attached to the skin. Electrode placement is another significant issue in SEMG recording: to obtain the best signals, the electrodes must be placed correctly, over the muscles involved in the gestures or expressions. This is because the nearer the electrodes are to the muscle fibers, the higher the recorded voltage amplitude produced by that muscle. The muscles involved in the considered facial gestures are Frontalis, Pars Medialis (Inner Brow Raiser), Frontalis, Pars Lateralis (Outer Brow Raiser), Corrugator supercilii and Depressor supercilii (Brow Lowerer), Levator labii superioris (Upper Lip Raiser), Zygomaticus minor (Nasolabial Deepener) and Zygomaticus major (Lip Corner Puller) (Buenaposada et al., 2008; Fig. 2).
Another important factor in SEMG recording is the configuration of the electrodes on the muscles. SEMG recording is divided into mono-polar and bi-polar configurations. In the bi-polar method, which is used in this project, two electrodes are placed over the belly of the muscle within 2 cm of each other and a third electrode is placed farther away as a reference. The signal between the two electrodes is then amplified differentially with respect to the reference electrode. The advantage of this configuration is that noise common to both electrodes is eliminated, so the EMG signal is clearer.
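The benefit of the bi-polar (differential) arrangement can be shown numerically: a disturbance that reaches both detecting electrodes equally cancels in their difference, while local muscle activity, which differs between the two sites, survives. The signal model below (a 50 Hz common-mode hum plus two different noise-like EMG components) is a hypothetical illustration, and an ideal amplifier with perfect common-mode rejection is assumed.

```python
import numpy as np

fs = 960                                  # sampling rate used in this study (Hz)
t = np.arange(fs) / fs                    # one second of samples
rng = np.random.default_rng(0)

hum = 0.5 * np.sin(2 * np.pi * 50 * t)    # common-mode interference (hypothetical)
emg_a = 0.05 * rng.standard_normal(fs)    # activity seen at electrode A (hypothetical)
emg_b = 0.05 * rng.standard_normal(fs)    # activity seen at electrode B (hypothetical)

# Each site potential is measured relative to the distant reference electrode.
electrode_a = emg_a + hum
electrode_b = emg_b + hum

# Ideal differential amplifier: the shared hum cancels, the local difference remains.
bipolar = electrode_a - electrode_b


def rms(x):
    return float(np.sqrt(np.mean(x ** 2)))


print(f"single-site RMS: {rms(electrode_a):.3f}, bipolar RMS: {rms(bipolar):.3f}")
```

Real amplifiers have a finite common-mode rejection ratio, so in practice the hum is strongly attenuated rather than removed exactly.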
Three recording channels are considered in this study. Channel 1 is located on the Frontalis muscle, above the eyebrows, with a 2 cm inter-electrode distance; channels 2 and 3 are placed on the face over the right and left Temporalis muscles, and one ground electrode is placed on the bony part of the left wrist (Rezazadeh et al., 2010).
System setup, data acquisition and signal recording protocol: SEMG signals are recorded with the BioRadio 150 (Clevemed). The sampling frequency is set to 960 Hz. The low cut-off frequency of the filter is set to 0.1 Hz to avoid motion artifacts (an artifact is unwanted information contained within a signal). The most common artifact is power-line interference (50/60 Hz noise), which comes from the power line and is transmitted by electrical devices (such as the computer) placed near the EMG data acquisition device. The BioRadio 150 addresses this problem by applying a notch filter (48-52 Hz) that removes the 50 Hz component of the signal (the choice of 50 or 60 Hz depends on the power transmission frequency used in the region). Electronic devices also generate their own frequencies, which are not removed by the notch filter, so additional precautions must be taken, such as keeping the device 1 m away from any electronic equipment and 3 m away from any radio transmitting devices.
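A software equivalent of the 48-52 Hz notch can be sketched with SciPy's `iirnotch` design: a second-order notch centred at 50 Hz whose quality factor Q = f0 / bandwidth = 50 / 4 = 12.5 gives a 4 Hz wide stop band, matching the 48-52 Hz figure quoted above. This is an assumed digital approximation for offline re-filtering; the BioRadio 150's internal filter implementation is not described in this text.

```python
import numpy as np
from scipy.signal import filtfilt, iirnotch

FS = 960.0        # sampling frequency used in this study (Hz)
F0 = 50.0         # mains frequency (use 60.0 where applicable)
BANDWIDTH = 4.0   # 48-52 Hz stop band, so Q = F0 / BANDWIDTH = 12.5


def remove_mains(x, fs=FS, f0=F0, bandwidth=BANDWIDTH):
    """Zero-phase notch filtering of a 1-D signal (assumed design)."""
    b, a = iirnotch(f0, f0 / bandwidth, fs=fs)
    return filtfilt(b, a, x)


t = np.arange(int(4 * FS)) / FS
clean = np.sin(2 * np.pi * 120 * t)                # in-band test tone
noisy = clean + 0.8 * np.sin(2 * np.pi * 50 * t)   # plus mains hum
filtered = remove_mains(noisy)
```

A zero-phase `filtfilt` keeps the EMG aligned in time for offline analysis; a causal filter would be needed in a true real-time setting.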
In this study, SEMG signals are recorded from ten healthy volunteers (six male and four female) aged 21-30 years. Before recording, all volunteers rested for 1 minute. They were then asked to perform the facial gestures listed in Table 1 for 30 seconds (five trials, each lasting 4 seconds, with 2 second rests between them to eliminate the effect of fatigue). Table 1 also lists the effective recording channels for each gesture.
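As a worked example of the timing, the sketch below converts the recording protocol into sample indices at the 960 Hz sampling rate. It assumes that each 30 s recording consists of five 6 s cycles (4 s of gesture followed by 2 s of rest), which is one reading consistent with the 30 s total; the exact placement of the rests is not specified in the text, and the helper name is hypothetical.

```python
FS = 960                  # sampling frequency (Hz)
N_TRIALS = 5
TRIAL_S, REST_S = 4, 2    # seconds of gesture and of rest per cycle


def trial_segments(fs=FS, n_trials=N_TRIALS, trial_s=TRIAL_S, rest_s=REST_S):
    """Return (start, stop) sample indices of each gesture trial.

    Assumes the recording starts with a trial and every trial is followed by a rest.
    """
    period = (trial_s + rest_s) * fs
    return [(k * period, k * period + trial_s * fs) for k in range(n_trials)]


segments = trial_segments()
total_samples = N_TRIALS * (TRIAL_S + REST_S) * FS   # 30 s of data at 960 Hz
```

Under this assumption each trial spans 3840 samples and a full recording holds 28800 samples.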

EMG signal processing:
Filtering: The data acquired from the three channels were passed through a 30-450 Hz band-pass filter to retain the most essential part of the SEMG spectrum. The signals were then segmented into 200 msec windows in preparation for feature extraction.
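A possible offline implementation of this step is sketched below with SciPy. The 30-450 Hz pass band and the 200 msec non-overlapping windows come from the text; the Butterworth type, the order of 4 and the zero-phase `sosfiltfilt` are assumptions, since the study does not state how the band-pass filter was designed. At 960 Hz, a 200 msec window holds 192 samples, and the 450 Hz upper edge sits just below the 480 Hz Nyquist frequency.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

FS = 960        # sampling frequency (Hz)
WIN_MS = 200    # analysis window length used in this section


def bandpass(x, low=30.0, high=450.0, fs=FS, order=4):
    """Zero-phase Butterworth band-pass along the last axis (type and order assumed)."""
    sos = butter(order, [low, high], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, x, axis=-1)


def windows(x, fs=FS, win_ms=WIN_MS):
    """Split the last axis into non-overlapping windows, dropping the incomplete tail."""
    n = int(round(fs * win_ms / 1000))    # 200 ms at 960 Hz -> 192 samples
    n_win = x.shape[-1] // n
    trimmed = x[..., : n_win * n]
    return trimmed.reshape(*x.shape[:-1], n_win, n)


raw = np.random.default_rng(0).standard_normal((3, 10 * FS))   # 3 channels, 10 s (synthetic)
segmented = windows(bandpass(raw))                              # shape (3, 50, 192)
```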
Feature extraction and active data selection: In this study, RMS and BTFD techniques are used. RMS features are selected to feed the classifiers, and BTFD features are elicited to investigate the bandwidth of the facial gestures. RMS is the most commonly used feature for identifying the strength of a muscle contraction; the RMS of SEMG is related to the number of active muscle fibers and their rate of activation. Theoretically, when the signal is modeled as a Gaussian random process, RMS provides the maximum likelihood estimate of amplitude during a constant-force, non-fatiguing contraction. RMS was therefore extracted from each 200 msec segment of the SEMG signal, and the values were concatenated and fed into the classifiers:

$$\mathrm{RMS} = \sqrt{\frac{1}{N}\sum_{n=1}^{N} x_n^{2}} \qquad (1)$$

where x_n is the signal acquired from each of the three channels and N is the length of x_n (the number of samples in the window). Next, the active RMS values must be selected from all features. In this step, three threshold lines (T1, T2, T3) are calculated from the RMS features of the three channels in the normal condition obtained in the preceding step (Eq. 2; Rezazadeh et al., 2010). To collect the active features, all RMS data are compared with the threshold lines and only the data lying above the lines are kept.
As an example, to determine the active features of one of the chosen gestures, the RMS values extracted from all channels are compared with their corresponding threshold lines: channel 1 RMS values (CH1) with T1, channel 2 RMS values (CH2) with T2 and channel 3 RMS values (CH3) with T3. After the comparison, a triad (CH1, CH2, CH3) is kept only if at least one of its values is higher than the corresponding threshold.
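The RMS extraction of Eq. (1) and the threshold-based selection can be sketched in NumPy as follows. Because Eq. (2) for the thresholds is not reproduced in this text, the `rest_thresholds` helper is hypothetical: it assumes each threshold is the mean plus `k` standard deviations of that channel's RMS in the normal (rest) condition. The selection rule itself (keep a triad if at least one channel exceeds its threshold) follows the description above.

```python
import numpy as np


def window_rms(windows):
    """RMS of each window, Eq. (1): sqrt(mean(x_n^2)) over the last axis."""
    return np.sqrt(np.mean(np.asarray(windows, dtype=float) ** 2, axis=-1))


def rest_thresholds(rest_rms, k=1.0):
    """Per-channel thresholds T1..T3 from normal-condition RMS values, shape (3, n).

    HYPOTHETICAL rule (mean + k * std); the study's own Eq. (2) is not reproduced here.
    """
    return rest_rms.mean(axis=1) + k * rest_rms.std(axis=1)


def select_active(rms, thresholds):
    """Keep the (CH1, CH2, CH3) triads in which at least one channel exceeds
    its threshold. rms: (3, n_windows); returns (n_active, 3) feature rows."""
    active = np.any(rms > thresholds[:, None], axis=0)
    return rms[:, active].T
```

The rows returned by `select_active` are the three-dimensional active feature vectors that are then shuffled and passed to FCM and SVM.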
Bilinear Time-Frequency Distribution (BTFD) is a technique that is particularly effective for analyzing non-stationary signals, whose frequency distribution and magnitude vary with time. The distribution is mathematically a generalized time-frequency representation based on bilinear transformations. Compared with other time-frequency techniques such as the Short-Time Fourier Transform (STFT), bilinear (quadratic) time-frequency distributions may not give higher clarity for most practical signals, but they provide an alternative framework for investigating new definitions and methods.
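The Wigner-Ville distribution (WVD) is the basic member of the bilinear (quadratic) class, and smoothed BTFDs are kernel-weighted versions of it. The NumPy sketch below computes a discrete pseudo-WVD (rectangular lag window) of the analytic signal; which specific BTFD kernel the study used is not stated, so the plain WVD here is an assumption chosen for illustration. Because the instantaneous autocorrelation z[n+m] z*[n-m] oscillates at twice the signal frequency, frequency bin k corresponds to k * fs / (2 * n_freq) Hz.

```python
import numpy as np


def analytic_signal(x):
    """Analytic signal via the FFT (same construction as scipy.signal.hilbert)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * h)


def wigner_ville(x, fs, n_freq=256):
    """Discrete pseudo Wigner-Ville distribution.

    Returns W with shape (n_samples, n_freq) and the frequency axis in Hz (0 to fs/2).
    """
    z = analytic_signal(x)
    n = z.size
    half = n_freq // 2
    W = np.empty((n, n_freq))
    for t in range(n):
        max_lag = min(t, n - 1 - t, half - 1)        # lags that stay inside the signal
        lags = np.arange(-max_lag, max_lag + 1)
        kernel = np.zeros(n_freq, dtype=complex)
        # Instantaneous autocorrelation, stored with negative lags wrapped to the end.
        kernel[lags % n_freq] = z[t + lags] * np.conj(z[t - lags])
        W[t] = np.real(np.fft.fft(kernel))           # kernel is Hermitian in lag, so FFT is real
    freqs = np.arange(n_freq) * fs / (2 * n_freq)
    return W, freqs
```

For a pure 120 Hz tone sampled at 960 Hz, each row of `W` away from the edges peaks at the 120 Hz bin; for multi-component EMG, the cross-terms characteristic of the WVD are what smoothed BTFD kernels are designed to suppress.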
Each facial gesture involves one or more muscles, each of which produces a different frequency bandwidth depending on the firing rate of its motor units. Table 2 therefore presents the frequency bandwidth range of each facial gesture considered in this study.

SEMG signal classification:
Here, the FCM and SVM classifiers are introduced and applied to classify the proposed facial gestures. Their algorithms are first described briefly, and the comparison of their results is then discussed.

Fuzzy C-means:
Step 1: For the active data set obtained in the previous step, $X = \{x_1, \dots, x_n\}$ with $x_i \in \mathbb{R}^p$, fix $c \in \{2, 3, \dots, n-1\}$ and $m \in (1, \infty)$, and initialize $U^{(0)} \in M_{fc}$, where $M_{fc}$ is the fuzzy c-partition space for X, m is a weighting constant and c is the number of clusters.
Step 2: Compute the c mean vectors (cluster centers) at each iteration:

$$v_i = \frac{\sum_{k=1}^{n} u_{ik}^{m}\, x_k}{\sum_{k=1}^{n} u_{ik}^{m}}, \qquad i = 1, \dots, c \qquad (3)$$

where $u_{ik}$ is the degree of membership of $x_k$ in cluster i, $x_k$ is the kth p-dimensional measured data point and $v_i$ is the p-dimensional center of cluster i.
Step 3: Update $U = [u_{ik}]$ for the next iteration (NI):

$$u_{ik}^{(NI)} = \frac{1}{\sum_{j=1}^{c} \left( \frac{\lVert x_k - v_i \rVert}{\lVert x_k - v_j \rVert} \right)^{2/(m-1)}} \qquad (4)$$

Step 4: If $\lVert U^{(NI)} - U^{(old)} \rVert < \varepsilon$, stop; otherwise, go to the next iteration and return to Step 2.

Support vector machine:
In this project, a multiclass SVM is used to classify the 8 facial gestures considered. This requires a classifier $f: \mathbb{R}^p \to \{1, \dots, k\}$, where k is the number of classes, that estimates the most suitable class for given data from the training set $(x_1, y_1), \dots, (x_n, y_n) \in \mathbb{R}^p \times \{1, \dots, k\}$. SVM was originally formulated as a binary linear classifier that separates two groups of data. As the number of data points and classes increases, as in this project, optimal non-linear classification with SVM is applied instead. It solves the classification problem by mapping the original data into a 'feature space', which is made feasible by kernel functions. The function $\phi(\cdot)$ that maps a training vector $x_i$ into a higher-dimensional space must belong to a dot-product space, so that

$$K(x_i, x_j) = \phi(x_i)^{T} \phi(x_j)$$

Many different kernels can be used depending on the data structure. In this project, the Radial Basis Function (RBF) kernel is used, with kernel parameter $\gamma > 0$:

$$K(x_i, x_j) = \exp\left(-\gamma \lVert x_i - x_j \rVert^{2}\right)$$

The data are thereby mapped into an (ideally) linearly separable space, where hyperplanes divide them into eight labeled classes. The separating hyperplane used is the unique one that yields the maximum margin of separation between the classes.
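The RBF kernel evaluation K(x_i, x_j) = exp(-γ‖x_i - x_j‖²) can be written in a few vectorised NumPy lines. With γ = 1/(number of attributes), as in the LIBSVM settings reported for this study, three RMS attributes give γ = 1/3. The feature values in the example are made up for illustration.

```python
import numpy as np


def rbf_kernel(A, B, gamma):
    """Gram matrix K[i, j] = exp(-gamma * ||A[i] - B[j]||^2)."""
    sq_dists = (np.sum(A ** 2, axis=1)[:, None]
                + np.sum(B ** 2, axis=1)[None, :]
                - 2.0 * A @ B.T)
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


X = np.array([[0.0, 0.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 2.0, 0.0]])                 # hypothetical 3-channel RMS triads
K = rbf_kernel(X, X, gamma=1.0 / X.shape[1])    # gamma = 1 / number of attributes
```

Every diagonal entry equals 1 and entries decay with squared distance, so γ sets how quickly two RMS triads stop being treated as similar.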
One-against-all and one-against-one are the two techniques introduced for multiclass SVM classification; the one-against-all method is applied here. It is simple to implement, relatively fast to run and transparent, and it often produces results that are as accurate as, or more accurate than, other methods. Training a one-against-all scheme requires training k binary SVMs, each separating one class from all the others. In addition, an estimate of the probability of the output of a pairwise classifier between classes i and j is denoted by r_ij.
In this study, LIBSVM (Chang and Lin, 2001) was used as the main SVM tool to construct the multiclass classifier using C-SVC. First, the SVM parameters were adjusted, and eightfold random cross-validation was used to evaluate them. As mentioned, the RBF kernel was used, with γ = 1/(number of attributes in the input data) and an SVM cost of C = 1.
Before training, the active features are shuffled so that training is not affected by the order in which the data were recorded. For both FCM and SVM, 70% of the data are fed to the classifiers for training and the remaining 30% are kept for testing.
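The classifier configuration described above can be approximated in Python with scikit-learn, whose `SVC` class wraps LIBSVM's C-SVC. Two points are assumptions to flag: LIBSVM's native multiclass strategy is one-against-one, so `OneVsRestClassifier` is used here to obtain the one-against-all scheme stated in this study, and `gamma="auto"` is scikit-learn's setting for γ = 1/(number of features). The synthetic blobs below merely stand in for the real active RMS triads, so the resulting scores say nothing about the reported recognition ratios.

```python
import numpy as np
from sklearn.model_selection import KFold, cross_val_score, train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_classes, per_class = 8, 60
# Synthetic stand-in for active RMS triads: one Gaussian blob per gesture.
centres = rng.uniform(0.0, 5.0, size=(n_classes, 3))
X = np.vstack([c + 0.2 * rng.standard_normal((per_class, 3)) for c in centres])
y = np.repeat(np.arange(n_classes), per_class)

# C-SVC with RBF kernel, gamma = 1 / n_features and C = 1, wrapped as one-against-all.
clf = OneVsRestClassifier(SVC(kernel="rbf", gamma="auto", C=1.0))

# Eightfold random (shuffled) cross-validation for checking the parameter choice.
cv = KFold(n_splits=8, shuffle=True, random_state=0)
cv_scores = cross_val_score(clf, X, y, cv=cv)

# Shuffled 70/30 train/test split, then recognition ratio on the held-out part.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, shuffle=True, random_state=0)
test_accuracy = clf.fit(X_tr, y_tr).score(X_te, y_te)
```

`score` returns the fraction of correctly classified test vectors, which corresponds to the recognition ratio reported in the Results.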

RESULTS
This study aimed to recognize eight facial expressions and gestures with a good recognition ratio. To this end, eight different facial gestures were recorded by three channels in bi-polar configuration and, after the processing steps, classified into well-distinguished groups by the FCM and SVM classifiers. The average recognition percentages over all data from all volunteers were 91.8% for FCM and 80.4% for SVM. Figure 3 illustrates the distribution of each facial gesture in the feature space, labeled by cluster. The figure confirms that FCM classified the data well: five of the data groups are clearly separated, while two groups (smiling with both sides and opening the mouth as in saying 'a' in 'apple') overlap because the muscles used to make these gestures are almost the same.

DISCUSSION
According to the results, the FCM classifier performs better than SVM, with a higher recognition ratio that can be attributed to its greater flexibility. Previous studies may have obtained better results in EMG-based facial expression recognition, but several factors directly affect such results. The most significant is the choice and number of facial gestures in different applications of facial gesture-based systems: as expected, as the number of facial gestures (classes) increases, the overlap between data groups grows and the recognition ratio therefore decreases. Moreover, Table 2 indicates that the frequency bandwidth ranges of the facial gestures are very close to each other, so the gestures cannot be separated using frequency-domain features. However, these bandwidths help to study muscle fatigue more precisely and to infer changes in motor unit recruitment. Overall, facial expression/gesture recognition based on SEMG achieves an accurate recognition rate. In addition, the most significant feature that distinguishes this method from others, such as image- or video-based methods, is that peripheral conditions do not affect the recognition percentage. Finally, Table 3 compares different studies and methods for automatic facial expression recognition based on EMG.

CONCLUSION
This study presents a method for recognizing eight facial expressions based on SEMG in bi-polar configuration. SEMG signals were acquired from the volunteers' faces, and two classification techniques were applied. Besides the time-domain features used by the classifiers, time-frequency features were extracted to determine the frequency bandwidth of each facial gesture. Such recognition systems are mostly applied as the interface of Human-Machine Interface (HMI) systems for controlling and designing artificial limbs or rehabilitation devices such as wheelchairs (Firoozabadi et al., 2008). In future work, we plan to design an optimized HMI that uses more facial gestures as input control commands, and to look for the best expressions for designing multipurpose systems. More diverse pre-processing, processing and feature extraction techniques will be employed to achieve better results, and a new approach based on frequency bandwidth will be used to determine a new band-pass filter range in this field of study.