Liveness Detection from Real User, Printed Pictures and Pictures on Mobile Devices from Low Resolution Webcam

Biometric data have emerged as one of the most widely used technologies for identity validation in various sectors. However, attackers use spoofed biometric data to gain access to their targets, and a number of approaches have been developed to detect such data. This article proposes a complete methodology for liveness detection using a low resolution camera, primarily because most existing studies rely on image quality, eyelid motion and facial expression to detect spoof images, and spoof attacks cannot be reliably detected in this way from low quality images or from video recorded on mobile devices. The paper therefore presents a technique to identify spoof attacks from printed pictures, as well as from videos recorded on mobile devices, using a built-in low resolution webcam. Movements in the eye region are detected and weighted over a number of frames selected from the recorded video, the standard deviation of these weighted movements is determined, and the standard deviation is compared with a threshold value estimated beforehand in this study. Owing to the nature of the data employed, the researchers generated data for real users with a low resolution built-in webcam and recorded the face images of users on a mobile device. In total, 100 videos were used to estimate the threshold value for liveness detection. The method analysed user liveness with an accuracy of 97.6%. Further experiments are required to evaluate the method on a larger data set.


Introduction
A biometric authentication system can be defined as a computer vision-based system that employs characteristics of the human body, for instance, face, fingerprint, iris, DNA and voice, and/or behavioural characteristics like passwords, signatures, etc., in order to identify a particular person and to activate authentication based on the results of the recognition process (Rute and Louro, 2014). In recent years, with the advancement of technology, digital biological data have become common in various fields that require critical security, for example, border control, airports and banking processes. Several other applications are associated with forensics, employee and/or student attendance, as well as internet user authentication. The application of biometric data has therefore become part and parcel of our lives. However, these biometric systems are exposed to various attacks that use fake biometric information.
A good biometric system provides accurate and effective authenticated access. The working diagram of a general biometric system is illustrated in Fig. 1.
As one of the many techniques of the biometric system, face recognition has been in use for almost half a century (Parmar and Mehta, 2013), with wide application in authentication and person identification (Sharma and Kaur, 2016). Nevertheless, the main challenge for face authentication and identification systems is the use of a false facial image, known as a 'spoofing attack', presented through digital images such as mobile images or printed pictures (Galbally et al., 2014). For this reason, numerous algorithms have been developed to detect spoof attacks. These algorithms are classified as follows.
Image quality-based methods; Shende et al. (2014), Shende and Sarode (2016), as well as Chingovska (2015), claimed that the image quality of a spoof image differs from that of the original due to the effect of reflection. Wen et al. (2015) employed an Image Distortion feature vector to extract specular reflection, blurriness, chromatic moment and colour diversity. The findings were then compared with the user image stored in the database to identify the user, and lastly, liveness was examined by tracking the eyelid motion to detect spoof images. This method is normally used for previously registered users.
Texture-based algorithms; this method assumes that a real user image possesses unique texture properties in comparison with printed images or images captured from a mobile screen. Maatta et al. (2012) adopted Local Binary Patterns (LBP) to extract micro-texture, which was used to enhance the image histogram for use in a learning algorithm that detects user liveness. Even though this method offered accurate results, it failed with low texture information (Garud and Agrawal, 2016).
Motion-based methods; several studies have employed the optical flow technique to track eyelid motion (Drutarovsky and Fogelton, 2014). Other approaches detect blinking from the variance of intensity between consecutive frames, based on a threshold (Divjak and Bischof, 2009), while eyelid tracking can also use facial landmarks (Perception and Technical, 2016). The drawback of these approaches is that they can be strongly affected by the face position captured by the camera, the image resolution and the blinking rate. For example, the blink tracking approach assumes that the human blinking rate is approximately 15 to 30 times/min, with an interval of around 2 to 3 sec between two consecutive blinks and a blink duration of almost 205 milliseconds. A standard camera can easily capture face video at more than 15 frames per second, so the interval between frames does not exceed 70 milliseconds. Consequently, the camera can capture two or more frames of a blink when the face looks into the camera (Garud and Agrawal, 2016). This method, nonetheless, requires tracking the eyelid position across all frames to identify the closed eyelid status (blinking), which makes it computationally intensive. Additionally, Polatsek (2015) asserted that computer users tend to reduce their blinking rate in front of a monitor, primarily because the tear film is inadequately spread over the cornea. In turn, this may weaken the detection of user liveness. Furthermore, some implementations embed additional techniques, for example, passwords and facial expressions, to strengthen security (Patel et al., 2016).
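The timing argument above can be checked with a few lines of arithmetic. The following is a minimal sketch using only the figures quoted in the text (15 frames per second and a blink duration of about 205 milliseconds); the function names are illustrative and do not come from the cited works.

```python
import math


def frame_interval_ms(fps):
    """Time between two consecutive frames, in milliseconds."""
    return 1000.0 / fps


def frames_during_blink(blink_ms, fps):
    """Number of whole frame intervals that fit inside one blink."""
    return math.floor(blink_ms / frame_interval_ms(fps))


# Figures quoted in the text: 15 frames per second, blink of about 205 ms.
interval = frame_interval_ms(15)
frames = frames_during_blink(205, 15)

print(f"frame interval: {interval:.1f} ms")
print(f"frames captured during one blink: {frames}")
```

At 15 frames per second the interval stays below 70 milliseconds, which agrees with the statement that at least two frames fall within a single blink.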

Problem Statement
Spoof attacks are a major problem for biometric systems in practice. Many techniques have therefore been developed to verify user liveness from a face image: image quality-based and texture-based techniques, special biometric acquisition equipment, motion tracking approaches that follow eye blinking, and even additional access information, for instance, a password, to distinguish a real user image from a false one. Unfortunately, these methods can be computationally time-consuming, can be costly because they require additional sensors and storage capacity, or can be strongly affected by the quality of the image (Garud and Agrawal, 2016). Besides, the literature indicates that the available iris-based liveness detection methods rely on pupil dynamics through the pupil's interaction with lighting (Czajka, 2015). Other approaches (Galbally et al., 2012) identify real users from iris images based on image quality, with the assumption that spoof images (printed or on screen) have lower resolution quality.
This article proposes a fast spoof attack detection technique that detects user liveness in the presence of both printed images and recorded user video, using a low resolution webcam. A simple mathematical method is applied to the boundaries of the features extracted in the region of interest (the eye region) from a number of consecutive frames in order to identify spoof images. An advantage of this method is that it offers accurate results with varied image quality (it is independent of image quality), and it is therefore successful in identifying a spoof user from recorded video. The proposed method assumes that a real computer user moves his/her iris randomly to read the content displayed on the monitor and/or to follow the mouse pointer (Rodden and Fu, 2006). It is also assumed that iris movement is quicker than eye blinking (Czajka, 2015), so the motion tracking algorithm needs only a minimum number of frames.

Methodology
The main objective of this study is to detect spoof face images from low resolution webcam images. Eye blinking is a commonly used cue for identifying user liveness. Although this approach has successfully protected systems from photographs, it fails with video recorded on a mobile phone or tablet. Moreover, its accuracy is influenced by image resolution. Hence, this study proposes automatic spoof attack detection from a low resolution webcam, covering videos recorded on mobile devices. The first stage is data generation, which produces a dataset of real users from a low resolution webcam with varied lighting (high and low) in different environments and against different backgrounds. In addition, a number of fixed images (photographs) were selected from free online stock photos. All images and videos included in this study have a frontal face view with clearly visible eyes, as illustrated in Fig. 2.
The second stage involves a sequence of steps proposed to achieve accurate detection of spoof user images, as portrayed in Fig. 3. Based on the problem statement and the characteristics of the data employed, a quantitative method was adopted to detect fake users, and the accuracy of the results was tested empirically.
This study is part of a master's degree research project that aims to develop a family protection system for internet users on personal computers with low resolution webcams, which activates internet access authentication based on an estimation of the user's age.
The images were acquired by using an ASUS built-in camera (UVC WebCam). A set of real time videos with 100 frames each was recorded, and 10 frames from each video were selected as input for the proposed method using a loop counter that is increased by ten to reduce computation time. Next, the Viola-Jones approach was employed for facial feature detection: the algorithm was first applied to detect the face region by selecting the face nearest to the camera (Viola and Jones, 2004; Gupta and Tiwari, 2015). The face area was then segmented in each frame and passed to the Viola-Jones algorithm again to detect the eye region. This process detected the eye region accurately, as in Fig. 4 (b), and its results were tested experimentally in all frames for all iterations in this study. The primary purpose of segmenting the eye region was to reduce the processing time so that the method approaches real time liveness detection, as shown in Fig. 4.
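The frame selection step described above can be sketched as follows. This is an illustrative sketch only: it assumes the loop counter starts at the first frame (index 0), which the text does not state, and the function names are hypothetical.

```python
def sample_frame_indices(total_frames=100, step=10):
    """Indices visited by a loop counter that is increased by `step`.

    Assumption: counting starts at index 0 (not stated in the text).
    """
    return list(range(0, total_frames, step))


def select_frames(frames, step=10):
    """Keep every `step`-th frame of a recorded video."""
    return [frames[i] for i in sample_frame_indices(len(frames), step)]


indices = sample_frame_indices()
print(indices)
```

With 100 recorded frames and a step of ten, ten frames are passed on to face and eye detection, which matches the numbers given in the text.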
The data employed consist of video data of real users, as well as spoof data in the form of printed pictures or videos recorded on mobile devices, with the following considerations: (i) the movement of a real user is caused by the natural human movement of the head and facial features, while (ii) the movement of a spoof image is caused by the hand movement of the person who holds the fake picture.
Background subtraction is a general technique used to determine a moving foreground object from a sequence of frames in video taken by a fixed camera (Singla, 2014; Philip, 2013). It requires predetermined foreground and background objects, and various approaches exist to identify both the foreground and the background objects in the images.
In this study, although the data were acquired with a built-in laptop webcam, the object under investigation itself moved, and the motion of a particular part of the moving object had to be identified. As the available background subtraction techniques failed to provide accurate results for this case, the ROI was segmented into arrays whose dimensions vary from frame to frame, as given in Table 1. The images in these arrays were therefore resized based on the biggest array, as displayed in the pseudo code presented in Fig. 5.
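The resizing step can be illustrated with a short sketch. It rests on two assumptions that the text does not confirm: the common size is taken as the largest height and the largest width over all frames, and nearest-neighbour interpolation is used. The ROI is represented as a plain list of pixel rows, and all names are hypothetical.

```python
def target_size(shapes):
    """Largest height and largest width over all (height, width) pairs.

    Assumption: 'the biggest array' is read as the per-dimension maximum.
    """
    return max(h for h, _ in shapes), max(w for _, w in shapes)


def resize_nearest(image, new_h, new_w):
    """Resize a 2D list of pixels with nearest-neighbour interpolation."""
    old_h, old_w = len(image), len(image[0])
    return [
        [image[r * old_h // new_h][c * old_w // new_w] for c in range(new_w)]
        for r in range(new_h)
    ]


# Three ROI sizes taken from the first column of Table 1 (frames F1, F2, F9).
shapes = [(36, 145), (34, 137), (38, 152)]
common = target_size(shapes)

# A tiny 2x2 example makes the interpolation easy to inspect.
small = [[1, 2], [3, 4]]
resized = resize_nearest(small, 4, 4)
print(common, resized)
```

Bringing every ROI to one common size is what makes the later pixel-to-pixel subtraction between consecutive frames well defined.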
Contrast-Limited Adaptive Histogram Equalization (CLAHE) and a (4×4) Gaussian filter were then applied to enhance the grayscale images (Zuiderveld, 1994). Later, Canny boundary detection was performed to extract all the features contained in the ROI, as illustrated in Fig. 6.
After that, image subtraction was performed: a pixel-to-pixel comparison was made between the boundary features in the ROI of each two consecutive frames, which results in values of 0, 1 and -1. Zero represents no change in a feature (no movement), while 1 and -1 indicate movement of a feature, as shown in Fig. 7. In order to estimate the threshold value that distinguishes the movements of real and fake users, 100 iterations were performed on two types of datasets: 50 videos of real (living) users and 50 videos of spoof users (printed pictures). Next, the weight of the movement (or change) for each array resulting from the subtraction process was calculated with Equation 1, where f is the number of frames taken from the webcam video, R_{i,j} takes the values 0, 1 or -1, and W_f is a vector array of the ROI weight for each frame. After that, the standard deviation (stdv) of the movement weights in each video was calculated from Equation 2, where w_k is the movement weight for frame k (∀ k = 1, 2, 3, …, n-1) and w̄ is the average of the values in vector W.
Table 1: Dimensions of the segmented ROI arrays in each frame
Frame    Video1  Video2  Video3  Video4  Video5
F1       36x145  52x210  41x164  41x163  39x156
F2       34x137  52x207  41x163  41x164  41x164
F3       34x136  51x205  40x161  41x163  40x162
F4       36x145  51x205  41x164  40x161  42x167
F5       37x147  51x205  41x165  41x164  42x169
F6       37x147  52x207  40x161  41x165  41x165
F7       37x147  51x204  40x159  40x161  40x159
F8       37x148  51x203  39x154  40x159  42x167
F9       38x152  51x204  40x161  39x154  41x162
F10      37x148  51x205  41x163  40x161  42x168
Win.dim  34x148  51x210  39x165  39x165  39x165
As a result, the average of the calculated standard deviations for the real user videos is STDV_real = 53.6, whereas that for the spoof videos is STDV_fake = 16.8, and the average of the interval between the real and the fake values is Av = 16.9, based on Equation 3.
Therefore, the threshold value is 33.7, as estimated from the average of the interval between the values (rmse = 17.2) retrieved from real and fake users based on formula (4), depicted in Fig. 8, where m is the total number of iterations (m = 50 iterations), x_i is the standard deviation value of video i, and i = 1 to m.
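The subtraction, weighting and thresholding steps can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the movement weight of a frame pair is taken to be the number of changed pixels (the sum of |R_{i,j}|), and the sample standard deviation is used, since the exact forms of Equations 1 and 2 are not available here. Edge maps are plain lists of 0/1 rows, and all function names are hypothetical. Only the threshold value 33.7 comes from the text.

```python
import statistics

THRESHOLD = 33.7  # threshold value reported in the text


def subtract_edges(prev_edges, curr_edges):
    """Pixel-to-pixel difference of two binary edge maps: values -1, 0 or 1."""
    return [
        [c - p for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev_edges, curr_edges)
    ]


def movement_weight(diff):
    """Assumed form of the weight: number of pixels that changed."""
    return sum(abs(v) for row in diff for v in row)


def movement_weights(edge_maps):
    """One weight per pair of consecutive frames (n frames give n-1 weights)."""
    return [
        movement_weight(subtract_edges(a, b))
        for a, b in zip(edge_maps, edge_maps[1:])
    ]


def is_live(edge_maps, threshold=THRESHOLD):
    """Classify as live when the spread of the movement weights exceeds the threshold.

    Assumption: sample standard deviation; needs at least three frames.
    """
    return statistics.stdev(movement_weights(edge_maps)) > threshold


# Synthetic 10x10 edge maps: all zeros (Z) and all ones (O).
Z = [[0] * 10 for _ in range(10)]
O = [[1] * 10 for _ in range(10)]

static_video = [Z, Z, Z, Z, Z, Z]   # no movement at all
moving_video = [Z, O, O, Z, Z, O]   # irregular movement between frames

print(movement_weights(moving_video))
print(is_live(static_video), is_live(moving_video))
```

The decision direction follows the reported averages: real user videos have the larger standard deviation (53.6) and spoof videos the smaller one (16.8), so a value above the threshold is treated as live.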
In order to validate the proposed method, three types of video data were used: real (live) user, printed photograph and video on a mobile device. The results calculated for each iteration are tabulated in Tables 2-4 for each data type, respectively.

Results and Discussion
A total of 42 iterations were performed for validation, with 14 iterations for each data type. The preliminary results indicate an accuracy of 97.6% across all data types (41 of 42 iterations classified correctly), with only one failed detection, as highlighted in Tables 2-4.
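The reported accuracy follows directly from the counts given above, as this short check shows. The function name is illustrative.

```python
def accuracy_percent(correct, total):
    """Share of correctly classified iterations, in percent, to one decimal."""
    return round(100.0 * correct / total, 1)


# 42 validation iterations with a single failure, as stated in the text.
accuracy = accuracy_percent(42 - 1, 42)
print(accuracy)
```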
The findings demonstrate that the threshold successfully distinguished between real users and printed pictures (spoof data). As such, one can conclude that the proposed method is successful in detecting user liveness, especially from video captured on a device such as a laptop computer.

Conclusion
Biometric data have emerged as one of the most widely used technologies for identity validation in various sectors. In this paper, a complete methodology for liveness detection using a low resolution camera was proposed. The results show that the proposed method successfully analyses user liveness with an accuracy of 97.6%. In the future, we aim to test the proposed method on much larger data sets.