Biometric Gait Recognition Based on Machine Learning Algorithms

Abstract: It is crucial to find methods that analyze the large amounts of data captured by cameras and various sensors installed all around us. Machine learning has become a prevailing tool for analyzing such data, which reflect the behavioral characteristics of human beings. Gait, as an identifier for use in individual recognition systems, has distinctive and almost certainly unique key features for each person, including centroid, cycle length and step size. Gait is sometimes preeminently suited to recognition or surveillance scenarios. For example, it might be used to identify women who wear veils, as is customary in some countries, without raising critical social issues. The objective of this project is to predict accurately the one-dimensional coordinates of normalized n-component vectors representing two-dimensional silhouettes, in order to identify individuals at a distance without any interaction or obtrusion. Various algorithms are further incorporated into walking pattern analysis to adaptively improve gait recognition and classification. The results show reasonable identification performance compared with several machine learning methods.


Introduction
Biometrics are automated techniques that are routinely used to verify the identities of human beings. Most current biometric systems are essentially pattern recognition systems; see (Sayed and Jradi, 2014), (Vacca, 2007), (Raina, 2011), (Raina and Pandey, 2011) and (Jain and Aggarwal, 2012). These systems identify a person by verifying the authenticity of specific physiological or behavioral characteristics possessed by that person. The physiological characteristics are physical human traits, in particular hand shape, facial features, fingerprints, iris scans, ear shapes, skin reflection, finger vein patterns and palm vein images; see for example (Xi et al., 2013), (Sayed, 2015a;2015b). The behavioral characteristics include the way people sign their names, their manner of walking, speech patterns, lip motion and keystroke dynamics (Yang, 2010). In any biometric system, several important concerns such as universality, individuality, durability, collectability, presentation and acceptability should be considered and assessed. Furthermore, for a biometric system to be accepted by the general public, it should be nonintrusive. Gait recognition, which was first introduced by Ailisto et al. (2005), is of particular interest because it enables identification at a distance using standard cameras under a wide range of conditions. Hence, it is applicable to long-distance surveillance in banks, airports, military departments and homeland security. The aim of gait recognition is to discriminate an individual by analyzing his or her shape, which changes over time in an image sequence. There are several gait identification approaches, including image and video processing (machine learning), wearable sensors (sensors attached to the body that measure acceleration) and floor sensors (based on ground reaction forces). However, gait appears to be unstable, and it is subject to change when the identified person is relaxed or in a hurry.
In addition, various external factors, in particular clothing, footwear, walking surface and objects carried in the hands, might affect gait recognition. Gait recognition is also sensitive to the quality of the gait sequence and to the use of small datasets. In an early study, Boyd and Little (2005) categorize the existing gait and quasi-gait recognition systems by their source of oscillations: shape, joint trajectory, self-similarity and pixel.
The problem can be summarized as follows: capturing all relevant data with fixed cameras, filtering and transforming the data into useful information, and building an intelligent decision-making process. Before a suitable machine learning algorithm is applied, unnecessary data should be filtered out and the dimensionality reduced so that only the features relevant to the specific application are kept. Therefore, feature vectors that can be used for pattern recognition are extracted from the segmented silhouettes of walking individuals after the background image has been separated. After being normalized, these vectors are then mapped to other, more appropriate values. There are several dimensionality reduction methods, including linear and nonlinear techniques. For example, one of the most common methods for linear reduction is principal component analysis, while neural network algorithms (Sayed and Baker, 2015) are well suited to nonlinear applications. Thus, the main objective of the current work is to take advantage of machine learning algorithms that are well established in voice and facial recognition for the comparison and identification of feature vectors of realistic gait data. Here, we remark that there are two types of gait features: model-based features, which involve static and dynamic parameters of the user's body, and model-free features, which use the dark shape and outline of the user. Interested readers are referred to (Liu et al., 2011) and (Balazia and Sojka, 2016). Finally, we note that motion analysis is closely related to several domains of computer science, namely artificial intelligence, computer vision, image processing and pattern recognition.
The rest of the paper is organized as follows. In Section 2, we give background on the biometric gait model and review some supervised and unsupervised machine learning techniques. In Section 3, we propose a visual approach involving cameras that capture gait from different angles at a distance. We also give a mathematical representation of the data in terms of normalized feature vectors of our gait model. In Section 4, we provide three learning models as classifiers for solving the problem. We further present and analyze the results of the experimental tests, with the aim of differentiating between the learning algorithms and comparing their performance. In Section 5, we provide concluding remarks, useful observations and proposals for future work.

Gait Model and Data Processing Algorithms
We propose and validate gait biometrics as a pattern recognition system by applying machine learning algorithms to recognize individuals based on dynamics and shapes, using a captured gallery sequence as a training set. Accordingly, the purpose of this section is twofold: (a) to introduce biometric gait as a recognition system that can be used to verify or identify human beings by their walking pattern and (b) to give a brief overview of different supervised and unsupervised machine learning techniques that we can apply to achieve gait recognition.

Biometric Gait Models
Gait is defined as a complete walking cycle that is obtained from a sequence of images. A gait cycle represents the time between successive heel strikes of the same leg (Ngo et al., 2014). A gait recognition system involves three steps: user detection and tracking; gait feature extraction; and training, testing and classification. As mentioned above, there are two approaches to gait analysis; see for example (Kale et al., 2004) and (Wang et al., 2010). The first approach models gait through the structure or motion of the human body, using knowledge of the body components and shape. Thus, the gait features are extracted using joint positions rather than dynamics from movements. This approach can yield gait features that are free of the influence of external limitations, in particular clothing. However, this model-based approach has high computational complexity and requires high-quality gait sequences. The second approach models gait as the whole motion pattern of the human body without considering the underlying structure. Hence, the features are extracted using static gait characteristics such as the centroid, height and width of the outline of a moving object. This outline is referred to as a silhouette (Sarkar et al., 2005). In this model-free gait recognition approach (Dupuis et al., 2013), we obtain features directly at the pixel level from silhouettes extracted from image sequences. It should be noted that this approach has lower computational complexity and is comparatively easy to follow and apply. Furthermore, it is less sensitive to the quality of the silhouettes.
It is worth mentioning that feature extraction is essential in gait recognition systems. In addition, reducing the dimensionality of the features is pivotal in saving valuable running time and making classification more efficient. In our system, gait sequences are captured from arbitrary walking directions. Then, silhouettes are obtained by applying background subtraction and shadow removal to each gait sequence. Subsequently, the gait images are computed. As a supplementary phase, we set up a system that reduces the storage and computational costs by converting the 2-D outlines of the training and testing data into simple 1-D trajectories. At the end of the process, a supervised (or semi-supervised) machine learning technique is employed for training, validation and testing.

Machine Learning Algorithms
For the similarity measurements, we use machine learning algorithms, which typically involve two parts: model building and classification. Machine learning models are categorized into supervised and unsupervised models. In supervised learning, a specific target value must be available, whereas unsupervised models do not rely on a target value. In other words, in supervised learning we have some input variables and an output variable, and the procedure computes the (mapping) function from the input to the output. One of the most commonly used machine learning algorithms is the Artificial Neural Network (ANN) (Sayed and Baker, 2015). It is trained iteratively and is made up of simple processing units called neurons. Frank Rosenblatt first presented the idea of the single perceptron in 1958 (Rosenblatt, 1958). A Multi-Layer Perceptron (MLP) neural network is a perceptron-type network that is distinguished from the single-layer network by having one or more intermediate (hidden) layers. Backward propagation of errors (or simply backpropagation), which has been used since the 1980s to adjust the weights, is a widespread procedure for training ANNs. Backpropagation is usually used in conjunction with an optimization method such as gradient descent. In some cases, the training (or learning) process can lead to overtraining or undertraining, and it can be time-consuming (Liu, 2010).
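As a brief illustration of how backpropagation is combined with gradient descent, each weight $w$ of the network is moved against the gradient of the training error $E$. The symbols $\eta$ (learning rate) and $t$ (iteration index) are introduced here only for illustration and are not taken from the experiments reported in this paper:

$$w^{(t+1)} = w^{(t)} - \eta \, \left.\frac{\partial E}{\partial w}\right|_{w = w^{(t)}}$$

Backpropagation supplies the partial derivatives $\partial E / \partial w$ layer by layer through the chain rule, while the choice of $\eta$ trades slow convergence (small values) against unstable training (large values).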
There are many other techniques related to artificial neural networks, including the Convolutional Neural Network (CNN), simple and oblique decision trees and Support Vector Machine (SVM) methods (Carrizosa and Morales, 2013). All of these techniques are founded on a similar principle, which consists of choosing a structure (for example, an MLP for neural networks; leaves representing class labels and branches representing conjunctions of features for decision trees; and a kernel function for the SVM method). Some of these methods then fit the parameters that minimize erroneous classifications on the given learning set (for instance, using error backpropagation or other optimization methods). The first applications of CNNs to gait recognition are reported in (Castro et al., 2016) and (Sokolova and Konushin, 2017). The SVM, which was introduced in 1992 as a geometry-based classifier (Vapnik, 1999), is widely used in biometric systems because of its capability and accuracy. The k-Nearest Neighbors (k-NN) model is one of the most widespread non-parametric algorithms used in classification and regression (Guo and Wang, 2003). It finds the k nearest neighbors of a sample under a predefined metric and assigns the sample to the class that is predominant among those neighbors. A drawback of k-NN is its sensitivity to skewed class distributions, which occasionally makes it less suitable.
Afterward, we use either the Root Mean Square Error (RMSE) or the Mean Absolute Percentage Error (MAPE) to calculate the error rate, to compare the selected machine learning methods and to select the classification method that best suits the type of data provided. Software tools, in particular MATLAB and the Python Theano library, provide many ready-made algorithmic packages for capturing, processing, testing and analyzing the gait dataset.
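The two error measures named above can be written in a few lines. The following is a minimal sketch using only the Python standard library; the function names rmse and mape are illustrative choices, not part of any package used in this work, and the MAPE variant shown assumes that no target value is zero.

```python
import math


def rmse(outputs, targets):
    """Root mean square error between two equal-length sequences."""
    if len(outputs) != len(targets) or not outputs:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(o - t) ** 2 for o, t in zip(outputs, targets)]
    return math.sqrt(sum(squared) / len(squared))


def mape(outputs, targets):
    """Mean absolute percentage error in percent (assumes non-zero targets)."""
    if len(outputs) != len(targets) or not outputs:
        raise ValueError("sequences must be non-empty and of equal length")
    ratios = [abs((t - o) / t) for o, t in zip(outputs, targets)]
    return 100.0 * sum(ratios) / len(ratios)
```

RMSE is expressed in the units of the outputs and penalizes large deviations more strongly, whereas MAPE is scale-free, which is one reason the two can rank methods differently.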

Description of Dataset
We give an overview of the type of data used in motion analysis and in our gait recognition system, which is based on the model-free approach. An n-dimensional feature vector is used to represent the characteristics of each higher-level gait image after image processing and some measurements have been carried out. Here, we emphasize that the feature extraction process is important for improving the effectiveness of the classification process. We then apply machine learning techniques to learn the correspondence between these feature vectors for classification purposes.
In order to acquire the feature vectors of a moving object, a silhouette is detected by performing background subtraction on the image captured by a fixed camera during the object detection and tracking step. Some assumptions should be taken into consideration, for example that the walking route is a straight line and that only one moving object appears in the color video. Background subtraction, see (Javed et al., 2002) and (MCBS, 2013), is applied to identify a moving object against a static background by estimating the pixel properties of this static background. In fact, there are different background subtraction techniques, see for instance (Das and Saharia, 2014), such as frame difference, real-time background subtraction and shadow detection, and adaptive background mixture models for real-time tracking.
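Of the techniques listed above, frame difference is the simplest to illustrate. The sketch below assumes grayscale frames stored as nested lists of intensities and a hypothetical threshold of 30 intensity levels; both are assumptions made for illustration and are not the settings used in our experiments.

```python
def frame_difference(frame, background, threshold=30):
    """Return a binary mask: 1 where the frame differs from the background.

    A pixel is marked as foreground when the absolute intensity difference
    to the static background estimate exceeds `threshold` (assumed value).
    """
    same_shape = len(frame) == len(background) and all(
        len(row) == len(bg_row) for row, bg_row in zip(frame, background)
    )
    if not same_shape:
        raise ValueError("frame and background must have the same shape")
    return [
        [1 if abs(p - q) > threshold else 0 for p, q in zip(row, bg_row)]
        for row, bg_row in zip(frame, background)
    ]
```

A fixed threshold is sensitive to illumination changes and shadows, which is why the adaptive mixture models mentioned above are often preferred outdoors.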
We put forward a silhouette-analysis-based recognition system to extract the moving object. For each frame, we convert the true colour (RGB) image into a grayscale intensity image. Then, the super-bounding rectangular frame is located and the background is estimated only for pixels inside this frame. Next, we apply a threshold scheme to obtain the binary form (BW) of the image. We thus use the pixel values within the frame to get the l × m binary matrix $B = [b_{ij}]$, with $b_{ij} \in \{0, 1\}$ for $1 \le i \le l$ and $1 \le j \le m$, where 1 denotes the foreground and 0 denotes the background.
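The conversion steps above can be sketched as follows. The example assumes that an image is a nested list of (R, G, B) tuples, that the foreground is brighter than the chosen threshold, and that the grayscale conversion uses the common ITU-R BT.601 luma weights; the function names are hypothetical and the thresholding rule is a simplification of the scheme described in the text.

```python
def rgb_to_gray(rgb_image):
    """Convert rows of (R, G, B) tuples to grayscale with BT.601 weights."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in rgb_image]


def to_binary(gray_image, threshold):
    """Binary matrix B: 1 (foreground) above the threshold, else 0."""
    return [[1 if value > threshold else 0 for value in row]
            for row in gray_image]


def bounding_box(binary):
    """Inclusive (top, left, bottom, right) of the foreground, or None."""
    rows = [i for i, row in enumerate(binary) if any(row)]
    if not rows:
        return None  # no foreground pixel in this frame
    cols = [j for j in range(len(binary[0])) if any(row[j] for row in binary)]
    return rows[0], cols[0], rows[-1], cols[-1]
```

Restricting later computations to the bounding box keeps the per-frame cost proportional to the size of the silhouette rather than to the size of the full image.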
The center of mass is calculated and, on that basis, each located two-dimensional super-bounding rectangular frame is converted into a one-dimensional vector of n normalized components (measurements). In general, converting all 2-D outlines into 1-D vectors reduces the size of the database. It also decreases the computational cost of training and testing the different machine learning algorithms. It is worth noting here that the center of mass (centroid) of a walking person is permanently fixed. The center of mass point $(x_c, y_c)$ is the mean position of the foreground pixels, $x_c = \frac{1}{N}\sum_{i=1}^{N} x_i$ and $y_c = \frac{1}{N}\sum_{i=1}^{N} y_i$, where $(x_i, y_i)$ are the positions of the foreground pixels and N is the number of 1's in the frame. We further divide each sampled and quantized image frame into n sectors by fixed angles, as shown in Fig. 1. For similarity measurements, we compute the normalized Euclidean distance between the center of mass and the farthest foreground pixel in each sector. Moreover, normalization (also called scaling), which maps the components of the feature vectors into a prescribed range, [0, 1] in our case, is important for obtaining consistent feature vectors. If $(x_k, y_k)$ is the position of the farthest foreground pixel in sector k of a frame, then the k-th component of the feature vector that represents the silhouette is the Euclidean distance $\sqrt{(x_k - x_c)^2 + (y_k - y_c)^2}$, normalized to [0, 1]. Thus, we simply represent each continuous image (silhouette) as an invariant n-vector. What follows is to adopt several machine learning algorithms for classification and, therefore, gait recognition.
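The sector-based feature extraction described above can be sketched as follows. Two details are assumptions made only for this illustration: the sectors are taken to be of equal angular width, and the distances are scaled by the largest sector distance of the frame so that every component lies in [0, 1]. The names centroid and sector_features are hypothetical.

```python
import math


def centroid(binary):
    """Center of mass (x_c, y_c) of the 1-pixels; x is column, y is row."""
    points = [(x, y) for y, row in enumerate(binary)
              for x, value in enumerate(row) if value == 1]
    if not points:
        raise ValueError("the frame contains no foreground pixel")
    n = len(points)
    return sum(x for x, _ in points) / n, sum(y for _, y in points) / n


def sector_features(binary, n_sectors=8):
    """Normalized distance from the centroid to the farthest pixel per sector."""
    xc, yc = centroid(binary)
    width = 2.0 * math.pi / n_sectors  # assumption: equal-width sectors
    farthest = [0.0] * n_sectors
    for y, row in enumerate(binary):
        for x, value in enumerate(row):
            if value != 1:
                continue
            dx, dy = x - xc, y - yc
            # Image rows grow downwards, so angles run clockwise on screen.
            angle = math.atan2(dy, dx) % (2.0 * math.pi)
            k = min(int(angle / width), n_sectors - 1)
            farthest[k] = max(farthest[k], math.hypot(dx, dy))
    d_max = max(farthest)
    if d_max == 0.0:
        return farthest  # degenerate single-pixel silhouette
    # Assumption: scale by the largest distance so components lie in [0, 1].
    return [d / d_max for d in farthest]
```

Because every component is a ratio of distances measured from the centroid, the resulting vector does not depend on where the silhouette sits in the frame or on its absolute size.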

Results and Performance Analysis
In order to extract the walking characteristics of a person for evaluation, classification and subsequent recognition, a complete gait cycle is analyzed and a sequence of frames is generated. Subsequently, as mentioned above, the dataset is created as a list of descriptive n-dimensional feature vectors. Moreover, this dataset can be stored in a central database for surveillance situations in which there is no previous information about the object. The design cycle of our overall gait recognition system is composed of dataset collection and feature selection, a learning algorithm and an evaluation model (Fig. 2).

Dataset Source
In order to test the performance of the selected algorithms and assess different gait features, we prepared different datasets covering different age groups and genders. Video clips of individuals are captured from different viewing angles, and each clip is divided into frames at 25 frames per second. We obtain indoor silhouettes from our identifiable video clips and outdoor silhouettes from the three different sets of the open CASIA gait database (1506). We divide the overall gait dataset (16,821 n-dimensional feature vectors) for the participants into three disjoint sets: the first for training, the second for validation and the third for testing. In all of the experiments, a subset of 70% of the data is presented to the network during training, and the network is adjusted according to its error. A subset of 15% of the data is used for network validation, and the remaining 15% provides an independent measurement and test of the network performance. We ensure that data for the same participant do not appear in both the test set and the validation/training sets.
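One way to realize such a split is sketched below. It reads the protocol above literally: whole participants are held out until the test set reaches about 15% of the samples, and the remaining samples are shuffled and divided between validation and training. The function name, the (participant_id, feature_vector) sample layout and the fixed seed are assumptions for illustration, not a description of the tooling used in our experiments.

```python
import random
from collections import Counter


def split_dataset(samples, train=0.70, val=0.15, seed=0):
    """Split (participant_id, feature_vector) pairs into train/val/test.

    No participant in the test set appears in the training or validation set.
    """
    rng = random.Random(seed)
    per_id = Counter(pid for pid, _ in samples)
    ids = sorted(per_id)
    rng.shuffle(ids)

    # Hold out whole participants until the test share is reached.
    test_target = round((1.0 - train - val) * len(samples))
    test_ids, held_out = set(), 0
    for pid in ids:
        if held_out >= test_target:
            break
        test_ids.add(pid)
        held_out += per_id[pid]

    test_set = [s for s in samples if s[0] in test_ids]
    rest = [s for s in samples if s[0] not in test_ids]
    rng.shuffle(rest)
    n_val = round(val * len(samples))
    return rest[n_val:], rest[:n_val], test_set
```

When participants contribute unequal numbers of samples, the realized proportions only approximate 70/15/15, because participants are never divided between the test set and the other two sets.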

Classification Methods
We apply evaluation methods to determine the effectiveness and efficiency of the machine learning algorithms used in our gait recognition problem on different data samples. For analyzing the classification methods and for the purpose of comparison, we have adopted three commonly used classification methods: ANN, SVM and k-NN. For each implemented model, we provide a brief description without repeating any derivations, as the theories behind these techniques are well known. We use the ANN classifier to determine to which class, out of several, an input feature vector belongs. For a given set of n-dimensional feature vectors, the SVM classifier finds the hyperplane with the largest margin that separates these vectors. Note that the SVM is a powerful tool for solving classification problems, specifically as a two-class classifier (Carrizosa and Morales, 2013). The k-NN classifier directly uses the closest training samples among the feature vectors to classify a new test example. Predictions are therefore made for an example by searching the whole training set for the k most similar neighbours and summarizing the output of those k neighbours.

Performance Evaluations
As an example of nonlinear computing, we implement an Artificial Neural Network with a single hidden layer (input layer, hidden layer, output layer). Training stops when the network reaches the minimum error, which is calculated as the Root Mean Square Error (RMSE), Equation 6, between the network output values $z_i$ and the desired output values $\hat{z}_i$:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(z_i - \hat{z}_i\right)^2} \qquad (6)$$

The values of $z_i$ and $\hat{z}_i$ must fall within the range from 0.0 to 1.0, and n is the number of sectors.
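To make the training loop concrete, the following is a minimal sketch of a single-hidden-layer network trained by backpropagation with per-sample gradient descent. It is not the implementation used in our experiments: the class name TinyMLP, the sigmoid activations, the weight initialization range, the learning rate and the number of epochs are all assumptions chosen to keep the example short.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


class TinyMLP:
    """Single-hidden-layer perceptron with sigmoid units (illustrative only)."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        # One row of weights per unit; the last entry of each row is the bias.
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                   for _ in range(n_hidden)]
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
                   for _ in range(n_out)]

    def forward(self, x):
        xb = list(x) + [1.0]
        h = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in self.w1]
        hb = h + [1.0]
        z = [sigmoid(sum(w * v for w, v in zip(row, hb))) for row in self.w2]
        return h, z

    def train_step(self, x, target, lr):
        """One backpropagation update for the error 0.5 * sum((z - t)^2)."""
        h, z = self.forward(x)
        d_out = [(zi - ti) * zi * (1.0 - zi) for zi, ti in zip(z, target)]
        # Hidden deltas use the output weights before they are updated.
        d_hid = [hj * (1.0 - hj) * sum(d * row[j] for d, row in zip(d_out, self.w2))
                 for j, hj in enumerate(h)]
        for d, row in zip(d_out, self.w2):
            for j, v in enumerate(h + [1.0]):
                row[j] -= lr * d * v
        for d, row in zip(d_hid, self.w1):
            for k, v in enumerate(list(x) + [1.0]):
                row[k] -= lr * d * v


def dataset_rmse(net, data):
    """RMSE over all outputs of all (input, target) pairs in `data`."""
    sq = [(z - t) ** 2 for x, target in data
          for z, t in zip(net.forward(x)[1], target)]
    return math.sqrt(sum(sq) / len(sq))


def train(net, data, epochs=500, lr=0.5):
    """Per-sample gradient descent; returns the final training RMSE."""
    for _ in range(epochs):
        for x, target in data:
            net.train_step(x, target, lr)
    return dataset_rmse(net, data)
```

In practice, training would stop when the validation error stops decreasing rather than after a fixed number of epochs, which is the behaviour that a stopping criterion based on the RMSE is meant to capture.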
In this study, accuracy means an experimentally measured percentage that indicates, from different perspectives, the classification correctness of all outputs of the algorithms. The first run of the ANN algorithm, using 50 hidden neurons, produced the results in Table 1. The RMSE measures the root of the average squared differences between outputs and targets, while the regression value R measures the correlation between outputs and targets. An R-value of one means that there is a close relationship between the calculated output and the target, while a value of zero indicates a random relationship. Table 1 shows that the R-value is more than 0.85, indicating that the training algorithm has a good prediction outcome. Figure 3 shows the histogram of error values, which also indicates that the training procedure has only a few outliers and generally a good outcome. Figures 4 and 5 give the performance of the algorithm. Moreover, Fig. 4 shows that the root mean square error stopped decreasing after 85 iterations (epochs). It is worth noting that training several times might give different results owing to different samples and initial conditions. The testing dataset has no effect on the training; it only provides a performance measurement during and after the training phase.
The SVM classifier is trained using a set of samples of the form $(F_i, y_i)$, where each $F_i$ is an n-dimensional feature vector and $y_i$ is either -1 or 1 according to the class to which the vector belongs. The SVM algorithm then attempts to determine a separating hyperplane that divides the set of training samples while leaving the maximum margin from both classes. Although the problem is finite-dimensional, the training samples are not necessarily linearly separable and may require more complex classification methods in that finite space.
Researchers have proposed different SVM kernel functions (Huang et al., 2018), such as linear, quadratic, Radial Basis Function (RBF) and sigmoid kernels. In the linear type, the kernel function is simply the inner product. We first set the SVM penalty and kernel parameters. A set of testing samples is then used to check the correctness of the resulting classification system. Because of the high dimensionality of the feature space, we adopt only the linear kernel, which is sufficient to achieve the classification with good accuracy.
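For illustration, a two-class linear SVM can be trained by subgradient descent on the soft-margin objective. This is a simplified stand-in and not the solver used in our experiments; the function names, the penalty C, the learning rate and the epoch count are assumed values.

```python
def train_linear_svm(X, y, C=1.0, lr=0.01, epochs=200):
    """Minimize 0.5*||w||^2 + C * sum(max(0, 1 - y_i (w.x_i + b))).

    X is a list of feature vectors and y a list of labels in {-1, +1}.
    Plain batch subgradient descent; returns the weights w and the bias b.
    """
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = list(w)  # gradient of the regularization term
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1.0:  # the sample violates the margin
                for j in range(n_features):
                    grad_w[j] -= C * yi * xi[j]
                grad_b -= C * yi
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Class label given by the side of the hyperplane on which x lies."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1
```

With a linear kernel the decision function is a single inner product, so the cost of classifying a sample grows only linearly with the number of features n. Recognizing more than two subjects requires combining several such two-class classifiers, for example in a one-against-rest scheme.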
In order to classify a sample $F_i$, the k-NN classifier first searches for its k closest training feature vectors and then determines the class membership from them. We therefore use the value of k to perform the classification by computing simple histogram similarities. A proper value is chosen by applying a validation process to estimate the optimum k. Then, a sample from the feature space is matched against all the training data to find the best-matching set. The size of the training dataset and the value of k usually influence the classification accuracy. We observe that the k-NN classifier achieves better performance with a suitable k value and a larger training dataset. Figures 6 and 7 summarize the accuracy of the SVM and k-NN classifiers. We present a comparison between the performances of the algorithms for different training dataset sizes and k values.
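The k-NN decision rule and the validation-based choice of k can be sketched as follows. The Euclidean metric, the candidate values of k and the function names are assumptions made for this illustration.

```python
import math
from collections import Counter


def knn_predict(train, query, k):
    """Majority vote among the k training vectors closest to `query`.

    `train` is a list of (feature_vector, label) pairs; distances are
    Euclidean (an assumed choice of metric).
    """
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


def choose_k(train, validation, candidates=(1, 3, 5)):
    """Return the candidate k with the highest accuracy on the validation set."""
    def accuracy(k):
        hits = sum(knn_predict(train, x, k) == label for x, label in validation)
        return hits / len(validation)
    return max(candidates, key=accuracy)
```

Odd values of k are a common choice because they reduce the chance of tied votes in two-class problems; with many subjects, ties are still possible and need an explicit rule.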
In this context, we hypothesize that machine learning models and classifiers play an essential role in this development. Based on the experimental results, the ANN leads to the best results, with an average accuracy exceeding 90%. It can exploit a large number of features with a relatively small training set. On the other hand, the number of features and the size of the training set affect the accuracy of the SVM and k-NN. The SVM and k-NN algorithms resulted in an average accuracy of 85% in differentiating subjects among the entire set of participants. The results of the proposed algorithms are compared with the results of some state-of-the-art gait recognition techniques reported in (Youn et al., 2016), as shown in Table 2. However, all of these results indicate that the performance of gait recognition systems is still unsatisfactory and needs considerable improvement before such systems can replace existing biometric systems such as facial recognition. With the expected false alarm rates, a gait recognition system is not yet practical. In addition, the databases are likely to be very large compared with those of other biometric systems.

Conclusion
We integrated human motion analysis into biometric recognition using gait and selected machine learning algorithms to handle its features. Human motion analysis is necessary in various areas of computer science such as biometrics, computer graphics and the games industry. Human motion can also be used in security monitoring to detect suspicious and potentially harmful behavior. Moreover, human motion has a number of advantages in gender classification. The study of human motion is not a recent theme, and its physical models have been used effectively in medical gait analysis. Gait biometrics as a pattern recognition system could be advantageous over traditional biometric systems, as it is considered unobtrusive and can be measured in a way that does not require a person to alter his or her typical behavior. In addition, gait biometrics do not require a person to present any more information than is already available to a casual observer, and studies have suggested that gait is very difficult to imitate. However, covariates such as the passage of time, footwear, terrain, fatigue and injury might influence the precision of gait recognition. The viewpoint (view angle of the camera) might also affect gait recognition performance.
Our next aim is to use convolutional neural networks, which are at the heart of deep learning in computer vision, for the gait identification problem. In addition, the combination of more than one biometric (multimodal biometrics), such as gait, face and foot pressure, could be one of our next directions.