Fall Detection Using the Histogram of Oriented Gradients and Decision-Based Fusion

As the number of fall incidents among elderly people and patients continues to grow, researchers have intensified their efforts to propose efficient automatic fall detection systems. In particular, they have formulated fall detection as a supervised learning task. Visual features are extracted from the video frames and used to classify the position of a human as “Fall” or “Non-Fall”, based on a model learned from labeled training frames. Despite the promising reported results, existing fall detection systems leave noticeable room for improvement. Learner fusion, which builds multiple models and aggregates their respective decisions, is an alternative that could improve fall detection performance. This paper proposes an image-based fall detection system that uses the Histogram of Oriented Gradients (HOG) to capture the visual properties and the spatial position of the human body in the video frames. The extracted features are then used to train three classification models, namely the Naïve Bayes, the K-Nearest Neighbors and the Support Vector Machine algorithms. Next, the majority vote is used to aggregate the decisions of the individual learners. The proposed system was assessed on a standard dataset and yielded promising results. Standard performance measures, together with a statistical significance t-test, show that the fall detection system based on majority vote fusion outperforms the approaches based on individual classifiers.


Introduction
Yearly, millions of elderly people (over 65) experience falls. Specifically, more than one out of every four elderly people falls at least once a year (Stevens et al., 2012). Statistics have shown that falling is one of the major causes of injury-related death, especially for elderly people above 79 years (Mubashir et al., 2013). The incident becomes more serious if the person who falls is alone. Fear of falling can increase the risk of falls and negatively affects people's lives, because they may avoid physical activities or experience depression as a consequence (Igual et al., 2013). Brownsell and Hawley (2004) showed that, after a fall detection system was deployed, people gained confidence and independence and felt safer. Therefore, various technologies have been adopted to design, implement and deploy systems that detect falls in unsupervised settings. Wearable sensor-based fall detectors have been developed. However, they are constrained by the need to wear or hold the sensors, which is inconvenient for some elderly people, especially those exhibiting memory loss symptoms. On the other hand, fall detection based on smartphone sensors is not considered reliable because of the limited battery autonomy. Recent progress in image processing and computer vision has supported studies that address the automatic fall detection challenge. In particular, solutions based on image processing techniques are non-intrusive, as no wearable equipment is required. In addition, compared with solutions based on acoustic and floor vibration sensors, these approaches have proven to be more robust to noise. Most of the developed image-based solutions rely on machine learning techniques. In particular, classifiers such as Naïve Bayes and Support Vector Machine (SVM) (Hearst et al., 1998) have been widely used for fall detection.
In this paper, typical image features are used to encode the content of the video frames, and a fusion of classification models is adopted to automatically detect fall incidents. Specifically, the proposed system assigns the Histogram of Oriented Gradients (HOG) (Dalal and Triggs, 2005) feature vectors extracted from the video frames to the "Fall" or "Non-Fall" category based on the decisions of multiple classification algorithms. Namely, it relies on the majority vote aggregation of the decisions of the Naïve Bayes, SVM (Hearst et al., 1998) and K-Nearest Neighbors (KNN) (Guo et al., 2003) classifiers.
The rest of this paper is organized as follows: Section 2 presents the works related to fall detection systems. Section 3 describes the design of our fall detection system. Section 4 presents the results achieved by the proposed fall detection system. Finally, Section 5 concludes the paper and presents future work.

Related Works
Fall detection aims at reducing the extent of the injuries caused when a fall occurs. Many studies have focused on developing fall detection systems (Sprute et al., 2015; Rahman et al., 2016; Cola et al., 2017; Yang et al., 2016). Many of these systems are wearable-based and require the user to wear a device that detects the falls. In this research, however, the focus is on image-based fall detection systems. Specifically, the related works covered here couple image processing and machine learning techniques to address the fall detection problem. A novel approach for fall detection based on machine learning and visual feature extraction was proposed in (Ismail and Bchir, 2017). First, a membership-based histogram descriptor was used to represent the visual properties of the video frames. Then, KNN was deployed to classify the scene as fall or non-fall. The researchers in (Vishwakarma et al., 2007) proposed a fall detection system in which three primary features were extracted from the video frames: the aspect ratio of the moving human, its horizontal and vertical gradient values, and the fall angle. The fall is detected by keeping track of the aspect ratio. A fall occurs when the aspect ratio changes drastically, the vertical gradient value is less than the horizontal gradient value, and the angle between the vertical line of the human and the horizontal axis of the bounding box is less than 45 degrees. The authors in (Ozcan and Velipasalar, 2016) introduced a fall detection system based on a camera and a wearable device. The system computes the HOG of the images captured by the camera. The decisions obtained from the sensor data and from the camera were compared to increase the detection accuracy.
Liu et al. (2010) proposed an approach for detecting falls by classifying body postures with a KNN algorithm. The ratio and the differences of the bounding box width and height were fed into KNN as frame features. To distinguish the falling posture from the lying down posture, the researchers used the transition time derived from the experimental data together with statistical hypothesis tests. The authors in (Gunale and Mukherji, 2015) proposed a patient monitoring system based on KNN classification, using the ratio of the fitted ellipse, the orientation angle, the silhouette threshold and the motion coefficient as visual descriptors. The researchers in (de Miguel et al., 2017) proposed an elderly fall detection system that mainly separates the human body from the frame background using a standard background subtraction technique. The ratio and angle of the bounding box enclosing the human body, together with the ratio derivative, are used as visual descriptors. Finally, KNN is used to detect "Fall" and "Non-Fall" scenes. Gu et al. (2016) proposed a fall detection system based on an SVM classifier. The HOG feature was extracted from each frame and the global dynamic appearance was measured. In addition, the local dynamic shape was extracted from each depth frame. The researchers used SVM to assign the extracted feature vectors to the "Fall" or "Non-Fall" classes.
A multiple sensor-based fall detection system was also introduced in (Liu et al., 2012). It relies on two Doppler radar sensors to obtain the relative speed of motion, and three classifiers (SVM, KNN and Naïve Bayes) were used for prediction. Ma et al. (2014) presented a fall detection system based on the adaptive Gaussian mixture model (Reynolds, 1992), which is applied to separate the human body from the background. Then, a Canny edge detector (Canny, 1986) is used to detect the human body silhouette. The visual content is represented using Curvature Scale Space (CSS) features, and an Extreme Learning Machine (ELM) is used for classification. Akagündüz et al. (2017) proposed a fall detection system that adopts the Naïve Bayes classifier to map the visual features into the predefined categories. Besides, they applied background subtraction to the depth video to extract the silhouette orientation volume as a low-level feature.
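The three-condition rule described above for Vishwakarma et al. (2007) can be paraphrased as a simple predicate. This is only a sketch of the rule as summarized in this section, not that work's implementation: the threshold `min_relative_change` is a hypothetical stand-in for "the aspect ratio changes drastically", which is not quantified here, and the function name is illustrative.

```python
def is_fall_by_rule(prev_aspect_ratio, aspect_ratio, vertical_gradient,
                    horizontal_gradient, fall_angle_deg, min_relative_change=0.5):
    """Paraphrase of the three-condition fall rule summarized above.

    min_relative_change is a HYPOTHETICAL threshold standing in for
    "the aspect ratio changes drastically"; it is not taken from the cited work.
    prev_aspect_ratio must be positive.
    """
    # Condition 1: drastic relative change of the bounding-box aspect ratio.
    relative_change = abs(aspect_ratio - prev_aspect_ratio) / prev_aspect_ratio
    drastic_change = relative_change >= min_relative_change
    # Condition 2: vertical gradient value below the horizontal one.
    gradient_condition = vertical_gradient < horizontal_gradient
    # Condition 3: fall angle below 45 degrees.
    low_angle = fall_angle_deg < 45.0
    return drastic_change and gradient_condition and low_angle
```

All three conditions must hold simultaneously, so a sudden aspect-ratio change alone (for example, someone stretching their arms) is not flagged as a fall.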

Proposed System
The proposed image-based fall detection system aims to recognize "Fall" and "Non-Fall" frames automatically. It includes two critical components: the first is the visual feature extraction, while the second is the machine learning technique used to recognize the fall incident. In particular, the HOG feature (Dalal and Triggs, 2005) is combined with a classifier fusion approach to improve the overall fall detection performance. HOG features have proved highly efficient for object detection and tracking (Jung et al., 2011; Xu and Gao, 2010; Ma et al., 2011), so they are expected to discriminate well between the two predefined classes ("Fall" and "Non-Fall"). On the other hand, the rationale for using learner fusion is to exploit the diversity of the individual classifiers to reach a more accurate decision.
As illustrated in Fig. 1, the HOG (Dalal and Triggs, 2005) feature is extracted from each video frame. The resulting feature vectors and the corresponding labels are then fed into the individual supervised learning algorithms (SVM (Hearst et al., 1998), KNN (Sprute et al., 2015) and Naïve Bayes) to learn the classification models. The obtained models are used in the testing phase to predict the class value ("Fall" or "Non-Fall") of the testing frames. Finally, majority vote fusion is used to aggregate the decisions of the individual classifiers.
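To make the feature extraction step concrete, here is a minimal NumPy sketch of a simplified HOG extractor for one grayscale frame. It is not the implementation used in the experiments (which were run in MATLAB), and its settings are assumptions: 8×8-pixel cells, 9 unsigned orientation bins, overlapping 2×2-cell blocks with plain L2 normalization, and hard orientation voting. The original descriptor of Dalal and Triggs additionally interpolates the votes and uses L2-Hys normalization. The function name `hog_descriptor` is illustrative.

```python
import numpy as np

def hog_descriptor(image, cell_size=8, n_bins=9, block_size=2, eps=1e-6):
    """Simplified HOG for a 2-D grayscale array (hard binning, L2 block norm)."""
    img = np.asarray(image, dtype=np.float64)
    # Centered [-1, 0, 1] derivatives; border pixels keep a zero gradient.
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]
    gy[1:-1, :] = img[2:, :] - img[:-2, :]
    magnitude = np.hypot(gx, gy)
    # Unsigned orientation in [0, 180) degrees, mapped to n_bins equal bins.
    angle = np.rad2deg(np.arctan2(gy, gx)) % 180.0
    bins = np.minimum((angle / (180.0 / n_bins)).astype(int), n_bins - 1)

    n_cy, n_cx = img.shape[0] // cell_size, img.shape[1] // cell_size
    cells = np.zeros((n_cy, n_cx, n_bins))
    for cy in range(n_cy):
        for cx in range(n_cx):
            rows = slice(cy * cell_size, (cy + 1) * cell_size)
            cols = slice(cx * cell_size, (cx + 1) * cell_size)
            # Magnitude-weighted orientation histogram of one cell.
            cells[cy, cx] = np.bincount(bins[rows, cols].ravel(),
                                        weights=magnitude[rows, cols].ravel(),
                                        minlength=n_bins)

    # Overlapping blocks of block_size x block_size cells with a one-cell stride.
    features = []
    for by in range(n_cy - block_size + 1):
        for bx in range(n_cx - block_size + 1):
            block = cells[by:by + block_size, bx:bx + block_size].ravel()
            features.append(block / np.sqrt(np.sum(block ** 2) + eps ** 2))
    return np.concatenate(features) if features else np.zeros(0)
```

For a 320×240 grayscale frame, this sketch yields 30×40 cells, hence 29×39 blocks of 36 values each, i.e. a 40,716-dimensional vector that would then be passed to the classifiers.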
The main idea behind learner fusion is to obtain a more accurate decision based on the outputs of the individual classifiers. One should note that the independence and diversity of the individual classifiers are required for the fusion approach to improve the classification results (James, 1998). Let N be the number of classifiers. The fusion decision obtained using the majority voting algorithm corresponds to the class value assigned by at least ⌊N/2⌋+1 individual learners (James, 1998). Following (James, 1998), assume that models 2 and 3 assign the class value 1 to the test instance V, while model 1 yields the class value 0. Since at least ⌊3/2⌋+1 = 2 classifiers agree on class 1, the fusion decision for V is 1. When the individual classifiers output one of two class labels, two forms of the majority vote algorithm are applicable: simple majority and unanimity. A simple majority vote occurs when more than 50% of the classifiers agree on the same decision, whereas unanimity occurs when all the classifiers agree on the same decision.
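Both voting forms can be sketched in a few lines of NumPy. The sketch assumes labels encoded as 0 ("Non-Fall") and 1 ("Fall") and a prediction matrix with one row per classifier and one column per test frame. The function names and the `undecided` marker are illustrative.

```python
import numpy as np

def majority_vote(predictions):
    """Simple majority over binary labels: rows = classifiers, columns = frames.

    Returns the label chosen by at least floor(N/2) + 1 classifiers. With an
    odd N (three learners here) such a label always exists; with an even N a
    tie falls back to 0.
    """
    predictions = np.asarray(predictions, dtype=int)
    needed = predictions.shape[0] // 2 + 1
    votes_for_fall = predictions.sum(axis=0)  # number of classifiers voting 1
    return (votes_for_fall >= needed).astype(int)

def unanimity_vote(predictions, undecided=-1):
    """Return the common label where all classifiers agree, else `undecided`."""
    predictions = np.asarray(predictions, dtype=int)
    agree = np.all(predictions == predictions[0], axis=0)
    return np.where(agree, predictions[0], undecided)
```

For the three-model example above, `majority_vote([[0], [1], [1]])` returns `array([1])`, whereas `unanimity_vote([[0], [1], [1]])` returns `array([-1])` because the models disagree.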
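The probability of a wrong prediction by the majority vote can be calculated using:
$$P_{maj}=\sum_{m=\lfloor L/2\rfloor+1}^{L}\binom{L}{m}\,\varepsilon^{m}\,(1-\varepsilon)^{L-m}$$
where P_maj is the wrong prediction probability of the majority vote, ε is the error rate of each individual classifier and L is the number of classifiers. This expression assumes that the L classifiers err independently and share the same error rate ε; it sums the probabilities of all outcomes in which at least ⌊L/2⌋+1 classifiers are wrong. Under this independence assumption, the majority vote is guaranteed to be more accurate than the individual classifiers when their accuracies exceed random-guess performance, i.e. when 1 − ε > 0.5 (James, 1998).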
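The formula for P_maj can be evaluated directly with the Python standard library. This sketch assumes, as the formula does, independent classifiers with a common error rate, and restricts L to odd values so that ties cannot occur. The function name is illustrative.

```python
from math import comb

def majority_error_probability(eps, n_classifiers):
    """P_maj for n_classifiers independent classifiers sharing error rate eps."""
    if n_classifiers < 1 or n_classifiers % 2 == 0:
        raise ValueError("use an odd number of classifiers to avoid ties")
    first_wrong_majority = n_classifiers // 2 + 1
    # Sum over every number m of wrong classifiers that forms a majority.
    return sum(comb(n_classifiers, m) * eps ** m * (1 - eps) ** (n_classifiers - m)
               for m in range(first_wrong_majority, n_classifiers + 1))
```

For example, three independent classifiers with ε = 0.3 give P_maj = 3(0.3²)(0.7) + 0.3³ ≈ 0.216, which is lower than the individual error rate of 0.3.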

Experiments
Since the proposed approach is based on image classification, video frames were extracted from the set of videos in (Erdogan et al., 2010). These videos contain falls and scenes of other normal physical activities, such as sitting down, walking and standing up. The video sequences were recorded in four different locations, namely a coffee room, a home, a classroom and an office. The videos exhibit varying conditions such as illumination changes, shadows and reflections.
All videos were recorded at 25 frames per second. The resulting image dataset contains 1097 fall frames and 3545 frames of normal physical activities. Figure 3 shows some samples of the obtained 320×240 frames. Ten-fold cross-validation was used to train the individual fall detection models. The experiments were implemented in MATLAB on a laptop with an Intel i7 2.5 GHz CPU and 16 GB of RAM.
Table 1 shows the performance obtained with the different KNN settings. Three distance metrics were adopted for KNN classification, namely the Euclidean, Cityblock and Chebyshev distances, defined for two d-dimensional instances X = (x_1, ..., x_d) and Y = (y_1, ..., y_d) as follows:
$$d_{\mathrm{Euclidean}}(X,Y)=\sqrt{\sum_{i=1}^{d}(x_i-y_i)^2}$$
$$d_{\mathrm{Cityblock}}(X,Y)=\sum_{i=1}^{d}\lvert x_i-y_i\rvert$$
$$d_{\mathrm{Chebyshev}}(X,Y)=\max_{1\le i\le d}\lvert x_i-y_i\rvert$$
The overall accuracy of the KNN settings varies from 0.85 to 0.97. The Euclidean and Cityblock distance metrics yielded higher accuracy than the Chebyshev distance metric. Note that for our application sensitivity is particularly important, because the system should not miss fall incidents. Based on the obtained results, the Cityblock distance metric yielded the best performance with k = 3. This finding is confirmed by the ROC curves displayed in Fig. 4: the curves corresponding to the Cityblock and Euclidean distance metrics lie above the other curves, while the Chebyshev curve is noticeably below.
Similarly, multiple SVM models based on different kernel functions were trained for performance comparison, namely the linear, the Radial Basis Function (RBF) and the polynomial kernels.
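The three KNN distance metrics and a brute-force KNN classifier can be sketched in NumPy as follows. This is an illustrative version, not the MATLAB implementation used in the experiments; the defaults k = 3 and the Cityblock metric mirror the best KNN setting reported above, and labels are assumed to be 0 ("Non-Fall") and 1 ("Fall").

```python
import numpy as np

def euclidean(a, b):
    """Euclidean distance along the last axis."""
    return np.sqrt(np.sum((a - b) ** 2, axis=-1))

def cityblock(a, b):
    """Cityblock (Manhattan, L1) distance along the last axis."""
    return np.sum(np.abs(a - b), axis=-1)

def chebyshev(a, b):
    """Chebyshev (L-infinity) distance along the last axis."""
    return np.max(np.abs(a - b), axis=-1)

def knn_predict(train_X, train_y, test_X, k=3, metric=cityblock):
    """Brute-force KNN: majority label among the k nearest training instances.

    Labels must be non-negative integers, e.g. 0 = "Non-Fall", 1 = "Fall".
    """
    train_X = np.asarray(train_X, dtype=float)
    train_y = np.asarray(train_y, dtype=int)
    predictions = []
    for x in np.asarray(test_X, dtype=float):
        distances = metric(train_X, x)  # x is broadcast against every training row
        nearest = np.argsort(distances, kind="stable")[:k]
        predictions.append(np.bincount(train_y[nearest]).argmax())
    return np.array(predictions)
```

With binary labels and an odd k, the neighbor vote can never tie, which is one practical reason to prefer odd values such as k = 3.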
The obtained performance measures are summarized in Table 2. As one can see, the polynomial SVM model yielded the highest accuracy (0.97), while the linear and RBF models attained slightly lower accuracy. Moreover, the polynomial kernel SVM outperforms the linear and RBF models in terms of sensitivity and specificity. These results are confirmed by the ROC curves and the Area Under the Curve (AUC) values (Fawcett, 2006) shown in Fig. 5.
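The three kernels compared in Table 2 are commonly written as K(x, y) = x·y (linear), K(x, y) = (x·y + c)^q (polynomial) and K(x, y) = exp(−γ‖x − y‖²) (RBF). The sketch below computes the corresponding Gram matrices in NumPy. The degree q, offset c and scale γ are assumed defaults, since the tuned values are not reported in this section, and the SVM solver that consumes these matrices is not reproduced.

```python
import numpy as np

def linear_kernel(X, Y):
    """K(x, y) = x . y for every pair of rows of X and Y."""
    return np.asarray(X, dtype=float) @ np.asarray(Y, dtype=float).T

def polynomial_kernel(X, Y, degree=3, coef0=1.0):
    """K(x, y) = (x . y + coef0) ** degree; degree and coef0 are assumed values."""
    return (linear_kernel(X, Y) + coef0) ** degree

def rbf_kernel(X, Y, gamma=1.0):
    """K(x, y) = exp(-gamma * ||x - y||^2); gamma is an assumed value."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    sq_dists = (np.sum(X ** 2, axis=1)[:, None]
                + np.sum(Y ** 2, axis=1)[None, :]
                - 2.0 * X @ Y.T)
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))
```

The polynomial and RBF kernels let a linear decision boundary in the kernel-induced feature space correspond to a non-linear boundary between "Fall" and "Non-Fall" HOG vectors.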
Also, several Naïve Bayes (NB) classification models were built using different settings. Specifically, two approaches were used to model the data distribution and compute the NB probabilities. The first one is the Normal (Gaussian) distribution. The other is kernel density estimation, for which the data was modeled using the Normal, Box (Uniform), Triangular and Epanechnikov kernels. Hyperparameter optimization was conducted for these kernels, and the results are given in Table 3. The highest accuracy (0.84) was achieved by two NB models: the first uses the normal kernel with a kernel width of 0.0910, and the other relies on a Gaussian distribution. The lowest accuracy was attained by the NB model based on the box kernel. Further analysis showed that the highest sensitivity (0.74) was obtained by the NB model based on the Gaussian distribution.
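Both NB variants can be sketched in NumPy under the naive assumption that features are conditionally independent given the class. The Gaussian variant fits a per-feature mean and variance by maximum likelihood; the small variance floor `var_floor` is an added assumption. The kernel variant uses a fixed-width per-feature kernel density estimate with the four kernels listed in Table 3. The default width 0.091 echoes the best reported normal-kernel width but is only meaningful on comparably scaled features. The class name and parameters are illustrative. Because the kernel branch compares every test frame with every training frame, it is meant for small illustrations rather than full-size HOG vectors.

```python
import numpy as np

# Kernel profiles K(u), each integrating to 1, for the kernel-density variant.
KERNELS = {
    "normal": lambda u: np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi),
    "box": lambda u: 0.5 * (np.abs(u) <= 1.0),
    "triangle": lambda u: np.maximum(1.0 - np.abs(u), 0.0),
    "epanechnikov": lambda u: 0.75 * np.maximum(1.0 - u ** 2, 0.0),
}

class SimpleNaiveBayes:
    """Naive Bayes with "gaussian" or "kernel" class-conditional densities."""

    def __init__(self, distribution="gaussian", kernel="normal", width=0.091,
                 var_floor=1e-9):
        self.distribution = distribution
        self.kernel = kernel
        self.width = width
        self.var_floor = var_floor  # assumed smoothing to avoid zero variances

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        self.log_priors_ = np.log([np.mean(y == c) for c in self.classes_])
        self.class_data_ = [X[y == c] for c in self.classes_]
        self.means_ = [Xc.mean(axis=0) for Xc in self.class_data_]
        self.vars_ = [Xc.var(axis=0) + self.var_floor for Xc in self.class_data_]
        return self

    def _log_likelihood(self, X, i):
        """Sum over features of log p(x_j | class i) (naive independence)."""
        if self.distribution == "gaussian":
            mu, var = self.means_[i], self.vars_[i]
            return np.sum(-0.5 * np.log(2.0 * np.pi * var)
                          - (X - mu) ** 2 / (2.0 * var), axis=1)
        # Kernel density estimate: average kernel over the class's training values.
        u = (X[:, None, :] - self.class_data_[i][None, :, :]) / self.width
        density = KERNELS[self.kernel](u).mean(axis=1) / self.width
        return np.sum(np.log(density + 1e-300), axis=1)  # guard against log(0)

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        scores = np.column_stack([self.log_priors_[i] + self._log_likelihood(X, i)
                                  for i in range(len(self.classes_))])
        return self.classes_[np.argmax(scores, axis=1)]
```

Working with log-probabilities avoids numerical underflow when many per-feature likelihoods are multiplied, which matters for long feature vectors such as HOG descriptors.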
Similarly, the best F-measure (0.78) was attained by the NB model based on the Gaussian distribution. Figure 6 shows the ROC curves and the corresponding AUC-ROC (Fawcett, 2006) values achieved by all NB models. Clearly, the NB model based on the Gaussian distribution outperforms the kernel-based NB models. This confirms the results reported in Table 3 and indicates that the Gaussian NB model outperforms all the other NB models.
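The measures used throughout this section derive from the confusion matrix with "Fall" as the positive class: sensitivity = TP/(TP+FN), specificity = TN/(TN+FP), precision = TP/(TP+FP), and the F-measure is the harmonic mean of precision and sensitivity. A minimal sketch, assuming labels 1 ("Fall") and 0 ("Non-Fall"); the function name is illustrative and AUC-ROC is not covered.

```python
def classification_measures(y_true, y_pred, positive=1):
    """Confusion-matrix measures with `positive` ("Fall") as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    def ratio(num, den):
        return num / den if den else float("nan")  # undefined when den == 0

    sensitivity = ratio(tp, tp + fn)  # fraction of fall frames that are detected
    specificity = ratio(tn, tn + fp)
    precision = ratio(tp, tp + fp)
    f_measure = ratio(2 * precision * sensitivity, precision + sensitivity)
    return {"accuracy": ratio(tp + tn, len(pairs)), "sensitivity": sensitivity,
            "specificity": specificity, "precision": precision,
            "f_measure": f_measure}
```

Sensitivity penalizes missed falls (false negatives) directly, which is why it is emphasized for this application.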
Table 4 shows the performance of the three individual classifiers as well as the results obtained using the majority vote decision. The fusion decision based on the majority vote clearly outperforms the individual classifiers: it yielded an accuracy of 0.98, noticeably exceeding the performance of the individual classifiers. Similarly, the attained sensitivity, AUC-ROC (Fawcett, 2006) and precision are 0.96, 0.97 and 0.98, respectively. A statistical t-test was conducted to assess the statistical significance of the results obtained using the proposed approach. The null hypothesis was rejected when the obtained p-value was less than or equal to 0.05. As reported in Table 5, the t-test was conducted using different performance measures. The t-test results showed that the fusion results are significantly different with respect to all performance measures. Thus, one can claim that the proposed fusion approach performs significantly better than the individual classifiers.
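The exact t-test variant is not specified here. Assuming a paired t-test on per-fold scores from the 10-fold cross-validation, the statistic and the decision at the conventional 0.05 level can be sketched with the Python standard library. The critical value 2.262 is the two-sided 5% quantile of Student's t distribution with 9 degrees of freedom; the function names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

# Two-sided critical value of Student's t for alpha = 0.05 and 9 degrees of
# freedom, i.e. 10 paired scores such as the 10 cross-validation folds.
T_CRITICAL_DF9 = 2.262

def paired_t_statistic(scores_a, scores_b):
    """t = mean(d) / (s_d / sqrt(n)) for the paired differences d = a - b."""
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equally long score lists with n >= 2")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    spread = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if spread == 0:
        raise ValueError("all differences are identical: t is undefined")
    return mean(diffs) / (spread / sqrt(len(diffs)))

def significantly_different_10_folds(scores_a, scores_b):
    """Reject H0 (equal mean performance) when |t| exceeds the df = 9 critical value."""
    if len(scores_a) != 10:
        raise ValueError("the critical value above assumes exactly 10 paired scores")
    return abs(paired_t_statistic(scores_a, scores_b)) > T_CRITICAL_DF9
```

If SciPy is available, `scipy.stats.ttest_rel(scores_a, scores_b)` returns the same statistic together with a p-value, which avoids hard-coding the critical value.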

Conclusion
In this paper, an image-based fall detection system was proposed. It encodes the visual properties of the human body in the video frames using the HOG feature and uses the resulting feature vectors to learn accurate classification models. The majority vote was used to aggregate the decisions of typical individual classifiers, namely the Naïve Bayes, KNN and SVM algorithms. The proposed fall detection system was implemented, validated and assessed using a standard dataset and appropriate performance measures. The experimental results showed that the fusion approach based on the majority vote outperforms the individual classifiers. In particular, the statistical t-test confirmed that the results obtained using majority vote fusion are significantly better than those obtained using the individual learners.

Fig. 1: Illustrative flowchart of the proposed fall detection system

Table 1: Performance measures obtained using various KNN settings

Table 2: Performance measures obtained using various SVM settings

Table 3: Performance measures obtained using various Naïve Bayes classification models

Table 4: Performance measures achieved using the majority vote based fusion and the individual classifiers

Table 5: Summary of the obtained t-test results