Human Age Prediction from Face Images Based on Combining Deep Wavelet Network and Machine Learning Algorithms

Due to the numerous variations in face appearance, age estimation from facial images is a difficult problem. Many factors can affect the estimation of human age, such as race, face pose, gender, and lifestyle, and accounting for more of these factors may improve performance. In this study, we propose a method to predict age from facial images. The proposed method consists of four main stages: (1) preprocessing, (2) face alignment and cropping, (3) feature extraction using a Deep Wavelet Network (DWN), and (4) age prediction. Five machine learning classifiers (K-nearest neighbor, support vector machine, Naïve Bayes, decision tree, and random forest) were each combined with the DWN, and the best-performing one was selected. Two DWNs were trained separately on male and female faces, so faces must be classified by gender before being fed to one of the two networks (gender classification is outside the scope of this study). Prediction performance was first measured with age divided into eleven age groups, giving an accuracy of 97% for females and 98% for males. Second, performance was measured with age divided into seventeen age groups (five years per group), giving an accuracy of 91% for female and 92% for male faces.


Introduction
Facial analysis has recently received a lot of interest in computer vision. Numerous characteristics of the human face can be used to identify someone and to convey their ethnicity, emotions, gender, age, and other attributes (Benkaddour, 2021).
Age classification is one of these characteristics and is particularly useful in a variety of real-world applications, such as security, electronic vending machines, electronic customer relationship management, human-computer interaction, forensic art, cosmetology, entertainment, item recommendation, biometrics, video surveillance, social understanding, crowd behavior analysis, and identity verification (Abdolrashidi et al., 2020; Agbo-Ajala and Viriri, 2020).
The ages of faces in images can be categorized into age groups or age ranges. Age categorization is made easier by certain visible changes in the face as people grow older. Nevertheless, determining people's physical ages from face photographs is extremely difficult for both computers and humans, because physical age is sometimes quite different from perceived age (Bao and Chung, 2020).
As we age, the thickness, texture, and color of the skin change, the facial skeleton changes, wrinkles and lines start to appear, and the composition of the subcutaneous tissue changes. Aging is a highly complex process that varies widely between individuals. Because of the numerous variations in facial appearance, automatic age estimation from facial images is a difficult problem (Zhang et al., 2019).
Because every individual is unique, the age estimation task faces several challenges: exposure, weather, gender, and lifestyle are all relevant factors. Even though faces from the same age group share certain tendencies, the aging characteristics of each group can be difficult to discern. The human eye can identify familiar faces, and the human mind can estimate a person's age, but human estimates are never completely accurate. With computers, a person's face can reveal a great deal about their age (Mualla et al., 2018).
The face images in these categories include variations (such as in lighting, pose, size, occlusion, appearance, and noise) that may hinder the ability of hand-crafted systems to categorize the age of the photos precisely (Abdolrashidi et al., 2020).
This research topic has advanced significantly because of its importance in intelligent practical applications. However, some problems in age prediction remain unresolved. Numerous approaches have been proposed to address them in past years, but several of these hand-crafted algorithms perform unsatisfactorily when predicting the ages of unconstrained, in-the-wild photographs, because they cannot handle the many degrees of variation present in such challenging imaging conditions (Agbo-Ajala and Viriri, 2020).
Recently, approaches based on deep learning have performed well in this area. Convolutional neural networks, among the newer methods proposed in recent years, have improved performance and addressed most of these issues, but they still have drawbacks, including slow training, dependence on a large number of images, and poor accuracy when dealing with many classes.
The novelty of this research lies in a new method, not used in previous work, that produces high-accuracy results. In addition, we predict age using new age groups that we propose (seventeen groups), which were not used in other research.
We propose a novel approach to improve age group classification that consists of two steps: feature extraction using Deep Wavelet Networks (DWN) and classification using machine learning algorithms (K-Nearest Neighbor (KNN), Support Vector Machine (SVM), Naïve Bayes (NB), Decision Tree (DT), and Random Forest (RF)).

Paper Objectives
The main goal of this study is to predict the human age group from the facial image under various conditions and challenges.

Paper Contributions
The main contributions of this study are:
1. A new deep learning method based on wavelets is used to extract features from the images; these features are then fed to a machine learning algorithm for age prediction.
2. A new face-alignment method that avoids the degradation of the digital image caused by rotation.
3. High prediction accuracy for children and elderly people, and high accuracy within ±3 years for all ages.

… (0-2, 4-6, 8-13, 15-20, 25-32, 38-43, 48-53, 60+) and the FG-NET dataset with four age groups (0-13, 14-21, 22-39, 40-69). The suggested method achieved classification accuracies of 92.62% and 94.59%, respectively. Abdolrashidi et al. (2020) presented a deep learning architecture that categorizes the age group of face photos using an ensemble of attentional and residual convolutional networks. Through the attention mechanism, their model can focus on the key features and most informative regions of the face (such as wrinkles around the eyes and mouth), which may lead to more precise age prediction. The predictions of the attentional network and the residual network were combined, and the final prediction of the ensemble model further increased accuracy. They used the UTK-face dataset with eight age groups (0-10, 10-20, 20-30, 30-40, 40-50, 50-60, 60-70, and 70+), and their highest classification accuracy was 91.3%.
Agbo-Ajala and Viriri (2020) proposed a different CNN technique with a two-level model, covering feature extraction and classification, to robustly classify the age group of unfiltered real-world faces. To handle the large variability in such faces, they applied a powerful image preprocessing method before feeding the images to the CNN. Among the datasets they used was Adience, with eight age groups (0-2, 4-6, 8-13, 15-20, 25-32, 38-43, 48-53, 60+), on which they obtained 93.8% accuracy. Abirami et al. (2020) focused on automated age group prediction from live facial photos. The face region, found with the Haar cascade pre-trained face detection model, was fed to a CaffeNet CNN model to estimate age. The output layer of their age group estimation CNN has eight values, one for each predefined age category (0-2, 4-6, 8-13, 15-20, 25-32, 38-43, 48-53, 60-80). They used the UTK-face dataset, and their best classification accuracy was 63.07%. Zhang et al. (2019) proposed a method for precise age estimation in the wild based on attention Long Short-Term Memory (LSTM) networks. To extract local features from age-sensitive regions, this technique builds AL-RoR or AL-ResNets networks from the Residual Networks of Residual Networks (RoR) or Residual Networks (ResNets) models combined with LSTM units. The networks are trained on the IMDB-WIKI and ImageNet datasets. Finally, they tested the method on the Adience dataset using both networks, obtaining 97.33% and 97.36%, respectively.
In general, the limitations and research gaps of these works are:
1. Age detection models may be biased towards specific ethnicities, leading to inaccurate age predictions for individuals from certain racial or ethnic groups.
2. Although some age detection models claim high accuracy, they are still prone to errors, especially when predicting the age of individuals who are older or younger than the age range they were trained on.
3. Most of these algorithms detect age only within a specific range, such as 10-60 years old, and may be inaccurate for individuals outside this range. They also detect ages with errors of ±8 years.

Deep Wavelet Transform
Wavelet scattering is a convolutional network without learned parameters, originally proposed by Mallat (2012). A wavelet scattering network derives low-variance features from image data for deep learning applications. Instead of learned filters, the network uses predefined wavelets and low-pass scaling filters. The scattering transform produces data representations that reduce differences within a class while preserving discriminability across classes, so it can be used successfully when training data are scarce.
The input signal is first averaged with the wavelet low-pass filter; this gives the layer-0 scattering features, and the averaging discards the high-frequency details. The next layer recovers these lost details by computing a continuous wavelet transform of the signal, which yields a set of scalogram coefficients. A modulus is applied to the scalogram coefficients, and the result is filtered with the wavelet low-pass filter to produce the layer-1 scattering coefficients. The same process is repeated to obtain the layer-2 scattering coefficients: the modulus of the scalogram coefficients of one layer always becomes the input to the operations of the next layer. This nonlinear cascade continues for the number of layers defined by the user.
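The layer-0/1/2 cascade described above can be sketched in NumPy. This is a minimal illustration, not the authors' DWN: it assumes Gabor-like band-pass filters (a simplified stand-in for Morlet wavelets), a Gaussian low-pass filter, circular FFT convolution, and hypothetical parameters `J` (number of scales) and `L` (number of orientations). The helper names `gabor_hat`, `lowpass_hat`, and `scattering2d` are illustrative only.

```python
import numpy as np

def gabor_hat(n, j, theta, xi0=3 * np.pi / 4, sigma0=0.6):
    """Fourier-domain Gabor band-pass filter at scale 2**j and angle theta.

    Simplified stand-in for a Morlet wavelet (no zero-mean correction).
    """
    w = 2 * np.pi * np.fft.fftfreq(n)                 # angular frequencies
    wx, wy = np.meshgrid(w, w, indexing="ij")
    cx = xi0 / 2**j * np.cos(theta)                   # centre frequency halves per scale
    cy = xi0 / 2**j * np.sin(theta)
    s = sigma0 / 2**j                                 # bandwidth halves per scale
    return np.exp(-((wx - cx) ** 2 + (wy - cy) ** 2) / (2 * s**2))

def lowpass_hat(n, J, sigma0=0.6):
    """Fourier-domain Gaussian low-pass (scaling) filter at scale 2**J."""
    w = 2 * np.pi * np.fft.fftfreq(n)
    wx, wy = np.meshgrid(w, w, indexing="ij")
    s = sigma0 / 2**J
    return np.exp(-(wx**2 + wy**2) / (2 * s**2))

def scattering2d(img, J=2, L=4):
    """Two-layer scattering transform of a square grayscale image.

    For a 28 x 28 image with J=2, L=4 the result has shape (25, 7, 7):
    1 layer-0 map, J*L layer-1 maps and L*L*J*(J-1)/2 layer-2 maps.
    """
    n = img.shape[0]
    phi = lowpass_hat(n, J)
    psis = {(j, l): gabor_hat(n, j, np.pi * l / L) for j in range(J) for l in range(L)}
    stride = 2**J

    def average(u):
        # Low-pass averaging followed by subsampling gives scattering coefficients.
        return np.real(np.fft.ifft2(np.fft.fft2(u) * phi))[::stride, ::stride]

    x_hat = np.fft.fft2(img)
    coeffs = [average(img)]                           # layer 0: x * phi
    for (j1, _), psi1 in psis.items():
        u1 = np.abs(np.fft.ifft2(x_hat * psi1))       # scalogram modulus |x * psi1|
        coeffs.append(average(u1))                    # layer 1: |x * psi1| * phi
        u1_hat = np.fft.fft2(u1)
        for (j2, _), psi2 in psis.items():
            if j2 > j1:                               # only coarser second-order scales
                u2 = np.abs(np.fft.ifft2(u1_hat * psi2))
                coeffs.append(average(u2))            # layer 2: ||x * psi1| * psi2| * phi
    return np.stack(coeffs)
```

The modulus followed by low-pass averaging is what makes the coefficients stable to small shifts; the subsampling stride of 2**J is a common choice in scattering implementations, assumed here.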

K-Nearest Neighbor (KNN) Classification Algorithm
KNN is a simple supervised machine learning algorithm, formalized by Cover and Hart in 1967. It keeps the whole training set and, when it receives a new image, assigns it the majority category among the k most similar stored images, on the assumption that the new image resembles the existing images of that category (Kataria and Singh, 2013).
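The rule can be sketched in a few lines of NumPy. This is a minimal sketch, assuming Euclidean distance and a hypothetical helper `knn_predict` with k=5 by default; the paper does not state which k or distance it used.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, X_test, k=5):
    """Classify each test row by majority vote among its k nearest training rows."""
    # Pairwise squared Euclidean distances, shape (n_test, n_train).
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]           # indices of the k closest samples
    preds = []
    for idx in nearest:
        votes = Counter(y_train[idx].tolist())
        # On ties, most_common keeps first-seen order, i.e. the closer neighbour's label.
        preds.append(votes.most_common(1)[0][0])
    return np.array(preds)
```

There is no training step beyond storing the data, which is why KNN prediction cost grows with the size of the training set.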

Support Vector Machine (SVM) Classification Algorithm
SVM is a supervised learning technique typically used for classification problems. Its objective is to find the best decision line or boundary that separates the data points in n-dimensional space, so that new points can be assigned to the correct class. This optimal decision boundary is called a "hyperplane", and it is chosen to maximize the margin between the classes. SVM selects the extreme points that define the hyperplane; these points are called support vectors, hence the name "support vector machine" (Zhang, 2012).
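A linear, binary version of this idea can be sketched by minimizing the regularized hinge loss with subgradient descent. This is an illustrative sketch under assumptions: labels in {-1, +1}, hypothetical helpers `train_linear_svm` and `svm_predict`, and hypothetical hyperparameters `lam`, `lr`, `epochs`. A multi-class age-group task would typically combine several such classifiers (one-vs-rest or one-vs-one); the paper does not say which SVM variant or kernel it used.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Soft-margin linear SVM fitted by full-batch subgradient descent.

    Minimises lam/2 * ||w||^2 + mean(max(0, 1 - y * (X @ w + b))), with y in {-1, +1}.
    """
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1                          # points inside the margin or misclassified
        grad_w = lam * w - (y[active][:, None] * X[active]).sum(axis=0) / n
        grad_b = -y[active].sum() / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def svm_predict(X, w, b):
    """Side of the hyperplane w.x + b = 0 on which each row falls."""
    return np.where(X @ w + b >= 0, 1, -1)
```

Only the points with margin below 1 contribute to the gradient, which mirrors the role of the support vectors in the text.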

Naïve Bayes (NB) Classification Algorithm
These simple probabilistic classifiers apply Bayes' theorem under strong (naïve) independence assumptions: NB classifiers assume that the features are conditionally independent of one another given the class. For each class, a membership probability is estimated, that is, the probability that a given sample belongs to that class, and the class with the highest probability is taken as the prediction. Problem instances are represented as vectors of feature values, and the class labels are drawn from a finite set. An advantage of NB is that it can estimate the parameters needed for classification from a very small training set (Saritas and Yasar, 2019).
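A Gaussian variant makes the independence assumption concrete: each feature gets its own per-class normal distribution, and the log-likelihoods simply add. This is a sketch, assuming Gaussian feature likelihoods (the paper does not name its NB variant); the class `GaussianNaiveBayes` and the `var_smoothing` constant are hypothetical.

```python
import numpy as np

class GaussianNaiveBayes:
    """Naive Bayes with one independent Gaussian per feature and class (sketch)."""

    def fit(self, X, y, var_smoothing=1e-9):
        self.classes_ = np.unique(y)
        self.prior_ = np.array([(y == c).mean() for c in self.classes_])
        self.mean_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        # A small constant added to each variance avoids division by zero.
        self.var_ = np.array([X[y == c].var(axis=0) for c in self.classes_]) + var_smoothing
        return self

    def joint_log_likelihood(self, X):
        # log P(c) + sum_i log N(x_i | mean_ci, var_ci); independence lets the terms add.
        diff2 = (X[:, None, :] - self.mean_[None]) ** 2
        ll = -0.5 * (np.log(2 * np.pi * self.var_)[None] + diff2 / self.var_[None]).sum(axis=2)
        return ll + np.log(self.prior_)[None]

    def predict(self, X):
        # The most probable class wins.
        return self.classes_[np.argmax(self.joint_log_likelihood(X), axis=1)]
```

Only a mean and a variance per feature and class are estimated, which is why NB can work with very small training sets.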

Decision Tree (DT) Classification Algorithm
DT is a supervised learning method that can be used for both regression and classification problems. The classifier has a tree structure: interior nodes represent dataset features, branches represent decision rules, and leaf nodes represent the classification outcome. Decisions, or tests, are made on the features of the given dataset. At each node, a decision tree asks a question and splits into subtrees according to the answer (yes or no) (Patel and Prajapati, 2018).
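The yes/no splitting can be sketched as a small CART-style tree that chooses each question by minimizing the weighted Gini impurity. This is a minimal sketch, assuming Gini impurity and a hypothetical `max_depth` limit; `gini`, `best_split`, `build_tree`, and `predict_one` are illustrative names, not the configuration used in the paper.

```python
import numpy as np

def gini(y):
    """Gini impurity of a label array."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - (p ** 2).sum()

def best_split(X, y):
    """Return (feature, threshold) that most reduces weighted Gini impurity, or None."""
    best, best_score = None, gini(y)
    n = len(y)
    for f in range(X.shape[1]):
        for t in np.unique(X[:, f])[:-1]:             # excluding the max keeps both sides non-empty
            left = X[:, f] <= t
            score = (left.sum() * gini(y[left]) + (~left).sum() * gini(y[~left])) / n
            if score < best_score:
                best, best_score = (f, t), score
    return best

def build_tree(X, y, depth=0, max_depth=5):
    """Internal nodes are (feature, threshold, yes_subtree, no_subtree); leaves are labels."""
    values, counts = np.unique(y, return_counts=True)
    majority = values[np.argmax(counts)]
    if depth == max_depth or len(values) == 1:
        return majority
    split = best_split(X, y)
    if split is None:                                 # no question reduces impurity
        return majority
    f, t = split
    left = X[:, f] <= t
    return (f, t, build_tree(X[left], y[left], depth + 1, max_depth),
            build_tree(X[~left], y[~left], depth + 1, max_depth))

def predict_one(node, x):
    """Answer 'x[f] <= t?' at each node until a leaf label is reached."""
    while isinstance(node, tuple):
        f, t, yes, no = node
        node = yes if x[f] <= t else no
    return node
```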

Random Forest (RF) Classification Algorithm
RF is a supervised machine learning technique used for classification and regression problems. To improve predictive accuracy, the RF classifier trains many decision trees on different subsets of the input data and combines their predictions (by majority vote for classification). It is based on ensemble learning, the practice of combining multiple classifiers to solve a complex problem and improve model performance (Alam and Vuong, 2013).
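The ensemble idea can be sketched with bootstrap samples, a random subset of features at each split, and a majority vote. This is an illustrative sketch, assuming Gini splits, the common sqrt(d) feature-subset heuristic, and hypothetical hyperparameters `n_trees` and `max_depth`; the helper names are hypothetical, and labels are assumed to be non-negative integers (needed by `np.bincount`).

```python
import numpy as np

def _gini(y):
    _, c = np.unique(y, return_counts=True)
    p = c / c.sum()
    return 1.0 - (p ** 2).sum()

def _grow(X, y, rng, max_features, depth, max_depth):
    """Grow one tree; nodes are (feature, threshold, left, right), leaves are labels."""
    values, counts = np.unique(y, return_counts=True)
    leaf = values[np.argmax(counts)]
    if depth == max_depth or len(values) == 1:
        return leaf
    best, best_score = None, _gini(y)
    # Each split considers only a random subset of the features.
    for f in rng.choice(X.shape[1], size=max_features, replace=False):
        for t in np.unique(X[:, f])[:-1]:
            left = X[:, f] <= t
            score = (left.sum() * _gini(y[left]) + (~left).sum() * _gini(y[~left])) / len(y)
            if score < best_score:
                best, best_score = (f, t), score
    if best is None:
        return leaf
    f, t = best
    left = X[:, f] <= t
    return (f, t, _grow(X[left], y[left], rng, max_features, depth + 1, max_depth),
            _grow(X[~left], y[~left], rng, max_features, depth + 1, max_depth))

def _predict_tree(node, x):
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node

def fit_forest(X, y, n_trees=25, max_depth=6, seed=0):
    rng = np.random.default_rng(seed)
    max_features = max(1, int(np.sqrt(X.shape[1])))   # sqrt(d) heuristic
    forest = []
    for _ in range(n_trees):
        idx = rng.integers(0, len(y), size=len(y))     # bootstrap sample (with replacement)
        forest.append(_grow(X[idx], y[idx], rng, max_features, 0, max_depth))
    return forest

def predict_forest(forest, X):
    votes = np.array([[_predict_tree(tree, x) for tree in forest] for x in X])
    # Majority vote across the trees for each sample.
    return np.array([np.bincount(row).argmax() for row in votes])
```

Bootstrap sampling and per-split feature sampling decorrelate the trees, so averaging their votes reduces variance compared with a single tree.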

Proposed System Materials and Methods
The proposed method consists of four main stages: preprocessing, face alignment and cropping, feature extraction, and finally, age group classification, as illustrated in Fig. 1. All the steps are summarized in Algorithm 1.

Preprocessing
The first step in this proposal is preprocessing. Each image is resized to 28 × 28, the size at which the deep wavelet network achieved its best performance, and converted to grayscale. Finally, the number of images in each age group is increased to 500 for males and 500 for females through augmentation: width and height shifts with a range of 0.1, horizontal flips, shear with a range of 0.1, and zoom with a range of 0.1.
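A NumPy-only sketch of this preprocessing is given below. The ranges resemble the `width_shift_range`, `height_shift_range`, `shear_range`, `zoom_range`, and `horizontal_flip` arguments of Keras' `ImageDataGenerator`, but the paper does not say which library it used. Assumptions: ITU-R BT.601 grayscale weights, nearest-neighbour resizing and sampling, edge-replicating fill, and shear interpreted as a shear factor (Keras interprets `shear_range` in degrees). All function names are hypothetical.

```python
import numpy as np

def to_gray(rgb):
    """ITU-R BT.601 luma weights, a common RGB -> grayscale conversion."""
    return rgb[..., :3] @ np.array([0.299, 0.587, 0.114])

def resize_nearest(img, size=28):
    """Nearest-neighbour resize of a 2-D image to size x size."""
    h, w = img.shape
    rows = (np.arange(size) * h / size).astype(int)
    cols = (np.arange(size) * w / size).astype(int)
    return img[rows[:, None], cols[None, :]]

def random_augment(img, rng, shift=0.1, shear=0.1, zoom=0.1):
    """Random flip, shift, shear and zoom via inverse affine mapping (sketch)."""
    h, w = img.shape
    if rng.random() < 0.5:
        img = img[:, ::-1]                            # horizontal flip
    tx, ty = rng.uniform(-shift, shift) * w, rng.uniform(-shift, shift) * h
    zx, zy = rng.uniform(1 - zoom, 1 + zoom, size=2)  # zoom in or out by up to 10%
    sh = rng.uniform(-shear, shear)                   # shear factor (assumption)
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    cy, cx = (h - 1) / 2, (w - 1) / 2
    # For every output pixel, locate its source pixel relative to the image centre.
    src_x = (xs - cx - tx) * zx + sh * (ys - cy) + cx
    src_y = (ys - cy - ty) * zy + cy
    src_x = np.clip(np.rint(src_x), 0, w - 1).astype(int)   # nearest pixel, edge fill
    src_y = np.clip(np.rint(src_y), 0, h - 1).astype(int)
    return img[src_y, src_x]

def preprocess(rgb, n_aug, rng):
    """Resize + grayscale, then n_aug augmented copies (n_aug would be 500 minus the group size)."""
    face = resize_nearest(to_gray(rgb), 28)
    return [face] + [random_augment(face, rng) for _ in range(n_aug)]
```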

Face Alignment and Cropping
One of the major problems for face recognition is the face pose: the face may be tilted to the left or right by some angle. To align the face, we propose a method that detects the eyes and computes the inclination angle. The classical rotation matrix (relation 1) is not suitable for digital images, because applying it directly to pixel coordinates produces aliasing. We therefore rotate digital images using relations (3) and (4), which are decompositions of matrix (1):

$$R(\theta) = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \qquad (1)$$

The first step of the method is to detect the eyes with Haar cascades and find the center of each eye. The second step is to compute the distances between the two eye centers along the x-axis (dx) and the y-axis (dy). The third step is to compute the rotation angle with Eq. (2):

$$\theta = \tan^{-1}\left(\frac{dy}{dx}\right) \qquad (2)$$

where θ is the rotation angle and dx and dy are the distances defined above. To determine the rotation direction, we check whether the y-coordinate of the left eye's center is less than that of the right eye's center; if so, the image is rotated counterclockwise using relation (3), otherwise it is rotated clockwise using relation (4). In relations (3) and (4), x* and y* are the new pixel coordinates and x and y are the old pixel coordinates. Finally, the image is rotated by the computed angle and the face is cropped from it. All the steps of the suggested alignment method are shown in Fig. 2.
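The angle computation and an aliasing-free rotation can be sketched as follows. Relations (3) and (4) are not recoverable from this text, so this sketch assumes the well-known three-shear (Paeth) decomposition of matrix (1), which has the property described: each rounded shear only translates whole rows or columns, so the mapping stays one-to-one on the pixel grid. Eye centers are assumed to be already detected (for example with a Haar cascade); `eye_angle`, `shear3`, and `align_face` are hypothetical names, and `arctan2` is used so the sign of dy sets the rotation direction automatically.

```python
import numpy as np

def eye_angle(left_eye, right_eye):
    """Eq. (2): tilt angle in radians of the line joining two eye centres (x, y)."""
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    return np.arctan2(dy, dx)                         # sign of dy gives the direction

def shear3(x, y, theta):
    """Rotate integer coordinates by theta with three rounded shears (assumed Paeth form).

    R(theta) = Sx(a) @ Sy(b) @ Sx(a), with a = -tan(theta/2), b = sin(theta).
    Each rounded shear shifts whole rows or columns by an integer, so no holes
    or duplicated pixels appear (unlike rounding R(theta) applied directly).
    """
    a, b = -np.tan(theta / 2), np.sin(theta)
    x = x + np.rint(a * y).astype(int)
    y = y + np.rint(b * x).astype(int)
    x = x + np.rint(a * y).astype(int)
    return x, y

def align_face(img, left_eye, right_eye):
    """Rotate img by minus the eye angle about its centre so the eyes become level."""
    theta = -eye_angle(left_eye, right_eye)
    h, w = img.shape
    cy, cx = h // 2, w // 2
    ys, xs = np.mgrid[0:h, 0:w]
    nx, ny = shear3(xs - cx, ys - cy, theta)
    nx, ny = nx + cx, ny + cy
    inside = (nx >= 0) & (nx < w) & (ny >= 0) & (ny < h)
    out = np.zeros_like(img)
    out[ny[inside], nx[inside]] = img[ys[inside], xs[inside]]
    return out
```

Cropping the face after rotation is omitted here, since the paper does not specify the crop geometry.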

Features Extraction by DWN
The facial images are fed into the proposed DWN for training. We use two DWNs with the same architecture, one trained on male faces and the other trained on female faces. The pseudocode for the DWN is shown in Algorithm 2.
Separating male faces from female faces gives the best performance because their facial features differ considerably. The suggested DWN architecture consists of nine levels of wavelet decomposition, a flatten layer, and two fully connected (FC) layers: the first has 512 channels, and the second has 11 channels with a softmax activation. We call this architecture a deep wavelet network. The networks were trained for 200 epochs. After training, the features are taken from the input to the second FC layer (that is, the output of the first FC layer) and saved. The parameters used in this proposal are listed in Table 1.
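The head of the network and the point where features are read out can be sketched as a NumPy forward pass. This is a sketch under assumptions: a ReLU activation on the 512-channel layer and He-style initialization (the text names only the softmax), a hypothetical input of 25 × 7 × 7 wavelet coefficients per face, and no training loop. `init_head` and `forward` are hypothetical names.

```python
import numpy as np

def init_head(d_in, n_hidden=512, n_classes=11, seed=0):
    """Weights for flatten -> FC(512) -> FC(11) + softmax (He-style init, assumption)."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0, np.sqrt(2 / d_in), (d_in, n_hidden)), "b1": np.zeros(n_hidden),
        "W2": rng.normal(0, np.sqrt(2 / n_hidden), (n_hidden, n_classes)), "b2": np.zeros(n_classes),
    }

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)              # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def forward(params, wavelet_coeffs):
    """Return (class probabilities, 512-d features fed to the second FC layer)."""
    x = wavelet_coeffs.reshape(len(wavelet_coeffs), -1)           # flatten layer
    features = np.maximum(0, x @ params["W1"] + params["b1"])      # FC 512 + ReLU (assumed)
    probs = softmax(features @ params["W2"] + params["b2"])        # FC 11 + softmax
    return probs, features
```

After training, the `features` array (not `probs`) is what would be saved and handed to KNN, SVM, NB, DT, or RF.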

Age Group Classification by Machine Learning Algorithm
The features saved in the previous stage are the inputs to the classification algorithm: the classifier is trained on the features that the DWN extracts from the training data, and its output is the predicted age group of the testing data. Five machine learning algorithms are used to classify the age groups: KNN, SVM, NB, DT, and RF. Figure 3 illustrates the structure of the proposed models for feature extraction and age group classification.

Results and Discussion
The Dataset Used
The UTK-face dataset was used in this study. It is a large face dataset with a broad age range (from 0 to 116 years) and over 20,000 face photos annotated with ethnicity, gender, and age. The images cover large variations in pose, facial expression, resolution, lighting, occlusion, etc.
We used only 11,000 images (5,500 male faces and 5,500 female faces) across 11 age groups, with 500 images per group for each gender. In the feature extraction stage, 70% of the images were used for training the network and 20% for validation; the remaining 10% were used for testing in the classification stage. Table 2 shows sample face images from the dataset. In this proposal, we split the dataset into two main groups, one of male faces and the other of female faces. Detecting the gender of a face is outside the scope of this proposal.
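As a worked example of the 70/20/10 split: with 500 images per age group, each group contributes 350 training, 100 validation, and 50 test images, so each 5,500-image gender subset yields 3,850, 1,100, and 550 images. The sketch below assumes the split is done per age group (stratified) with a fixed seed; the paper does not say how images were assigned, and `stratified_split` is a hypothetical name.

```python
import numpy as np

def stratified_split(labels, fractions=(0.7, 0.2, 0.1), seed=0):
    """Split indices into train/val/test, keeping each age group's proportions."""
    rng = np.random.default_rng(seed)
    splits = ([], [], [])
    for group in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == group))
        n_train = int(round(fractions[0] * len(idx)))
        n_val = int(round(fractions[1] * len(idx)))
        splits[0].extend(idx[:n_train])
        splits[1].extend(idx[n_train:n_train + n_val])
        splits[2].extend(idx[n_train + n_val:])       # remainder goes to test
    return tuple(np.array(s) for s in splits)
```

For example, `stratified_split(np.repeat(np.arange(11), 500))` models one gender's 11 groups of 500 images.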
The main difficulty we faced with this dataset is that some age groups, such as 5-9, 70-79, and 80-116, contain few images. To solve this, we increased the number of images in every age group to 500 by applying the image augmentation methods described in the preprocessing stage.

Face Alignment and Cropping
Face alignment relies on the Haar-Cascade method to detect the eyes. The results are presented in Fig. 4. Alignment becomes difficult when the subject wears sunglasses, because the eyes are hidden and their centers cannot be identified. In all other cases, the face is cropped so as to include all face pixels.

Features Extraction for Age Groups
Separating females from males in the networks increased classification accuracy, because female aging sometimes differs from male aging. With age, the skin tends to become thinner, less elastic, drier, and wrinkled; women often wear makeup that covers some lines or sagging in the face, or undergo plastic surgery, so that they appear younger than their real age (Fig. 5). For males, the effects of aging are sometimes more prominent.
In this proposal, two age-group schemes are used with the UTK-face dataset: 11 age groups and 17 age groups. We tested the proposed method with both schemes, and the results for each are as follows.

A. The 11 Age Groups
The 11 age groups that we propose are 0-4, 5-9, 10-15, 16-19, 20-29, 30-39, 40-49, 50-59, 60-69, 70-79, and 80-116. The age ranges of the groups are not equal; each range depends on the similarity of facial features within the group. As mentioned before, two networks extract the features, one for male faces and the other for female faces. Each network used 5,500 images, for a total of 11,000 images. We noticed that the male network was successful in recognizing the correct age, because men's faces usually show an age close to their real age, unlike women's faces, which often appear younger than they are.
The suggested network was trained on the dataset images; its training accuracy and loss function are shown in Figs. 6-7 for the female network and in Figs. 8-9 for the male network.
To predict the age, we used different machine learning algorithms (KNN, SVM, NB, DT, and RF) and measured the performance of combining each of them with the DWN. The highest classification accuracies obtained were 98% and 97%, using the DT and RF algorithms for male and female age prediction, respectively; Table 3 shows the results. From Table 3 we conclude that the DWN + RF combination gives the best result, so this method is adopted. The performance of the proposed method (DWN + RF) was tested for each of the 11 age groups; the confusion matrices are shown in Fig. 10 for females and Fig. 11 for males, and the performance is summarized in Tables 4-5.
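Per-group results such as those in the confusion matrices and Tables 4-5 are commonly derived as below. The paper does not list its exact metrics, so this is a minimal sketch assuming integer group labels, with rows as true groups and columns as predicted groups; `confusion_matrix` and `per_group_metrics` are hypothetical helper names.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Count matrix: rows are true age groups, columns are predicted age groups."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)
    return cm

def per_group_metrics(cm):
    """Per-group precision and recall, plus overall accuracy."""
    tp = np.diag(cm).astype(float)
    precision = tp / np.maximum(cm.sum(axis=0), 1)    # guard against empty columns
    recall = tp / np.maximum(cm.sum(axis=1), 1)       # guard against empty rows
    accuracy = tp.sum() / cm.sum()
    return precision, recall, accuracy
```

For example, with true groups `[0, 0, 1, 1, 2, 2]` and predictions `[0, 1, 1, 1, 2, 0]`, the overall accuracy is 4/6 and the per-group recall is `[0.5, 1.0, 0.5]`.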
The accuracy of female age prediction drops as age increases until it stabilizes at around 60 (Table 4). This is due to the makeup used by women and the substantial changes in the female body at younger ages. In contrast, male age prediction is more accurate, as shown in Table 5.

B. The 17 Age Groups
For the 17 age groups (0-4, 5-9, 10-14, 15-19, 20-24, 25-29, 30-34, 35-39, 40-44, 45-49, 50-54, 55-59, 60-64, 65-69, 70-74, 75-79, 80+), each group spans 5 years, except the open-ended 80+ group. In total, 17,000 images were used for the 17 age groups (8,500 for females and 8,500 for males). The training accuracy and loss function of the suggested network are shown in Figs. 12-13 for the female network and in Figs. 14-15 for the male network. The same machine learning algorithms (KNN, SVM, NB, DT, and RF) were used for classification. The highest classification accuracies, 92% for males and 91% for females, were obtained with RF; Table 6 shows the accuracy of the two networks with the different machine learning methods. From Table 6 we conclude that the DWN + RF combination gives the best result, so this method is adopted.
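Both grouping schemes reduce to a simple mapping from an integer age to a group index. This is a sketch assuming integer ages in the dataset's 0-116 range; the constant `ELEVEN_GROUP_STARTS` and the functions `group_11` and `group_17` are hypothetical names built from the group boundaries listed in the text.

```python
import bisect

# Lower bounds of the 11 groups: 0-4, 5-9, 10-15, 16-19, 20-29, ..., 70-79, 80-116.
ELEVEN_GROUP_STARTS = [0, 5, 10, 16, 20, 30, 40, 50, 60, 70, 80]

def group_11(age):
    """Index (0-10) of the unequal-width 11-group scheme containing an integer age."""
    return bisect.bisect_right(ELEVEN_GROUP_STARTS, age) - 1

def group_17(age):
    """Index (0-16) of the five-year 17-group scheme; ages 80 and above share group 16."""
    return min(age // 5, 16)
```

For example, age 15 falls in group 2 (10-15) of the 11-group scheme but in group 3 (15-19) of the 17-group scheme.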

Conclusion
In this study, we proposed a new four-stage method that classifies age into 11 groups spanning 0 to 116 years or into 17 groups. The contribution of this proposal is the use of two deep wavelet networks with the same architecture, combined with a machine learning classifier. Face alignment is another contribution of this study. Separating females from males improved classification accuracy, because female aging sometimes differs from male aging: women often wear makeup that covers some lines or sagging in the face, so they appear younger than their real age. An important contribution of this study is the ability to predict age with a small error (under about 5 years), as in the 17-age-group method. A limitation of the current method is facial occlusion: glasses, masks, and facial hair can affect the accuracy of age detection algorithms, because they may hide fine lines and wrinkles and make it difficult to estimate age accurately. Moreover, the accuracy of age detection algorithms depends heavily on the diversity of the dataset used to train them, and many datasets have limited ethnic and gender diversity, which makes it challenging to accurately estimate the age of people from different ethnicities or genders.
Future research should focus on developing age detection algorithms that process images in real time and provide accurate age estimates. Future research should also explore the relationship between different facial features and aging, and develop algorithms that capture these relationships accurately.

Algorithm 1: Age group classification
Input: Face image
Output: Age group
Step 1: Start.
Step 2: Read the color image.
Step 3: Preprocess the image: resize it to 28 × 28, convert it to grayscale, and apply image augmentation.
Step 4: Align the face and crop it from the image.
Step 5: Input the face image into the proposed DWN model to extract the features.
Step 6: Input the facial features into a machine learning method to classify the age group.
Step 7: End.

Fig. 1: Flowchart of the proposed system for age group classification

Fig. 3: The structure of the proposed models for features extraction and age groups classification

Fig. 10: Confusion matrix of the 11 age groups classification method for female faces

Fig. 12: The loss function for the 17 age groups for the female network

Fig. 15: The 17 age groups' training accuracy for males

Meanwhile, the testing measurements for the 17 age groups are presented in Tables 7-8.

Fig. 16: Confusion matrix of the 17 age groups classification method for female faces

Table 1: The parameters used in the proposed DWN for feature extraction

Algorithm 2: DWN pseudocode
…
3. Apply wavelet transformation to each image in the training set to obtain wavelet coefficients.
4. Normalize the wavelet coefficients and age labels.
5. Set the learning rate alpha and the number of iterations (num_iters).
6. For i in range(num_iters):
7.     Shuffle the wavelet coefficients and age labels.
8.     For j in range(number of batches):
9.         Select a batch of wavelet coefficients W_batch and age labels Y_batch.
10.        Compute the output of the neural network given W_batch as input.
11.        Compute the mean absolute error between the predicted output and Y_batch.
12.        Compute the gradients of the mean absolute error with respect to the neural network parameters.
13.        Update the neural network parameters using stochastic gradient descent with learning rate alpha.
14. Remove the last layer(s) of the neural network and use one of the machine learning classifiers.
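Steps 4-14 of Algorithm 2 can be turned into a small runnable NumPy sketch. Assumptions: the wavelet coefficients from step 3 are already computed and passed in as a matrix; the network is a single ReLU hidden layer of hypothetical size 16 that predicts a normalized age, trained with the mean absolute error as steps 11-13 state (note that this regression loss differs from the 11-way softmax head described in the body text); and step 14 keeps the hidden-layer activations as classifier features. All names and hyperparameter values are hypothetical.

```python
import numpy as np

def normalise(W, ages):
    """Step 4: z-score the coefficients and min-max scale the ages to [0, 1]."""
    Wn = (W - W.mean(axis=0)) / (W.std(axis=0) + 1e-8)
    y = (ages - ages.min()) / (ages.max() - ages.min())
    return Wn, y

def init_params(d, n_hidden=16, seed=0):
    rng = np.random.default_rng(seed)
    return {"A": rng.normal(0, 0.1, (d, n_hidden)), "a0": np.zeros(n_hidden),
            "B": rng.normal(0, 0.1, n_hidden), "b0": 0.0}

def hidden(p, Wn):
    """Step 14: the features that remain once the last layer is removed."""
    return np.maximum(0, Wn @ p["A"] + p["a0"])

def mae(p, Wn, y):
    return np.abs(hidden(p, Wn) @ p["B"] + p["b0"] - y).mean()

def train(p, Wn, y, alpha=0.05, num_iters=100, batch=32, seed=0):
    """Steps 5-13: minibatch SGD on the mean absolute error."""
    rng = np.random.default_rng(seed)
    for _ in range(num_iters):                         # step 6
        order = rng.permutation(len(y))                # step 7: shuffle
        for start in range(0, len(y), batch):          # step 8
            idx = order[start:start + batch]           # step 9: select a batch
            h = hidden(p, Wn[idx])                     # step 10: forward pass
            # Steps 11-12: d(MAE)/d(output) is sign(error) / batch size.
            g = np.sign(h @ p["B"] + p["b0"] - y[idx]) / len(idx)
            dh = np.outer(g, p["B"]) * (h > 0)         # back-propagate through ReLU (old B)
            p["B"] = p["B"] - alpha * (h.T @ g)        # step 13: SGD updates
            p["b0"] = p["b0"] - alpha * g.sum()
            p["A"] = p["A"] - alpha * (Wn[idx].T @ dh)
            p["a0"] = p["a0"] - alpha * dh.sum(axis=0)
    return p
```

After training, `hidden(p, Wn)` would be the feature matrix passed to one of the five classifiers.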

Table 3: The accuracy of classification of the face images into …

Table 4: The performance of the 11 age groups classification method for female faces

Table 5: The results of the 11 age groups classification method for male faces

Table 6: The results of the age groups classification models for the 17 age groups

Table 7: The results of the 17 age groups classification method for female faces

Table 8: The results of the 17 age groups classification method for male faces

Table 9: Comparison of the age groups' prediction accuracy for face images from three datasets