Combining SMOTE and OVA with Deep Learning and Ensemble Classifiers for Multiclass Imbalanced

Abstract: Real-world classification problems often involve imbalanced, multiclass datasets. A dataset with unbalanced and multiple classes affects the patterns learned by the classification model and decreases classification accuracy. Oversampling keeps the classes of the dataset balanced and helps avoid the overfitting problem. The purposes of this study were to handle multiclass imbalanced datasets and to improve the effectiveness of the classification model. The study proposes a hybrid method that combines the Synthetic Minority Oversampling Technique (SMOTE) and One-Versus-All (OVA) with deep learning and ensemble classifiers (stacking and random forest algorithms) for multiclass imbalanced data handling. Datasets with different numbers of classes and degrees of imbalance were obtained from the UCI Machine Learning Repository. The results show that the proposed method achieved its best accuracy, 98.51%, on the new-thyroid dataset when the deep learning classifier was used. With the stacking algorithm, the proposed method achieved higher accuracy than the other methods on the car, pageblocks, and Ecoli datasets. With random forest as the classifier, it achieved its highest accuracy, 98.47%, on the dermatology dataset.


Introduction
Machine learning is a subfield of Artificial Intelligence (AI) and predictive analytics. It enables computers to learn through algorithms that take input data and apply statistics to predict an output, adapting as new data become available. Most practical machine learning techniques rely on supervised learning, in which a model is fitted iteratively to training data according to their class labels. Classification is a supervised learning technique that uses labeled training data to learn how to assign class labels to samples from the domain. Class imbalance is a major problem in machine learning classification. It appears in many real-world problems, including spam filtering, disease screening, fraud detection, and advertising click-throughs.
In an imbalanced dataset, the number of training instances is not equal across class labels: the majority class has far more instances than the minority class. In medical prognosis, for example, most patients are healthy (negative cases), whereas the patients with the disease (positive cases) form a minority group (Johnson and Khoshgoftaar, 2019;Hasib et al., 2020). Data-level approaches, algorithm-level techniques, and hybrid methods are applied to manage class imbalance in machine learning (Leevy et al., 2018). First, data-level approaches preprocess the data before classification with traditional classifiers. Their main aim for multiclass imbalanced datasets is to reduce the impact of the skewed class distribution and to rebalance the class labels. Data-level approaches are well suited to class imbalance problems. Second, algorithm-level methods either introduce specialized algorithms or adapt the learning procedure of the classifier itself. Finally, hybrid techniques combine learning algorithms; an example is the boosting technique (Mahani and Ali, 2019;Upadhyay et al., 2021). In classification problems, one way to handle imbalance is to increase the number of samples in the dataset. The best-known method of this kind is the Synthetic Minority Oversampling Technique (SMOTE) (Yu and Zhou, 2021). SMOTE is a very popular oversampling (upsampling) technique used in data preprocessing for imbalanced datasets. It produces synthetic examples for the minority class, raising the number of minority samples until the class distribution is equal (Maheshwari et al., 2018;Hasib et al., 2020).
Furthermore, SMOTE helps avoid the overfitting problem and improves the effectiveness of a classification model on imbalanced datasets (Sun et al., 2015). Hence, the upsampling approach can be applied to solve the imbalance problem in machine learning (Charte et al., 2013).
In machine learning, the multiclass imbalanced problem is one of the most challenging. Many classification problems involve both multiple classes and imbalanced data. Real-world problems often have multiple classes, for example in bioinformatics, image, handwriting, face, and speech recognition, and text classification. Classification is a supervised learning task in which each example is associated with a class target or label, and the model is trained on labeled training data. A multiclass problem is a classification task with more than two classes. Two very popular techniques for such problems are One-Versus-All (OVA) and One-Versus-One (OVO). Both convert the multiclass problem into several binary problems using a heuristic method (Bolon-Canedo et al., 2011;Özdemir et al., 2021). In this study, the OVA technique is applied to split a k-class problem into k binary classification problems, each of which discriminates a given class from the other k−1 classes (Mehra and Gupta, 2013). When each classifier is trained, a single class is treated as positive and the rest as negative. The present work therefore combines SMOTE and OVA, with the main objective of handling multiclass imbalanced datasets and improving the classification model.
In summary, this study presents techniques for handling imbalanced datasets in multiclass classification, namely the SMOTE approach and the OVA strategy combined with deep learning and ensemble classifiers (stacking and random forest algorithms).
These methods are applied to improve the prediction model, to increase accuracy, and to handle multiclass imbalanced datasets.
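The OVA split described above can be illustrated with a minimal sketch. The function name `one_versus_all_labels` and the toy labels are hypothetical and are not taken from the study; the sketch only shows how a k-class label vector becomes k binary label vectors.

```python
def one_versus_all_labels(labels):
    """Return {class: binary labels}, where 1 marks the class and 0 marks the rest."""
    classes = sorted(set(labels))
    return {c: [1 if y == c else 0 for y in labels] for c in classes}


# Toy label vector with k = 3 classes (illustrative values only).
labels = ["a", "b", "c", "a", "c", "c"]
binary_problems = one_versus_all_labels(labels)

for cls, targets in binary_problems.items():
    print(cls, targets)
```

Each of the k binary label vectors would then be used to train one binary classifier on the same feature matrix.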

Background and Related work
This section reviews the oversampling approach and the OVA strategy for multiclass imbalanced classification problems. Several novel oversampling methods exist for imbalanced datasets. An imbalanced classification problem arises when the training dataset has an unbalanced distribution of classes. A real-world imbalanced classification problem may have more than two class labels, that is, it may be a multiclass classification problem.

SMOTE Approach
Many researchers have proposed oversampling methods for handling multiclass and unbalanced datasets. Sampling is a preprocessing step applied to the training set to create a balanced dataset and to adjust the prior distribution of the minority and majority classes. There are two approaches: under-sampling and oversampling. SMOTE generates synthetic instances of the minority classes to balance the class distribution; each synthetic instance is created by interpolating between a minority instance and one of its nearest minority-class neighbors rather than by replicating existing instances (Abd Elrahman and Abraham, 2013; Maheshwari et al., 2018;Hasib et al., 2020;Almayyan, 2021;Yu and Zhou, 2021). Majumder et al. (2020) proposed a method for handling multiclass imbalanced problems using geometry-based information sampling and class-prioritized synthetic data creation (GICaPS). The combination of oversampling and undersampling approaches is applied to improve the class separation and increase the diversity of samples within each class. Oversampling preprocesses a class-imbalanced training set by adding more examples to the minority class through a randomized procedure, so that the numbers of instances in the majority and minority classes become balanced. It is a very effective technique for the classification of imbalanced datasets.
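As an illustration of how SMOTE creates synthetic minority instances, the following is a minimal sketch of the interpolation step only. It is not the operator used in this study; the function name `smote_sample`, the parameters `k`, `n_new`, and `seed`, and the toy points are all assumptions made for the example.

```python
import math
import random


def smote_sample(minority, k=2, n_new=4, seed=0):
    """Create n_new synthetic points, each lying on the segment between a
    minority point and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority points.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy minority class with two features (illustrative values only).
minority_points = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_sample(minority_points)
print(new_points)
```

Because every synthetic point lies between two existing minority points, the new instances stay inside the region already occupied by the minority class instead of duplicating existing instances.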
Up-sampling approaches for handling multiclass imbalanced datasets were presented by Sáez et al. (2016). The purpose was to analyze subsets of specific instances in different classes. The training sets were 21 multiclass datasets taken from the UCI repository. The results showed the highest average accuracy (72.56%) across all datasets when C4.5 was used as the classifier. Özdemir et al. (2021) proposed a method for classifying imbalanced hyperspectral images using SMOTE with deep learning algorithms. In that experiment, a dataset from IEEE DataPort was used for training. The highest accuracy rates (96.49%, 95.64%, 93.38%) on the multiclass hyperspectral image dataset were obtained when SMOTE balancing was applied with 5-fold cross-validation. Waqar et al. (2021) proposed a SMOTE-based deep learning method for the prediction of heart attacks, in which SMOTE is used to handle the imbalanced datasets. The study used datasets from UCI. The experimental results showed that a properly tuned SMOTE-based artificial neural network outperformed all other models and performed well in classifying heart failure, with the best average accuracy (96%).
In addition, Yuan et al. (2018) developed an ensemble method of deep learning algorithms for handling imbalanced problems. Stratified under-sampling is used to balance all classes. The results show that the method performs best on imbalanced multiclass classification problems. Furthermore, the highest accuracy improvement was 24.7%, and the proposed approach decreased the computational cost.
In summary, the oversampling approach is preferred over the undersampling techniques because the undersampling approach tends to remove instances from data that may carry some important information.

One-Versus-All (OVA) Strategy for Multiclass Classification
OVA, also called One-Against-All (OAA), is a popular heuristic approach. It separates the multiclass classification problem into several binary classification problems. A binary classifier is trained on each binary problem, and the prediction is made by the model with the highest confidence score (Abd Elrahman and Abraham, 2013). The classifier predicts instance x as the class label y that gives the maximum probability score; the final decision function is computed using Eq. (1) (Ghanem et al., 2010):

$$f(x) = \arg\max_{i \in \{1, \dots, k\}} f_i(x) \qquad (1)$$

where $f_i(x)$ is the output of the i-th binary classifier, k is the number of classes, and $f(x)$ assigns the test instance to the class with the highest output value.
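The decision rule can be sketched in a few lines. The function name `predict_ova`, the class names, and the scores below are hypothetical; in practice each score would come from one of the trained binary classifiers.

```python
def predict_ova(scores_per_class):
    """Return the class whose binary classifier gives the highest score."""
    return max(scores_per_class, key=scores_per_class.get)


# Hypothetical confidence scores from three one-versus-all classifiers.
scores = {"normal": 0.15, "hyper": 0.72, "hypo": 0.40}
predicted = predict_ova(scores)
print(predicted)
```

The scores need not sum to one, because each binary classifier is trained independently; only their ranking matters for the final decision.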
Other researchers have proposed combinations of SMOTE and OVA strategies for multiclass imbalanced datasets. Puttiporn and Yaowares (2019) proposed an approach that uses the adaptive synthetic sampling technique (ADASYN) and SMOTE for oversampling the dataset. The one-vs-one and one-vs-all strategies with the Gentleboost algorithm are applied to handle imbalanced data with multiple classes. The dataset, an assessment of knee osteoarthritis incidence in the elderly, was obtained from Ban Han Sub-District Health Promoting Hospital, Thasala district, Nakhon Si Thammarat province. The experimental results indicated that the ADASYN method with the OVO strategy achieved the highest accuracy rate, 97.31% (Puttiporn and Yaowares, 2019).
A SMOTE approach for imbalanced big data using Random Forest was presented by Bhagat and Patil (2015) for the classification of multiclass imbalanced data. The approach consists of two stages. In the first stage, OVA and OVO strategies are used to split the training datasets into subsets of binary classes. In the second stage, the SMOTE technique is used to balance the training set. The Random Forest (RF) algorithm is then used as the classifier. The SMOTE method is adapted to huge datasets using MapReduce. Different datasets from the UCI repository were used in the experiment, and the method was implemented on the Apache Hadoop and Apache Spark platforms. The experimental results showed that the proposed method performed better than other methods.
The differences between the proposed method and the previous work (Puttiporn and Yaowares, 2019) are as follows. First, in the preprocessing step, the present work combines SMOTE and OVA strategies with deep learning and ensemble methods for multiclass imbalanced datasets. In this step, three different learning algorithms (deep learning, stacking, and random forest) are used to evaluate the subsets of each class, whereas the previous work [xx] proposed SMOTE with OVO and OVA based on the Gentleboost algorithm. Second, the classification and the assessment use deep learning and ensemble methods with 10-fold cross-validation. These methods can help improve the generalization performance of the classification.

Performance Measures for Multiclass Imbalanced Classification
In this study, accuracy, Kappa, sensitivity, and specificity are used to assess the classification model. The performance of multiclass imbalanced classification is typically evaluated with a confusion matrix, as illustrated in Table 1.

Accuracy
Accuracy is used to evaluate classification on the datasets and to compare the classification models. It is the percentage of correctly classified instances out of all instances. The efficiency of the classifier is assessed in terms of the following measures: True Positive (TP) is the number of positive samples that are correctly classified. True Negative (TN) is the number of negative samples that are correctly classified. False Negative (FN) is the number of positive samples that are incorrectly classified as negative. False Positive (FP) is the number of negative samples that are incorrectly classified as positive (Farid et al., 2014):

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
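The following sketch shows how the four counts can be read from a multiclass confusion matrix for one class treated as positive, and how accuracy follows from them. The matrix values are invented for illustration, and the orientation (rows are actual classes, columns are predicted classes) is an assumption rather than the layout of Table 1.

```python
def one_vs_rest_counts(cm, k):
    """Return (TP, FN, FP, TN) for class index k.

    Assumes cm[i][j] counts instances of actual class i predicted as class j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    tp = cm[k][k]
    fn = sum(cm[k]) - tp                         # actual k, predicted other
    fp = sum(cm[i][k] for i in range(n)) - tp    # actual other, predicted k
    tn = total - tp - fn - fp
    return tp, fn, fp, tn


def binary_accuracy(tp, fn, fp, tn):
    """Accuracy of one positive-versus-rest problem."""
    return (tp + tn) / (tp + tn + fp + fn)


def overall_accuracy(cm):
    """Share of all instances on the diagonal of the confusion matrix."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / sum(sum(row) for row in cm)


# Invented 3-class confusion matrix with 100 instances.
confusion = [[50, 2, 1],
             [3, 20, 2],
             [0, 1, 21]]

counts_class0 = one_vs_rest_counts(confusion, 0)
print(counts_class0, binary_accuracy(*counts_class0), overall_accuracy(confusion))
```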

Sensitivity
Sensitivity is also called recall or the true positive rate. It is the proportion of actual positives that are correctly classified as positive. It is used to assess the classification model for multiclass problems. Sensitivity based on the micro-average method is applied to measure the effectiveness of the classification where the training set varies in size or is imbalanced (Panthong and Srivihok, 2019). With the counts pooled over the k classes, the sensitivity can be calculated as:

$$\text{Sensitivity} = \frac{\sum_{i=1}^{k} TP_i}{\sum_{i=1}^{k} (TP_i + FN_i)}$$

Specificity
Specificity is the proportion of actual negatives that are correctly classified as negative. Specificity based on the micro-average method is applied to measure the effectiveness of the classification model where the dataset varies in size or is imbalanced (Panthong and Srivihok, 2019). With the counts pooled over the k classes, the specificity can be computed as shown below:

$$\text{Specificity} = \frac{\sum_{i=1}^{k} TN_i}{\sum_{i=1}^{k} (TN_i + FP_i)}$$
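A minimal sketch of micro-averaged sensitivity and specificity is given below. It pools the one-versus-rest counts of every class before taking the ratios. The confusion matrix is invented, its orientation (rows are actual classes, columns are predicted classes) is an assumption, and the helper name `pooled_counts` is hypothetical.

```python
def pooled_counts(cm):
    """Sum TP, FN, FP, and TN over all classes (micro-averaging).

    Assumes cm[i][j] counts instances of actual class i predicted as class j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    tp = fn = fp = tn = 0
    for k in range(n):
        tp_k = cm[k][k]
        fn_k = sum(cm[k]) - tp_k
        fp_k = sum(cm[i][k] for i in range(n)) - tp_k
        tn_k = total - tp_k - fn_k - fp_k
        tp, fn, fp, tn = tp + tp_k, fn + fn_k, fp + fp_k, tn + tn_k
    return tp, fn, fp, tn


# Invented 3-class confusion matrix with 100 instances.
confusion = [[50, 2, 1],
             [3, 20, 2],
             [0, 1, 21]]

tp, fn, fp, tn = pooled_counts(confusion)
micro_sensitivity = tp / (tp + fn)
micro_specificity = tn / (tn + fp)
print(micro_sensitivity, micro_specificity)
```

When every instance receives exactly one predicted label, the pooled TP + FN equals the number of instances, so micro-averaged sensitivity coincides with overall accuracy. This agrees with the paired values reported in this study, such as an accuracy of 98.51% alongside a sensitivity of 0.9851.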

Cohen's Kappa
Cohen's Kappa, or simply Kappa, is like classification accuracy, except that it corrects for the agreement expected by chance; it is computed from the confusion matrix. Kappa is a more useful measure for problems with imbalanced classes: when working with an imbalanced dataset, it is more informative than overall accuracy. Cohen's kappa coefficient can handle both multiclass and imbalanced class problems. It is calculated with the following formula:

$$\kappa = \frac{p_0 - p_e}{1 - p_e}$$

where $p_0$ is the overall accuracy of the predictive model (the observed agreement) and $p_e$ is the agreement between the model predictions and the actual class values that would be expected by chance.
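As a worked illustration, Kappa can be computed directly from a confusion matrix. The matrix below is invented; the chance agreement pe is taken, as is standard for Cohen's Kappa, from the products of the row and column totals.

```python
def cohens_kappa(cm):
    """Cohen's Kappa from a square confusion matrix (actual x predicted)."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    # Observed agreement: share of instances on the diagonal.
    p0 = sum(cm[i][i] for i in range(n)) / total
    # Chance agreement: product of actual and predicted class frequencies.
    row_totals = [sum(cm[i]) for i in range(n)]
    col_totals = [sum(cm[i][j] for i in range(n)) for j in range(n)]
    pe = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return (p0 - pe) / (1 - pe)


# Invented 3-class confusion matrix with 100 instances.
confusion = [[50, 2, 1],
             [3, 20, 2],
             [0, 1, 21]]

kappa = cohens_kappa(confusion)
print(round(kappa, 3))
```

For this matrix the observed agreement is 0.91 and the chance agreement is 0.3912, so Kappa is noticeably lower than accuracy, which reflects the correction for the dominant class.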

Materials and Methods
The SMOTE approach and the OVA strategy are used to handle multiclass imbalanced datasets. SMOTE is used to create instances for the minority class, and OVA with different classifiers is applied to improve the accuracy of the classification model. The combinations of the SMOTE approach and OVA strategy with different classifiers used in the present study, and their abbreviations, are given in Table 3. Overfitting is guarded against by measuring predictive accuracy with cross-validation. The framework for the SMOTE approach and OVA strategy with deep learning and ensemble classifiers (stacking and random forest algorithms) for multiclass imbalanced dataset handling is illustrated in Fig. 1.

The Process of Combining SMOTE Approach and OVA Strategy with Deep Learning and Ensemble Classifiers for Multiclass Imbalanced Data Handling
The process starts from 9 training datasets taken from the UCI repository. The method contains three steps:
Step 1: The up-sampling approach with SMOTE is applied to balance the training dataset by synthesizing instances for the minority class. In this study, oversampling uses the SMOTE upsampling operator. In this step, the training set is filtered to consider only instances of the minority class.
Step 2: The dataset is modified for the OVA strategy. The OVA technique divides the training set by class: one class is set as positive and the remaining classes are assigned as negative.
Step 3: The final model is selected with three different classifiers. In this study, deep learning and ensemble classifiers (stacking and random forest algorithms) are used to evaluate the classification model.
Deep learning is a subcategory of machine learning. It supports supervised training for regression and classification tasks and is applicable to multiclass classification problems. In addition, it can help evaluate classification performance in supervised learning (Candel and LeDell, 2022). In this study, the H2O deep learning algorithm is used; it is based on a multilayer feedforward Artificial Neural Network (ANN) trained with Backpropagation (BP) for predictive modeling. The ANN model, or Multilayer Perceptron (MLP), is the most widely applied deep neural network. The deep learning model is used to identify the different classes of the training dataset.
A stacking learning algorithm, or stacked generalization, is a machine learning ensemble in which the models are combined using another machine learning algorithm. The main concept is to train the base learning algorithms on a dataset and then produce a new training set from the outputs of these models. This new training set is then used as input for the combiner algorithm, which produces the final prediction. Stacked generalization is trained on a separate dataset to deduce the biases of the learning set and avoid overfitting. The base classifiers often consist of different learning algorithms. Therefore, stacking ensembles are often heterogeneous (Xie et al., 2022).
A random forest algorithm is an ensemble of a certain number of random trees. This approach is an extension of the bagging technique based on the predictions of the decision trees. It is a very popular decision tree ensemble. The random forest technique is a type of supervised training. It is used widely in classification problems. This method has proven its effectiveness on a wide range of different predictive modeling problems. The random forest algorithm makes a decision tree prediction more efficient and increases the accuracy and the robustness of the classification model (Bisht et al., 2016;Putri et al., 2021).

Classification and Evaluation
In this study, three classifiers (deep learning, stacking, and random forest) were used for classification and model evaluation, each with 10-fold cross-validation. Cross-validation estimates the statistical performance of a learning algorithm by repeatedly dividing the data into separate training and test sets, which gives a more reliable accuracy estimate than a single split. This study uses RapidMiner Studio version 9 (Enterprise Educational Edition) for model training and testing, as a tool for preprocessing, classifier training, and performance evaluation. The proposed method is compared with plain SMOTE and with no SMOTE. All datasets are tested through 10-fold cross-validation.
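The 10-fold procedure can be sketched as follows. This is an illustration in plain Python, not the RapidMiner configuration used in the study. Stratifying the folds by class is an assumption made here because it keeps minority instances in every fold; the function names and the toy labels are hypothetical.

```python
from collections import defaultdict


def stratified_folds(labels, n_folds=10):
    """Assign each index to a fold so that every class is spread across folds."""
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        for pos, idx in enumerate(indices):
            folds[pos % n_folds].append(idx)
    return folds


def train_test_splits(labels, n_folds=10):
    """Yield (train indices, test indices), holding out one fold at a time."""
    folds = stratified_folds(labels, n_folds)
    for held_out in range(n_folds):
        test_idx = folds[held_out]
        train_idx = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield train_idx, test_idx


# Toy imbalanced label vector: 80 majority and 20 minority instances.
labels = ["maj"] * 80 + ["min"] * 20
splits = list(train_test_splits(labels))

for train_idx, test_idx in splits[:2]:
    print(len(train_idx), len(test_idx))
```

Each instance is used for testing exactly once, and the reported performance would be the average over the ten held-out folds.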

Datasets
The experiment used 9 multiclass imbalanced training sets obtained from the UCI Machine Learning Repository; all datasets were downloaded from the website (http://archive.ics.uci.edu/ml). The data are described in Table 2. The combinations of the SMOTE approach and OVA strategy with three different classifiers used in the current research, and the corresponding abbreviations, are shown in Table 3.
Here is a dataset example and an access link as illustrated in Fig. 2.

Results and Discussion
This section presents the experimental results of combining the SMOTE approach and OVA strategy with three different classifiers for multiclass imbalanced dataset handling.
The three classifiers, namely deep learning, stacking, and random forest, were evaluated with 10-fold cross-validation to assess prediction performance. The proposed method was compared with no preprocessing and with preprocessing by SMOTE alone. Furthermore, the accuracy, sensitivity, and specificity on the 9 benchmark datasets were used to evaluate classification performance.
Tables 4, 5, and 6 compare the classification accuracies using SMOTE (upsampling) and OVA based on deep learning, stacking, and random forest algorithms, respectively.
From Table 4, the results demonstrated that the proposed method received a higher accuracy rate than other methods when SMOTE and OVA strategy with deep learning as the classifier is applied in almost all datasets except the dermatology dataset. In addition, the SOVA_DL approach obtained the best performance of classification (98.51%) in the new-thyroid dataset.
Table 5 shows that SMOTE and OVA with the stacking algorithm achieved the highest accuracy in almost all datasets except the dermatology and vertabral3c datasets. The SOVA_SK technique achieved its best classification performance (98.42%) on the pageblocks dataset. In addition, the results indicate that the proposed method provides better accuracy than the single SMOTE approach for datasets with a high imbalance ratio. In contrast, for datasets with a low imbalance ratio, its results were inferior to those of other methods.
Table 6 shows that the SOVA_RF method is superior to the other methods. The SOVA_RF approach achieved its best classification performance (98.47%) on the dermatology dataset. In addition, the results indicate that the proposed method offers better classification accuracy than the single SMOTE approach for training sets with a low imbalance ratio. In contrast, for datasets with a high imbalance ratio, its results were inferior to those of other methods.
Table 7 compares the effectiveness of each approach in terms of Cohen's Kappa. The SOVA_RF approach had the highest Kappa (0.981), obtained on the dermatology dataset, which was higher than that of the state-of-the-art models. The SOVA_DL method had a high Kappa (0.974) on the new-thyroid dataset, and the SOVA_SK technique had a high Kappa (0.971) on the pageblocks dataset. In summary, a high Kappa coefficient indicates high agreement between predicted and actual classes, whereas a low value indicates low agreement.
Table 8 compares the classification models in terms of sensitivity, and Table 9 compares them in terms of specificity. Table 8 indicates that the SOVA_DL method achieved higher sensitivity than the other methods in 3 datasets: yeast, new-thyroid, and vertabral3c. The maximum sensitivity was 0.9851, obtained using SMOTE and OVA with the deep learning algorithm on the new-thyroid dataset. The SOVA_SK approach achieved better sensitivity than SOVA_DL and SOVA_RF on the car, pageblocks, and Ecoli datasets. Furthermore, the glass, dermatology, and Cleveland datasets obtained good sensitivity when SOVA_RF was applied. Table 9 shows that the SOVA_RF method achieved the best specificity value (0.9969).
The SOVA_DL technique obtained better specificity than other methods in 3 datasets: yeast, new-thyroid, and vertabral3c. Moreover, the SOVA_SK approach achieved its best specificity on pageblocks, at 0.9960.
The graph compares the accuracy values of the classification model using SOVA_DL, SOVA_SK, and SOVA_RF methods for multiclass imbalanced data handling.
Abbreviations of the proposed methods:
SOVA_DL: Combining SMOTE approach and OVA strategy based on deep learning algorithm with 10-fold cross-validation
SOVA_SK: Combining SMOTE approach and OVA strategy based on stacking learning algorithm with 10-fold cross-validation
SOVA_RF: Combining SMOTE approach and OVA strategy based on random forest learning algorithm with 10-fold cross-validation
The bar chart in Fig. 3 shows that the results of the SOVA_DL approach were superior to those of the other methods on the yeast (69.93%), new-thyroid (98.51%), and vertabral3c (88.25%) datasets, whereas on three of the datasets (glass, dermatology, and Ecoli) its accuracy was lower than that of the SOVA_SK and SOVA_RF techniques.
Overall, it was clear that each method of machine learning improved the performance of model classification for multiclass imbalanced data handling.
The distribution of instances for each class label for the original dataset and after the implementation of SMOTE (up sampling) technique is demonstrated in Fig. 4.
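The change in class distribution can be made concrete with a short sketch that counts how many synthetic instances each class would need. It assumes that every class is raised to the size of the largest class, which may differ from the operator settings used in this study; the function name `synthetic_needed` and the toy labels are hypothetical.

```python
from collections import Counter


def synthetic_needed(labels):
    """Number of synthetic instances per class needed to match the largest class."""
    counts = Counter(labels)
    target = max(counts.values())
    return {cls: target - n for cls, n in counts.items()}


# Toy label vector: class sizes 6, 3, and 1 (illustrative values only).
labels = ["a"] * 6 + ["b"] * 3 + ["c"] * 1
needed = synthetic_needed(labels)
print(needed)
```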

Discussion
Overall, the proposed method improved the performance of the classification model for multiclass imbalanced datasets. In this study, performance was compared across three classifiers: deep learning, stacking, and random forest.
The results indicated that the SOVA_DL method performed better than both single SMOTE and no SMOTE in all datasets except the dermatology dataset. Three datasets (yeast, new-thyroid, and vertabral3c) had their best accuracy, sensitivity, and specificity when the deep learning algorithm was used as the classifier. SOVA_DL performed best on the new-thyroid dataset. The SOVA_SK approach performed better than the other methods on car, pageblocks, and Ecoli. In addition, the best results for the glass, dermatology, and Cleveland datasets were obtained with the SOVA_RF technique, and the same held for the sensitivity and specificity values. The better performance may be due to combining the predictions of multiple classifier models on the same dataset.
The research is procedurally similar to earlier work that relies on SMOTE and OVA strategies to handle the multiclass imbalanced problem. The difference between the presented method and the previous work (Puttiporn and Yaowares, 2019) is the use of the OVA technique with deep learning and ensemble classifiers (stacking and random forest algorithms). In this study, OVA was used to split the multiclass training set into multiple binary training sets. Three classifiers, namely a deep learning algorithm with an H2O operator, a stacking algorithm, and a random forest algorithm, were used to train each binary classification model. In addition, these algorithms were used to assess the efficiency of the classification model with 10-fold cross-validation, whereas the previous study (Puttiporn and Yaowares, 2019) used the SMOTE and ADASYN approaches on imbalanced datasets of knee osteoarthritis in the elderly. The experimental results showed that using the random forest technique with the gain ratio as the splitting criterion for the individual decision trees could improve the performance of imbalanced classification.
In conclusion, these results indicated that both the SOVA_DL and SOVA_RF approaches might be a good choice for a dataset with a small number of classes and a low imbalance ratio, whereas the SOVA_SK method could handle the multiclass imbalanced dataset. The prediction performance of the SOVA_SK approach was suitable for domains with a large number of instances and a high imbalance ratio. Therefore, the proposed method can be used to handle imbalanced datasets and to avoid overfitting problems.

Conclusion
This study presented a method for handling multiclass imbalanced datasets using SMOTE based on the OVA strategy with deep learning and ensemble classifiers (stacking and random forest algorithms). The performances of the SOVA_DL, SOVA_SK, and SOVA_RF approaches were compared with those of single SMOTE and of no upsampling. The aims of the SOVA_DL, SOVA_SK, and SOVA_RF methods are to handle multiclass imbalanced datasets and to improve the predictive model. The best classification performance, an accuracy of 98.51% (sensitivity = 0.9851 and specificity = 0.9925), was obtained on the new-thyroid dataset when the SOVA_DL approach was used. The highest accuracy of SOVA_SK was 98.42% (sensitivity = 0.9842 and specificity = 0.9960), on the pageblocks dataset. Furthermore, the best accuracy of the SOVA_RF method was 98.47% (sensitivity = 0.9847 and specificity = 0.9969), on the dermatology dataset. The results indicated that a hybrid of the SMOTE and OVA techniques achieved better accuracy than the single SMOTE approach because, in the preprocessing step, OVA with three different classifiers is applied to evaluate each binary model and to decide the best final output. In addition, the results indicated that the SOVA_SK method can help increase accuracy for highly imbalanced datasets with a large number of instances, whereas both the SOVA_DL and the SOVA_RF approaches performed well for domains with low-imbalance datasets and a small number of instances.
The benefit of the proposed method is improved classification efficiency through a higher accuracy rate. Furthermore, the method is a good alternative for handling multiclass imbalanced datasets. In the future, the sampling method might be applied to real-world problems using hybrid under-sampling or other adaptive sampling methods.