Hybrid Ant Colony Optimization and Genetic Algorithm for Rule Induction

: In this study, a hybrid rule-based classifier namely, ant colony optimization/genetic algorithm ACO/GA is introduced to improve the classification accuracy of Ant-Miner classifier by using GA. The Ant-Miner classifier is efficient, useful and commonly used for solving rule-based classification problems in data mining. Ant-Miner, which is an ACO variant, suffers from local optimization problem which affects its performance. In our proposed hybrid ACO/GA algorithm, the ACO is responsible for generating classification rules and the GA improves the classification rules iteratively using the principles of multi-neighborhood structure (i.e., mutation and crossover) procedures to overcome the local optima problem. The performance of the proposed classifier was tested against other existing hybrid ant-mining classification algorithms namely, ACO/SA and ACO/PSO2 using classification accuracy, the number of discovered rules and model complexity. For the experiment, the 10-fold cross-validation procedure was used on 12 benchmark datasets from the University California Irwine machine learning repository. Experimental results show that the proposed hybridization was able to produce impressive results in all evaluation criteria.


Introduction
Ant Colony Optimization (ACO) is a metaheuristic framework that handles combinatorial optimization and other problems (Jabbar et al., 2020;López-Ibáñez et al., 2016;Sagban et al., 2016). The inspiring source of ACO is derived from actual behavior of ants and artificial pheromone trails used as indirect communication medium. Heuristic information and pheromone trails are two types of numerical information employed by ACO. The heuristic information is taken from the problem being solved. Pheromone trails represent a distributed, numerical information that the ACO adapt during its execution to reflect the search experience (Dorigo and Stützle, 2004). Many applications involve the usage of ACO metaheuristic framework such as scheduling (Blum, 2005), travel salesman problem (Sagban et al., 2017), assembly line balancing (Blum, 2008), sequential ordering (Dorigo and Stützle, 2010), DNA sequencing , packet-switched routing (Di Caro and Dorigo, 1998), feature selection (Kanan et al., 2007), data clustering (Jabbar and Ku-Mahamud, 2018;Jabbar et al., 2019a;2019b) and data classification (Al-Behadili et al., 2019;2018b;2018a).
Ant-Miner classifier is a prominent ACO variant for data mining classification task. It generates a set of rules from the data to classify them into predetermined classes (Al-Behadili et al., 2020a). These rules are used to predict the unknown classes for unseen data. The Ant-Miner generates rules in the form of if-then rules that are considered simple and comprehensible knowledge representor. Examples of Ant-Miner applications include medical diagnosis, bankruptcy determination and stock price prediction (Al-Behadili, 2018;Ripon, 2019).
The hybrid classifier in the Ant-mining literature introduced by (Saian and Ku-Mahamud, 2012) proposed Ant-Miner coupled with Simulated Annealing (SA) to generate a list of classification rules. In Ant-Miner, each ant discovers a rule. The study proposed SA as a local search procedure to improve this rule iteratively. The SA works for each rule on the basis of the temperature variable, which starts at a high value and then decreases on the basis of predefined factors. The search runs a certain number of iterations and selects the best rule from the available neighbourhood depending on its quality. Using the SA mechanism, which starts with high temperatures, allows the rule with low quality to be selected. Then, temperature will be decreased and the difference between the current and previous qualities will be crucial for selecting the rule with high quality. The performance matrix is indicated on the basis of the rule quality, the number of discovered rules and the terms per rule. The performance of this approach was tested using 13 datasets from a UCI repository, showing that it is comparable with the original Ant-Miner in the predicative accuracy.
ACO/PSO2 is a hybrid swarm intelligence metaheuristic algorithm for rule-based classification that combined ACO with Particle Swarm Optimization (PSO) (Holden and Freitas, 2008). The pruning procedures of ACO/PSO2 are applied to discover the best rule from each iteration. ACO/PSO2 uses two pruning procedures. The first procedure is the original Ant-Miner pruning procedure and applied to the best rule discovered whose number of terms is less than a fixed value (i.e., 20). If the constructed rule entails more than 20 terms for each rule, then the pruning iterates to remove the unimportant or detrimental terms from the classification rule until the number is decreased to 20 terms. Subsequently, the Ant-Miner pruning procedure is implemented.
However, the Ant-Miner classifier suffers from premature exploitation because of the absence of any local search in its structure. Ant-Miner is not designed to explore the neighborhoods of the current rule and does not consume more time in improving it iteratively. The neighborhood structures are not fully covered. In this manner, this type of search is over-explorative because it is either a single neighborhood structure movement, as exemplified in (Saian and Ku-Mahamud, 2012), or it does not profit from local search (Holden and Freitas, 2008;Martens et al., 2011). Therefore, various neighborhood structures can be developed to catapult the search to another point, which makes it possible to completely utilize the neighborhood. The present study proposes the hybridization of Genetic Algorithm (GA) with Ant-Miner classifier algorithm for a more mature exploitation. This paper is constructed as follows. The next section illustrates the proposed hybrid ACO/GA classifier. The materials and methods section present experiments method, performance evaluation metrics, databases, classifiers and the parameter setting used in our experiments. Then, discusses the experimental results of the proposed classifier. Finally, provides the conclusions and suggestions for possible future research.

Hybrid ACO/GA algorithm
The overall goal of ACO/GA is to benefit from the characteristics to form the neighborhood structures. In GA, a strong inter-correlation exists between exploration components (e.g., mutation) and exploitation components (e.g., crossover). Furthermore, GA has succeeded in using the crossover and mutation operators in utilizing multineighborhood structures. Figure 1 shows how GA utilizes crossover and mutation to improve the search in two of its neighborhood structures in the proposed hybridization.
The ACO/GA classifier begins to construct one classification rule from the training dataset. This discovered rule is then inserted in the rule list, in which every case in the data that satisfies this rule is removed from the training dataset. These operations terminate when all the cases in the training database are lower than the pre-specified constant values known as UncoveredInstancesNO. Model complexity This approach has three major stages namely, rule building (RuleConstructs), pruning rule (RulePrune) and updating pheromone. The initial procedure is the RuleConstructs, where every ant begins to allow terms to inserted in the rule. The ant inserts a term at a time while raise the classification accuracy according to its probability value. The probability function that is use to add term to the current rule is expressed in Equation 1 as follows (Parpinelli et al., 2002): where, [ij(t)] is the amount of pheromone concentration for each term at iteration (t); [ij] is the problem depending upon heuristic function; a is the attribute number in the dataset; bi is the term number for each attribute; and xi is set to 1 (if the attribute is not yet visited by the ant) and 0 (otherwise). In addition, the heuristic function value is used together with the pheromone value to decide on the term selection. In ACO/GA, the heuristic function is inspired by information theory. The ACO/GA computes the amount of information contained in each term (entropy). The heuristic function can be expressed as follows (Parpinelli et al., 2002): This process is repeated to complete the construction rule based on two conditions. The first one, the current rule is not covered the prespecified the minimum instances by rule; while the second condition; certain attributes from the data are not yet used by the current constructed rule. Once the rule is finished, the classifier assigns the Then part of the rule by selecting the majority class vale among the all instances covered by the discovered rule. The discovered rule will then undergo pruning procedure RulePrune, which aims to avoid the overfitting problem by reducing the length of the constructed rules to increase simplicity. The procedure deletes one term from the rule at a time as the classification quality of the constructed rule improves. This method loop until only one term is left in the pruned rule or just no more improvement happens. The class value of the rule will have a high chance to be changed through this stage due to the fact that the pruned rule may cover other instances from dataset compared with the cases covered by the original one. The pheromone is updated after rule construction and prune procedures. Updating the pheromone has two basic stages. Firstly, the amount of pheromone amount is increased in all terms, including the discovered rule, according to rule quality function in Equations 4 and 5 as follows (Parpinelli et al., 2002): where, TP represents the cases number covered and classified by the constructed rule, FN represents the number of cases covered by constructed rule but it has a different class, TN represents the cases number that not covered by the constructed rule and has a class different from that classified by the constructed rule and FP represents the cases not covered by the constructed rule but has a class classified by the rule. Secondly, evaporating each term does not appear in the rule to avoid cumulative pheromone in un important terms. Then, another ant constructs its rule derived from pheromone history. The process is finished based on the following stopping conditions. The first condition is that the number of discovered rules must be equal to the number of ants. The second condition is according to the number of rule convergence that is statically determined, in which the ant starts to converge by building a rule similar to that previously constructed. The best quality rule from all discovered rules will be added to the discovered rule list.
To speed up the search in the proposed ACO/GA, GA is only applied for the iteration-best rule instead of applying it for all constructed rules to ensure that the algorithm stays lightweight. The change that may be applied to the classification rule is defined by a neighborhood structure. A neighborhood structure is a function N: S → 2S that assigns a set of neighborhood N(s) ⊆ S to every s ∈ S. N(s) is also called the neighborhood of s. The best rule in discovered-rule-list will be treated by crossover and mutation procedures. Therefore, the best discovered rule will be kernelled to form the neighborhood structure.
ACO/GA is a multi-neighborhood structure algorithm. In each iteration, the mutation generates a new starting rule where the crossover can be applied. ACO/GA selects two parent chromosomes (rules). The iteration-best rule represents the FirstParent. The SecondParent is a random classification rule selected from the classification list of discovered rules. Then, the mutation operator is used to maintain genetic diversity from one generation of a rule to the next. The mutation rate parameter is performed to decide if the rules should have a mutation. The parameter of the mutation rate is compared with a random number to perform the mutation operation. The mutation operator selects the random bits in the parent chromosomes and flips the value of this bit. Thus, the size of mutation is 2terms exchange. Figure 2 shows the pseudocode of the mutation operator.
The crossover operator is the process in which parent chromosomes (rules) exchange genetic information to create the best rule. The crossover rate parameter is used to perform crossover in a similar approach to the mutation operator. If the crossover rate is greater than the random number, then the crossover will be implemented. Different methods are used to trade genetic information between two individuals. The crossover operation used in this study guarantees a fully matured exploitation for such promising search regions. The length of crossover movement CrossOverLength in ACO/GA is set to half of the classification rule. Thus, the crossover terms are replaced up to two CrossOverLength from the FirstParent and SecondParent. Figure 3 shows the pseudocode of the crossover method.
The procedure then repeatedly performs operations from the given rule until no further improvement can be achieved. The acceptance criterion used to accept the better-quality classification rule during the crossover and mutation stages can be expressed in Equations 6 and 7: where, TP represents the cases number covered and classified by the constructed rule, FN represents the number of cases covered by constructed rule but it has a different class, TN represents the cases number that not covered by the constructed rule and has a class different from that classified by the constructed rule and FP represents the cases not covered by the constructed rule but has a class classified by the rule. The acceptance criteria in GA will be the rudder that guides the search towards the bottom of the shape of the neighbourhood structure. Furthermore, it determines the size of movement in the current search region.  The ACO/GA pseudocode shows the adaptation of the aforementioned components of GA-based algorithm (i.e., crossover, mutation and acceptance criteria) in the Ant-Miner framework (Fig. 4). The combined feature makes the proposed classifier substantially different from the previous Ant-Mining classification algorithm.

Experiments
A 10-fold cross-validation method is used in our experiments. In this method, the dataset is split into 10 subsets. Each subset is equally sized, where nine are used for the training process. The remaining subset is used in the testing stage. This process is repeated 10 times with a different subset for training and testing to ensure that all subsets are used for training and testing. Subsequently, the performance of all folds is averaged and the standard deviations are calculated. The 10-fold cross-validation method has been also adopted in other ant-mining classifier studies (Al-Behadili et al., 2020b;Parpinelli et al., 2002;Saian and Ku-Mahamud, 2012).

Performance Evaluation
The evaluation in this study is performed on the basis of three criteria. The first criterion is the classification accuracy in discovering the rule list, which is called the correct classification rate. This criterion is based on the correctly classified instances in the test data. Each time, the training subsets consist of n number of instances. The classifier constructs the training and test subsets that will be used to test the performance. The correct classification instances will determine the performance of the proposed classifier. The second criterion is the size in discovering the rule list, which is measured by the total number of discovered rules. The third criterion is the model size, which is measured by the amount of terms per rule. The number of terms (conditions) refers to the number of antecedents carried by each rule.
This study also computes for the classification accuracy rank and the model size rank of the statistical results, in which the nonparametric Friedman test is conducted with the Holm post-hoc test. In this manner, the performance of all classifiers in accordance with classification accuracy and simplicity can be observed. Then, the result of the nonparametric Friedman test with Holm's post hoc test is used to determine the average classification accuracy rank versus the average number of discovered rule rank and the average model size rank for all classifiers. This test aims to find the optimal classifier that balances different objectives. The test is conducted to rank the algorithms' performance for each dataset in descending order. A low rank implies good algorithm performance. The test is used to rank the best classifier that balances between classification accuracy and model size. The A classifier dominates B classifier if and only if the following two (2) conditions are true: The first condition, A is not worse than B with respect to both objectives, i.e., classification accuracy and the model size. The second condition, A is strictly better than B at least in one (1) objective. Thus, A classifier is becoming optimal only if and only if it is not dominated by any other classifiers (Dua and Karra, 2017).

Databases
Several experiments are performed using 12 UCI benchmark datasets in Weka's ARFF format to test the performance of the adaptive algorithm (Dua and Karra, 2017). These datasets are famous in the ant-mining algorithms literature and demonstrate different attribute numbers, which lie between 4 and 22. The attributes exhibit categorical and continuous styles. The datasets also differ in instance size number within the range of between 148-8124. The main descriptions of the experimental datasets are listed in Table 1.

Classifiers
The implementation of hybridizing Ant-Miner with GA or ACO/GA is evaluated with two other hybrid classifiers, namely, ACO/SA and ACO/PSO 2. These classifiers are considered the most related classifiers in ant-mining literature. The ACO/PSO 2 classification rule discovery software packages is publicly available on: https://sourceforge.net/projects/psoaco2/.
In other hand, the ACO/SA and ACO/GA are implemented based on the available open-source software packages of Ant-Miner classifier: https://sourceforge.net/projects/guiantminer. Saian and Ku-Mahamud (2012) combined ACO with SA, each ant discovers one best rule according to the hybrid ACO with SA algorithm. The SA algorithm can find an optimal solution for local optimisation problems. It is dependent on a variable named temperature (Aarts et al., 1997). The proposed hybrid algorithm uses the iteration on the temperature variable by starting with a high value. Therefore, in the beginning, all rules have the same probability to be selected. Then, temperature reduces and all rules will have a chance to be selected as the best rule. In this manner, the algorithm can avoid the local optimisation problem of the solution. Thereafter, the best rule in the iteration will be selected and the best among all iterations will then be added into the rule set. A successful hybridisation between ACO and PSO algorithm has been achieved by (Holden and Freitas, 2008) that introduced a new hybrid ACO and PSO2 algorithm for the discovery of classification rules. This classification algorithm can directly cope with discrete and continuous attributes.

Parameter Setting
The values of Ant-Miner classifiers parameter are adopted from (Robu et al., 2015;López-Ibáñez et al., 2016;Raymer et al., 2000) to ensure a fair evaluation of the results, all classifiers use the same values. Table 2 shows the parameters values used in the experiment.

Experimental Results
This section compares the results of the ACO/GA classifier with those of related classifiers with different rule pruning procedures. These classifiers include the ACO/PSO2 and ACO/SA. Experiments on 12 datasets from the UCI repository are conducted for all classification algorithms. In the first evaluation method, Table 3 shows the experimental results of the average classification accuracy. The result presents the average classification accuracy and the numbers after the symbol '+/−' are standard deviations. For each dataset, the best result is written in bold. Table 3 shows that the ACO/GA is better than ACO/PSO2 in 11 datasets. The ACO/GA is better than ACO/SA in eight datasets. Among all classifiers, the ACO/GA achieves the highest results in eight datasets. The ACO/SA obtains the second-best performance in three datasets [i.e. Wisconsin Breast Cancer, Cleveland Heart disease and Vehicle Data], whereas the ACO/PSO2 obtains the best results in only one dataset. Table 4 shows that the ACO/GA has the lowest number of discovered rules in all datasets for 10fold cross-validation compared with ACO/PSO 2. The ACO/GA obtains the lowest results in 11 datasets compared with the ACO/SA. Table 5 shows that the ACO/GA achieves the best results for model size in 11 datasets compared with the ACO/PSO 2. By using the same token, the ACO/GA achieves the best results in 11 datasets compared with the ACO/SA. Furthermore, the results obtained by the ACO/GA outperform other hybrid classifiers in all benchmark scenarios. This result is due to the enhancement by using the concepts of multi-neighbourhood structure in GA (i.e., crossover and mutation) to overcome the local optima and find the best classification rules from the dataset. Table 6 and Fig. 5 and 6 show the results of the nonparametric Friedman test with Holm's post-hoc test in the second scenario. This scenario determines the average classification accuracy rank, average number of discovered rule rank and average model size rank of the statistical results, which are reported in Table 6, across the 12 datasets. Figure 5 displays the results of the average classification accuracy rank versus the average number of rule rank. Figure 6 presents the results of the average classification accuracy rank versus the average model size rank. In all cases, the lowest rank indicates a good algorithm performance.     Figure 5 and 6 show that the results obtained by our proposed ACO/GA classifier dominate the other two classifiers. Therefore, the ACO/GA dominates the hybridisation with Ant-Miner classifier in all evaluation criteria. This result is due to the enhancement process achieved by the GA algorithm to the classification rule discovered by the Ant-Miner classifier. GA uses the multiple neighbourhood structures (i.e., crossover and mutation) procedures to escape from local minima.

Conclusion
This study proposes a hybrid Ant-Miner classification algorithm with GA (ACO/GA) classifier. The proposed classifier improves the classification accuracy, the number of discovered rules and classification complexity. The intensification of GA is used to improve the local exploitation of Ant-Miner search. GA uses three components, that is, (i) rule mutation to create a new starting rule from the best-found rules; (ii) crossover to find the best improvement in neighborhood structures; and (iii) acceptance criterion to accept or reject the current accuracy improvement of the generated rule. This type of local search assists the Ant-Miner to enhance the exploitation by focusing on promising areas of the search space. The classification performance of the proposed classifier is tested against other hybridization ant-mining classifiers (i.e., ACO/SA and ACO/PSO2) by using 10fold cross validation on 12 datasets from UCI. The results of the proposed classifier outperform the other classifiers in all evaluation criteria. In future work, the application of this classifier can be observed by realworld applications in various domains, such as DNA sequence classification, medical diagnosis, credit scoring and text mining. These real-life applications can provide extensive knowledge in the behavior and performance of the proposed classifier. Another research direction is that other stochastic local search algorithms (e.g., randomized iterative improvement, iterated greedy and evolutionary algorithms) can be hybridised with Ant-Miner algorithms.