Adaptive Acceptance Criterion (AAC) Algorithm for Optimization Problems

Abstract: Optimization methods are designed to solve optimization problems. Local search algorithms are optimization methods that are good candidates for exploiting the search space. However, most of them need parameter tuning and are incapable of escaping from local optima. This work proposes a non-parametric Acceptance Criterion (AC) that does not rely on user-defined parameters, which in turn motivates an Adaptive Acceptance Criterion (AAC). AC accepts a slightly worse solution by comparing the ratio between the values of the candidate solution and the best solution found against a stored value. The stored value is the lowest such ratio recorded when a new best solution is found. AAC adaptively escapes from local optima by employing a diversification idea similar to that of the previously proposed Adaptive Randomized Descent Algorithm (ARDA). In AAC, an estimated value is added to the threshold when the search is idle, in order to increase the search exploration. The estimated value is generated from the frequency of the differences in solution quality, which are stored in an array. The progress of the search diversity is governed by the stored value. Six medical benchmark datasets for the clustering problem (available in the UCI Machine Learning Repository) and eleven benchmark datasets for the university course timetabling problem (the Socha benchmark datasets) are used as test domains. To evaluate the effectiveness of the proposed AAC, AC and AAC are compared with each other and with other approaches drawn from the scientific literature. The results indicate that the AAC algorithm is able to produce good quality solutions that are comparable to those of other approaches in the literature.


Introduction
Various approaches, inspired by scientific disciplines such as artificial intelligence, computational intelligence and operations research (Tripathy, 1980), have been used to solve different optimization problems (e.g., scheduling and clustering). These approaches have been classified into several categories (sequential, cluster, generalized search (meta-heuristic) and constraint-based methods) (Carter and Laporte, 1998), and were later re-categorized (Lewis, 2008).
Some of these algorithms are capable of producing good quality solutions, whereas others perform poorly. Moreover, their parameters need to be investigated and tuned. The successful methods usually maintain an adaptive criterion, intelligent neighborhood selection and hybridization.
In recent years, researchers have tried to enhance the performance of earlier methods by hybridizing different acceptance criteria, which produces complex methods that need even more parameter tuning. The complexity of the problems calls for simple methods that employ an acceptance criterion that does not rely on user-defined parameters (a non-parameterized acceptance criterion), or for a mechanism that intensifies and diversifies the search in order to produce good quality solutions and can adaptively escape from local optima.
The Local Search (LS) method is a simple meta-heuristic approach widely used for solving optimization problems by moving from the current solution to its neighbor solutions (Pirlot, 1996). LS works on a single solution at a time to solve computationally hard optimization problems, and it can be used for either maximizing or minimizing the solution quality. LS consists of three important entities: the search space, the neighborhood relation and the objective function (which evaluates the solution quality).
The strength of LS algorithms lies in exploiting the search space, i.e. the intensification process (Ayvaz et al., 2012). However, LS approaches (Ayvaz et al., 2012) and descent heuristic techniques are incapable of escaping local optima (minima), which is the strength of population-based approaches (Ayvaz et al., 2012). This disadvantage stems from their acceptance criteria, which do not employ an adaptive mechanism to escape from local optima and almost all of which need parameter tuning. Therefore, in this study, a non-parametric Acceptance Criterion (AC) is proposed for optimization problems to remove the need for parameter tuning. It in turn motivates an Adaptive Acceptance Criterion (AAC), which strengthens the diversification strategy (exploring the search space) to overcome the inability of LS approaches to escape from local optima. The AC acceptance criterion relies on comparing the candidate and best solutions against a stored value. The stored value is generated from an improvement on the best solution. AAC employs a diversification idea similar to that of a previously proposed algorithm, which adds an estimated value to the threshold when the search is idle in order to increase the diversification. The estimated value is generated from the repetition of the differences in solution quality, which are stored in an array.
The aim of this work is to propose an Adaptive Acceptance Criterion (AAC) that provides a better diversification strategy for optimization problems in order to produce good quality solutions.
Two different test domains are used: clustering and timetabling problems. The first test domain is a medical clustering problem, which consists of partitioning a set of objects into a number of clusters with similar characteristics (Brucker, 1978). A cluster is a collection of objects that are similar to each other and dissimilar to the objects in other clusters (Brucker, 1978). The similarity within a cluster is measured by a certain objective function (or distance function) (Saha et al., 2010). Partitioning a set of objects into two or more clusters is an NP-hard problem, because it is difficult to find an optimal partition in reasonable time (Dasgupta and Freund, 2009). Finding good or near-optimal clustering partitions depends on the problem representation and on the methodology used to partition the clusters during the search (Jain et al., 1999). The methodologies (or algorithms) are used to generate the initial cluster partitions. The quality of the initial cluster partitions (termed the minimal distance value) is calculated by using a distance function. In the clustering problem, numerous approaches are used to generate the initial cluster partitions (or minimal distance value) (Holland, 1975), such as the Multi K-Means algorithm (Davidson and Satyanarayana, 2003), the Fuzzy C-Means algorithm (Hong, 2006) and others (Berry and Linoff, 1997). The minimal distance value is then iteratively improved by another algorithm (such as a local search algorithm) to produce good quality clusters. The second test domain is the university course timetabling problem, which involves assigning a set of courses (events) and students to a fixed number of rooms and timeslots subject to a variety of constraints (Petrovic and Burke, 2004). Constraints in a timetabling problem can be classified as hard and soft constraints (Petrovic and Burke, 2004).
The goal of solving timetabling problems is to satisfy all the hard constraints and to accommodate the soft constraints as much as possible, in order to produce a good quality timetable. All hard constraints must be satisfied in order to obtain a feasible timetable, whilst soft constraints can be violated if necessary, where each violation is penalized. The smaller the overall penalty value, the better the quality of the timetable. University course timetabling problems have been classified as NP-hard; therefore it is difficult (in general) to find an optimal solution (for larger instances) in a reasonable time. Finding good quality solutions to these problems depends on the approach used and on the problem representation employed during the search.
In recent years, several approaches (or algorithms) have been used in both clustering and university timetabling problems to improve the solution quality. Both Simulated Annealing (SA) (Abuhamdah and Ayob, 2009) and the Great Deluge algorithm (GD) (Abuhamdah and Ayob, 2009; Abuhamdah, 2012) have been applied in both test domains (medical clustering and university course timetabling); in addition, they are LS methods widely used for their good performance. Therefore, in order to evaluate the performance of the AAC algorithm, six benchmark datasets for medical clustering problems (available in the UCI Machine Learning Repository) and eleven benchmark datasets for university course timetabling (the Socha benchmark datasets) are used to compare the performance of AC, AAC, SA, GD and other approaches drawn from the scientific literature. The results demonstrate that AAC is able to produce statistically significantly higher quality solutions, outperforming many other LS approaches such as SA and GD, and that it obtains good quality solutions, in line with those of other researchers, on the datasets of both domains (medical clustering and course timetabling). This paper is structured as follows: the problem description is presented in section 2. Section 3 describes the methodology. The AAC algorithm is presented in section 4. Section 5 discusses the experimental results. Finally, section 6 presents the conclusion.

Problem Description
In this study, two problem domains are used. The first problem domain is a medical clustering problem described in section 2.1, while the second problem domain is university course timetabling problem described in section 2.2.

Medical Clustering Problem
In this problem domain, six benchmark datasets are used to tackle the medical clustering problem. They are provided for research purposes and are available in the UCI machine learning repository (http://archive.ics.uci.edu/ml/index.html). These datasets contain information about diseases and were collected from real infected patients. They were chosen to have different numbers of patterns and different complexity, as summarized in Table 1. Each dataset is available with a fixed number of clusters for research purposes. Dataset 1 is Haberman's Survival Database (H.S), Dataset 2 is BUPA Liver Disorders Database (B.L.D), Dataset 3 is Pima Indian Diabetes Database (P.I.D), Dataset 4 is Wisconsin Breast Cancer Database (B.C), Dataset 5 is Thyroid gland data Disease Database (T.D) and Dataset 6 is Lung Cancer Database (L.C). For example, in Table 1, dataset number 6, the Lung Cancer Database (L.C), has 56 attributes with 32 integer instances and is categorized into three clusters in the initial cluster partition: the first cluster takes 9 instances, the second cluster takes 13 instances and the third cluster takes 10 instances.
The initial cluster quality for each dataset can be evaluated by using a distance function to calculate the minimal distance value. The distance function value can also be used to evaluate the algorithm performance (Maulik and Bandyopadhyay, 2000), where a smaller value indicates better cluster quality (or minimal distance value).
Euclidean distance, as illustrated in Equation 1 (Wang, 2007), is a distance function widely used for calculating the minimal distance value; it performs well when the clusters are isolated and compact (Zhang, 2001). For example, assume there is a dataset X = {x_1, x_2, …, x_n} with n objects that needs to be partitioned into K clusters, where x_i and x_j are two data objects with m attributes each:
$$ d(x_i, x_j) = \sqrt{\sum_{k=1}^{m} (x_{ik} - x_{jk})^2} \qquad (1) $$
In this study, the minimal distance value is calculated in two different ways, as follows:

Between Objects Distance Function Value
In this calculation, the minimal distance value is calculated from the distance between each data pattern and the pattern next to it. The idea of this calculation is to minimize the distance between the data patterns themselves within the same cluster (Wang, 2007).

Between Centers Distance Function Value
In this calculation, the minimal distance value is calculated from the distance between each data pattern and the center of the cluster it belongs to (Wang, 2007), where a new center is calculated. The idea of this calculation is to minimize the distance between the data patterns and their centers.
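The two calculations can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, a solution is assumed to be a list of clusters holding numeric tuples, and "the pattern next to it" is assumed to mean the following pattern in the stored order of the same cluster.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def between_objects_distance(clusters):
    """Sum of distances between each pattern and the next pattern in its cluster.

    Assumption: "next" means the following pattern in the stored order.
    """
    total = 0.0
    for patterns in clusters:
        for current, following in zip(patterns, patterns[1:]):
            total += euclidean(current, following)
    return total


def centroid(patterns):
    """Component-wise mean of a non-empty list of patterns."""
    size = len(patterns)
    return [sum(column) / size for column in zip(*patterns)]


def between_centers_distance(clusters):
    """Sum of distances between each pattern and its (recomputed) cluster center."""
    total = 0.0
    for patterns in clusters:
        if not patterns:
            continue  # an empty cluster contributes nothing
        center = centroid(patterns)
        total += sum(euclidean(pattern, center) for pattern in patterns)
    return total


# Toy data, invented for illustration only.
clusters = [
    [(0.0, 0.0), (3.0, 4.0)],
    [(1.0, 1.0), (1.0, 3.0), (1.0, 8.0)],
]
print(between_objects_distance(clusters))  # 5 + (2 + 5) = 12.0
print(between_centers_distance(clusters))  # (2.5 + 2.5) + (3 + 1 + 4) = 13.0
```

In both cases a smaller value indicates a better partition, so either function can serve as the objective that the local search minimizes.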

University Course Timetabling Problem
In this problem domain, the eleven standard benchmark datasets introduced by Socha et al. (2002) are used, which seek to optimize the students' satisfaction in the university course timetabling problem. The problem consists of:
• A set of Rooms R in which events can take place
• A set of Events (courses) E to be scheduled in 45 timeslots (5 days of 9 hours each, with one hour per timeslot)
• A set of Features F that characterize the rooms
• A set of Students S who attend the events
These datasets are categorized into three groups: small (small 1, small 2, small 3, small 4 and small 5), medium (medium 1, medium 2, medium 3, medium 4 and medium 5) and large (a single large dataset); see Table 2 for a more detailed description. Table 2 also shows the number of students, events, rooms and features as well as the conflict density (CD) of each dataset (representing its complexity), an approximation of the number of students enrolled in each event (Students/Events) and an approximation of the number of available rooms for each event (Rooms/Events), which are calculated as in (Chiarandini et al., 2006).

Hard Constraints
H1: No student attends more than one event at the same time.
H2: The room has to be large enough for all the attending students and has all the features required by the event.
H3: Only one event takes place in each room in any timeslot.

Soft Constraints
S1: A student should not have a class in the last timeslot of the day.
S2: A student should not have more than two classes consecutively.
S3: A student should not have a single class on a day.

The timetable quality is measured by the number of soft constraint violations (penalty cost). Each violation of a soft constraint is penalized by 1 for each student involved. All hard constraints must be satisfied, since we only deal with feasible solutions, which is usually the case for the majority of research in this domain.
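The penalty cost can be illustrated with a short Python sketch. The function names are hypothetical, timeslots are assumed to be numbered 0 to 44 day by day, and S2 is assumed to be counted once for every class that is the third or later in an unbroken run of classes; the text only states that each violation costs 1 per student, so this counting convention is an assumption.

```python
SLOTS_PER_DAY = 9  # 9 one-hour timeslots per day
DAYS = 5           # 5 days, 45 timeslots in total


def student_penalty(timeslots):
    """Soft-constraint penalty for one student, given the timeslots of their events."""
    occupied = set(timeslots)
    penalty = 0
    for day in range(DAYS):
        first = day * SLOTS_PER_DAY
        busy = [slot in occupied for slot in range(first, first + SLOTS_PER_DAY)]
        # S1: a class in the last timeslot of the day
        if busy[-1]:
            penalty += 1
        # S2: more than two consecutive classes (assumed: 1 per extra class)
        run = 0
        for is_busy in busy:
            run = run + 1 if is_busy else 0
            if run > 2:
                penalty += 1
        # S3: a single class on a day
        if sum(busy) == 1:
            penalty += 1
    return penalty


def timetable_penalty(timeslots_per_student):
    """Total penalty of a feasible timetable: the sum over all students."""
    return sum(student_penalty(slots) for slots in timeslots_per_student)


# Day 0: classes in slots 0, 1, 2 (one S2 violation) and slot 8 (one S1 violation).
# Day 1: a single class in slot 9 (one S3 violation).
print(student_penalty([0, 1, 2, 8, 9]))  # 3
```

Because hard constraint H1 guarantees at most one event per student per timeslot, a set of timeslot indices is enough to describe one student's week.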

Methodology
In this study, a non-parametric Acceptance Criterion (AC) algorithm is proposed, which in turn motivates an Adaptive Acceptance Criterion (AAC) algorithm for optimization problems. AC and AAC start by generating an initial solution (or initial cluster partition) and then iteratively explore its neighbor solutions, looking for a better one. A neighbor solution is obtained by restructuring the current solution (or partition) using some neighborhood structures. The initial solution for the medical clustering problem is presented in section 3.1 and its neighborhood structures are described in section 3.1.1, whereas the initial solution for the university course timetabling problem is presented in section 3.2 and its neighborhood structures are described in section 3.2.1.

Initial Solution for Medical Clustering Problem
In this study, the Multi K-Means algorithm (Holland, 1975) is used, as in (Abuhamdah, 2012), to generate the initial partition for the medical clustering problem. The structure of the Multi K-Means algorithm is similar to that of the K-Means algorithm, which is well known for its simplicity in dealing with a huge number of data patterns. K-Means aims to minimize the squared error derived from the Euclidean distance (Equation (1)); the algorithm takes X as its input and partitions the set of n objects into K clusters. The basic difference between the two is that K-Means defines random cluster centers (centroids) once, whilst Multi K-Means initially defines random cluster centers and determines the final cluster centers only after K-Means has been restarted 50 times, as recommended (Holland, 1975), re-computing the centroid v_j of cluster j as illustrated in Equation 2:
$$ v_j = \frac{1}{n_j} \sum_{x_i \in C_j} x_i \qquad (2) $$
where C_j is the set of objects currently assigned to cluster j and n_j is the number of objects in C_j.

Neighborhood Structures for Medical Clustering Problem
Two neighborhood structures (i.e., N1 and N2), which have been widely used in the literature, are employed as in (Abuhamdah, 2012). The neighborhood structures are:
• N1: Randomly select one pattern from each cluster and swap its data with another pattern in another cluster.
• N2: Randomly select two different patterns from the same cluster and swap their data.
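The two moves can be sketched in Python as below. This is only one possible reading of N1 and N2: the function names are hypothetical, a solution is assumed to be a list of clusters (lists of patterns), and N1 is simplified to a single swap between two randomly chosen clusters rather than one swap per cluster.

```python
import random


def n1_swap_between_clusters(clusters, rng):
    """N1 (simplified): swap one random pattern between two different clusters."""
    new = [list(cluster) for cluster in clusters]  # work on a copy
    non_empty = [index for index, cluster in enumerate(new) if cluster]
    if len(non_empty) < 2:
        return new  # nothing to swap with
    a, b = rng.sample(non_empty, 2)
    i = rng.randrange(len(new[a]))
    j = rng.randrange(len(new[b]))
    new[a][i], new[b][j] = new[b][j], new[a][i]
    return new


def n2_swap_within_cluster(clusters, rng):
    """N2: swap two different patterns inside one randomly chosen cluster."""
    new = [list(cluster) for cluster in clusters]
    eligible = [index for index, cluster in enumerate(new) if len(cluster) >= 2]
    if not eligible:
        return new
    a = rng.choice(eligible)
    i, j = rng.sample(range(len(new[a])), 2)
    new[a][i], new[a][j] = new[a][j], new[a][i]
    return new


rng = random.Random(0)
solution = [[1, 2, 3], [4, 5], [6]]  # toy pattern identifiers
print(n1_swap_between_clusters(solution, rng))
print(n2_swap_within_cluster(solution, rng))
```

Both moves keep the cluster sizes unchanged. N2 only changes the order of patterns inside a cluster, so under this reading it affects an order-dependent objective but not one based on cluster centers.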

Initial Solution for University Course Timetabling Problem
In this study, the initial solution is generated by a constructive heuristic previously proposed for course timetabling problems. The constructive heuristic has three phases: the largest degree heuristic (Landa-Silva and Obit, 2008), neighborhood search and tabu search. It starts with an empty timetable and invokes the three phases consecutively to generate a feasible timetable.
In the first phase, all unscheduled courses are sorted by the number of students in conflict with other courses. The course with the highest number of conflicts is then selected first. The selected course is assigned to a random feasible timeslot-room pair. However, if no feasible room can be found for this course, it is assigned to any room. If all the courses have been scheduled in feasible timeslot-room pairs, phases 2 and 3 are skipped. Otherwise, phases 2 and 3 are invoked to achieve feasibility.
Phase 2 employs a simple descent algorithm to reduce the hard constraint violations. A neighbor solution is generated either by moving one course from its current timeslot-room pair to another random timeslot-room pair, or by randomly selecting two courses and swapping their rooms and timeslots. In both cases, the new solution is accepted if the move does not violate any hard constraints and the generated timetable is better than the previous solution in terms of hard constraint violations. Phase 2 terminates after ten non-improving iterations. If the solution is feasible, phase 3 is skipped; otherwise, phase 3 is invoked.
Phase 3 employs a tabu search algorithm that explores neighboring solutions in a similar way to phase 2, but also maintains a tabu list to prevent certain moves from being made for a certain number of iterations. The size of the tabu list is calculated by tl = rand(10) + δ × nc, where rand(10) is a random number between 0 and 10, nc is the number of events that violate the hard constraints and δ is a constant set to 0.6. This phase stops after 1000 non-improving iterations. If the generated solution is infeasible, the constructive heuristic is called again to generate a new solution from scratch, until a feasible solution is found.
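As a small illustration of the tabu list size formula tl = rand(10) + δ × nc, the sketch below assumes that rand(10) is a uniformly drawn integer in the closed range 0 to 10 and that the product δ × nc is rounded to the nearest integer; neither detail is stated in the text, and the function name is hypothetical.

```python
import random

DELTA = 0.6  # the constant δ given in the text


def tabu_list_size(num_violating_events, rng):
    """tl = rand(10) + δ * nc, with integer draw and rounding as assumptions."""
    return rng.randint(0, 10) + round(DELTA * num_violating_events)


rng = random.Random(42)
# With nc = 10 events violating hard constraints, tl lies between 6 and 16.
print(tabu_list_size(10, rng))
```

The list therefore grows with the number of events that still violate hard constraints, while the random term keeps the tenure from being identical in every call.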

Neighborhood Structures for University Course Timetabling Problem
Two neighborhood structures (i.e., NS1 and NS2), which have been widely used in the literature, are employed as in (Abuhamdah and Ayob, 2009).

AAC Algorithm
This work is motivated by the strength of local search based algorithms in exploiting the search space, i.e. the intensification process (Ayvaz et al., 2012). However, the disadvantages of local search approaches (Ayvaz et al., 2012) and descent heuristic techniques are that they need parameter tuning and are incapable of escaping from local optima. Therefore, in this study, a non-parametric Acceptance Criterion (AC) algorithm is proposed to intensify the search and remove the need for parameter tuning. It in turn motivates an Adaptive Acceptance Criterion (AAC) algorithm, which overcomes the other limitation by increasing the search exploration (the diversification mechanism). Note that the discussion below is for minimization in the clustering problem domain; the minimization process for the university course timetabling problem domain is similar, except for the initial solution, the neighborhood structures and the terms used.
AC is a simple mechanism based on a stored value that can control the diversification while producing good quality and consistent results for different problems. AC starts with a given Multi K-Means partition, i.e., the initial solution (S_initial) is generated by the Multi K-Means algorithm, and then iteratively improves the constructed solution by generating a neighbor solution (candidate solution) using the neighborhood structure(s). AC always accepts the generated candidate (new) solution (S_working) if its quality is better (lower) than that of the best solution (S_Arrange), and may accept a slightly worse solution through an adaptive acceptance criterion (the AC objective function). The acceptance criterion, illustrated in Equation 3, adaptively accepts the worse solution if the AC value is greater than or equal to the stored value (SV, which is derived from the best and candidate solutions); otherwise S_working is rejected. This process is repeated until the stopping condition is met:
$$ AC = \frac{S_{Arrange}}{S_{working}} \qquad (3) $$
where S_Arrange and S_working stand for the quality values of the best and candidate solutions. SV is first stored when a new best solution is found, using the AC acceptance criterion (see Equation (3)). Later, whenever a new best solution is found, SV is updated if the new AC value is lower, which controls the diversification by allowing slightly worse solutions. For example, SV is initially set to the AC value of the first improvement. Later on, whenever the best solution found S_Arrange is improved, AC is re-computed; if the computed AC is lower than SV, then SV is updated with the AC value, and so on. Updating SV in this way is intended to add a little diversification in line with the solution improvement. In other words, when SV is smaller than or equal to the AC value, the candidate solution is accepted for the next iteration.
Note that when the candidate is accepted as a new best solution, the solutions in Equation 3 need to be switched (i.e., AC = S_working / S_Arrange), because the new best solution S_working has a lower value than S_Arrange. SV can be initialized in two ways: the first is based on the first improvement, as discussed, and the second is to maximize the problem for some iterations in order to identify SV. Figure 1 shows the pseudo code for the AC approach, where the notation used in Fig. 1 (AC algorithm) is for the clustering problem. Table 3 differentiates between the notation used in the clustering problem domain and the notation used in the university course timetabling problem domain. Figure 1 shows that AC starts by initializing SV to zero and setting the only required parameter, the stopping condition (N_iterations), while the initial solution (S_initial) is generated using Multi K-Means as in Step-1. In the improvement phase (Step-2), some candidate solutions are generated for each neighborhood structure (i.e., N1 and N2); in this case, five candidate solutions are generated as in (Abuhamdah, 2012), and the best of them is selected as the candidate solution (S_working) as in Step-2.1. There are then two cases for evaluating the candidate solution, as follows:

Good Solution
If f(S_working) is better than f(S_Arrange), then SV is updated with the computed AC value (i.e. AC = S_working / S_Arrange) if SV is equal to zero or if SV is greater than AC; otherwise SV is not updated. Afterwards, S_working is accepted as the current solution (i.e. S_source ← S_working) and the best solution is updated (i.e. S_Arrange ← S_working) as in Step-2.2. Note that the solutions in Equation 3 have been switched, as S_working is a better solution than S_Arrange.

Little Worse or Bad Solution
If SV has been initialized (i.e. it is not equal to zero), the quality of S_working is compared with that of S_Arrange by computing the AC value as in Equation 3 (i.e., AC = S_Arrange / S_working). If AC is greater than or equal to SV, then S_working is accepted and the current solution is updated (i.e. S_source ← S_working). Otherwise, S_working is rejected. Step 2 is repeated until the stopping condition is met (i.e. Iterations > N_iterations). Once the stopping condition is met, the termination phase (Step-3) returns the best solution found, S_Arrange (or the best minimal distance).
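The two cases can be put together in a compact Python sketch of the AC loop. It is an illustration under stated assumptions rather than the pseudo code of Fig. 1: the function and parameter names are hypothetical, a single `neighbor` function stands in for the neighborhood structures, the five candidates are drawn from that one function, and a candidate of cost zero in the second case is treated as a ratio of 1 to avoid division by zero.

```python
import random


def ac_search(initial, cost, neighbor, n_iterations, rng, candidates_per_step=5):
    """Minimize `cost` with the AC acceptance rule; returns (best, stored value)."""
    current = initial            # S_source
    best = initial               # S_Arrange
    best_cost = cost(best)
    stored_value = 0.0           # SV; zero means "not initialised yet"

    for _ in range(n_iterations):
        candidates = [neighbor(current, rng) for _ in range(candidates_per_step)]
        working = min(candidates, key=cost)          # S_working
        working_cost = cost(working)

        if working_cost < best_cost:
            # Good solution: AC with the solutions switched.
            ratio = working_cost / best_cost
            if stored_value == 0.0 or ratio < stored_value:
                stored_value = ratio
            current = best = working
            best_cost = working_cost
        elif stored_value != 0.0:
            # Slightly worse or bad solution: AC = S_Arrange / S_working.
            ratio = best_cost / working_cost if working_cost else 1.0
            if ratio >= stored_value:
                current = working
            # otherwise the candidate is rejected and `current` is kept
    return best, stored_value


# Toy one-dimensional problem, invented for illustration: minimum cost 1 at x = 3.
def toy_cost(x):
    return (x - 3) ** 2 + 1


def toy_neighbor(x, rng):
    return x + rng.choice([-1, 1])


best, sv = ac_search(20, toy_cost, toy_neighbor, 2000, random.Random(1))
print(best, toy_cost(best), sv)
```

The only setting the loop needs is the number of iterations; the tolerance for worse solutions comes entirely from the stored ratio.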
However, preliminary experiments on the AC algorithm show that it can easily be trapped in local optima when dealing with small datasets whose optimal solution is zero. For example, when S_Arrange is equal to 2 and S_working is equal to 1 (the better-solution case), the AC value is 0.5 and SV is 0.5. In the next iteration, a candidate solution greater than 2 will not be accepted as a slightly worse solution (as the new S_Arrange is 1 and SV is 0.5), although accepting an even worse solution can sometimes lead to further improvement. This motivates increasing the diversification strategy in AC. Many strategies are used for diversification; among them, the previously proposed Adaptive Randomized Descent Algorithm (ARDA) employs an adaptive mechanism in its acceptance criterion for a similar optimization (minimization) problem, which allows some slightly worse solutions to be accepted and helps to escape from local optima. The idea in ARDA motivates its use in AC; the resulting algorithm is termed AAC.
The ARDA mechanism adaptively attempts to escape from local optima by intelligently updating the threshold value when the search is trapped in a local optimum. This is done by estimating an appropriate threshold value from the search history. The ARDA mechanism is based on an array (L_EV) of estimated values (EV) with their frequencies.
These estimated values are stored based on the difference between the new and old solutions found (i.e., EV = f(S_working) - f(S_Arrange)). When a new EV value is found that differs from the EV values already in the array, it is added to the array with a frequency (number of repetitions) equal to one. At each improvement of the best solution, the new EV value is added; if the EV value is already in the array, its frequency is increased by one. Each time the array is updated, it is rearranged in descending order of frequency, so that the first value in the array is the one with the highest frequency. The ARDA mechanism starts when the counter of idle iterations (C_idle-iterations) reaches the idle improvement limit (idle-iterations); it then updates the threshold of its acceptance criterion by adding the EV value with the highest frequency to the threshold value. Because the threshold in the AC algorithm is based on the solution quality, the ARDA mechanism is used here to add the value to the best solution found, f(S_Arrange), in the AC acceptance criterion, and the result is termed the AAC algorithm. For example, when S_Arrange is equal to 2 and S_working is equal to 1 (the better-solution case), the AC value is 0.5 and SV is 0.5; in the next iteration, S_working will not be accepted if it is greater than 2 (as the new S_Arrange is 1 and SV is 0.5). However, once the idle improvement limit (idle-iterations) is reached, the ARDA mechanism adds the EV value with the highest frequency (e.g. EV equal to 2) to the S_Arrange value to increase the diversification in the AC calculation for the worse solution, so that the AC value becomes 1.5 (i.e., AC = (1 + 2) / 2). Note that the case discussed above concerns a near-optimal solution; when the solution is not near optimal, adding EV increases the diversification only a little.
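The bookkeeping of the array of estimated values and the arithmetic of the example can be sketched in Python. The function name and the improvement history are invented for illustration, and ties between equally frequent values are assumed to be ordered by ascending value.

```python
from collections import Counter


def ordered_estimated_values(improvements):
    """Return the distinct EV values, most frequent first (ties: smaller value first)."""
    frequency = Counter(improvements)
    return sorted(frequency, key=lambda ev: (-frequency[ev], ev))


# Hypothetical history of improvements of the best solution.
history = [2, 5, 2, 1, 5, 2]
l_ev = ordered_estimated_values(history)
print(l_ev)  # [2, 5, 1]: the value 2 occurred three times, 5 twice, 1 once

# Worked example from the text: S_Arrange = 1, S_working = 2, EV = 2.
best_cost, working_cost = 1, 2
relaxed_ac = (best_cost + l_ev[0]) / working_cost
print(relaxed_ac)  # 1.5, which exceeds SV = 0.5, so the candidate can be accepted
```

Using the most frequent improvement as the added value ties the size of the relaxation to step sizes the search has actually observed.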
Figure 2 shows the pseudo code for the extension of the AC approach (AAC); combining Fig. 1 and Fig. 2 and eliminating the duplication gives the AAC algorithm.
As in Fig. 2, AAC starts by initializing the idle improvement iterations (idle-iterations), the Estimated Value (EV) and the array (L_EV) of estimated values to zero. In the improvement phase (Step-2), when a good solution is accepted, the counter of idle iterations (C_idle-iterations) is set to zero and the estimated value is calculated (i.e. EV = f(S_Arrange) - f(S_working)). The frequency of the EV value in L_EV is updated, or EV is added to L_EV if it does not exist there yet, and the L_EV values are rearranged in descending order of frequency, with values of equal frequency arranged in ascending order. Then S_working is accepted as the current solution (i.e. S_source ← S_working) and the best solution is updated (i.e., S_Arrange ← S_working) as in Step-2.2. When a slightly worse or bad solution is accepted, nothing changes; when the worse solution is rejected, the counter of idle iterations (C_idle-iterations) is increased by one. Finally, in the additional idle iterations phase (Step-3), the counter of idle iterations (C_idle-iterations) is compared to the maximum number of idle iterations (idle-iterations). If it is greater or equal (i.e., C_idle-iterations >= idle-iterations), EV is taken from the first value of L_EV (as it has the highest frequency), all elements in L_EV are rotated left so that the next value is used in the next idle phase if there is still no improvement, and AC is recomputed by adding EV to the best solution found (i.e. AC = (S_Arrange + EV) / S_working, consistent with Equation 3 and the example AC = (1 + 2) / 2); if the candidate is accepted, the counter of idle iterations is set to zero. Otherwise, the process continues as illustrated in Fig. 1.
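The steps above can be sketched as a single Python loop. This is a hedged illustration, not the pseudo code of Fig. 2: all names are hypothetical, one `neighbor` function stands in for the neighborhood structures, and the idle-phase ratio is taken as (f(S_Arrange) + EV) / f(S_working), the form used in the worked example AC = (1 + 2) / 2 = 1.5, which is an assumption about the intended formula.

```python
import random
from collections import Counter, deque


def aac_search(initial, cost, neighbor, n_iterations, idle_limit, rng,
               candidates_per_step=5):
    """Minimize `cost` with the AC rule plus an ARDA-style idle phase."""
    current = best = initial          # S_source, S_Arrange
    best_cost = cost(best)
    stored_value = 0.0                # SV; zero means "not initialised yet"
    frequency = Counter()             # EV value -> number of repetitions
    ordered_ev = deque()              # L_EV, most frequent value first
    idle_counter = 0                  # C_idle-iterations

    for _ in range(n_iterations):
        candidates = [neighbor(current, rng) for _ in range(candidates_per_step)]
        working = min(candidates, key=cost)           # S_working
        working_cost = cost(working)

        if working_cost < best_cost:                  # good solution
            ratio = working_cost / best_cost
            if stored_value == 0.0 or ratio < stored_value:
                stored_value = ratio
            frequency[best_cost - working_cost] += 1  # record EV
            ordered_ev = deque(
                sorted(frequency, key=lambda ev: (-frequency[ev], ev)))
            current = best = working
            best_cost = working_cost
            idle_counter = 0
            continue

        accepted = False
        if stored_value != 0.0:                       # slightly worse solution
            ratio = best_cost / working_cost if working_cost else 1.0
            accepted = ratio >= stored_value
        if accepted:
            current = working
            continue

        idle_counter += 1                             # candidate rejected
        if idle_counter >= idle_limit and ordered_ev:
            estimated = ordered_ev[0]                 # most frequent EV
            ordered_ev.rotate(-1)                     # rotate left for next time
            relaxed = ((best_cost + estimated) / working_cost
                       if working_cost else float("inf"))
            if relaxed >= stored_value:
                current = working
                idle_counter = 0
    return best


# Toy one-dimensional problem, invented for illustration: minimum at x = 3.
def toy_cost(x):
    return (x - 3) ** 2 + 1


def toy_neighbor(x, rng):
    return x + rng.choice([-1, 1])


print(aac_search(20, toy_cost, toy_neighbor, 2000, 10, random.Random(1)))
```

The best solution is never overwritten by an idle-phase acceptance; only the current solution moves, so the relaxation widens exploration without losing the best result found.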
A limitation of the AAC algorithm is that it adds a new parameter, the number of idle iterations (set to 10, as in earlier work), which needs investigation for each problem. However, this is considered acceptable, as it is the only parameter of the proposed AAC.

Results
In this study, AC and AAC algorithms are run 20 times as in Abuhamdah, 2012) for, medical clustering problem by using 6 datasets that are available in the UCI machine learning repository (http://archive.ics.uci.edu/ml/index.html). Also, AC and AAC algorithms are run 11 times as in (Abuhamdah and Ayob, 2009) for, university course timetabling problem by using 11 datasets (Socha benchmark datasets). In both problem domains, the algorithms run on a PC with an Intel dual core 1.8 MHz, 2 GB RAM and were programmed for clustering using Java language as in Abuhamdah, 2012) and for course timetabling using Matlab as in (Abuhamdah and Ayob, 2009). There is only one parameter is used in the AC algorithms for the stopping conditions (N iterations ) which is equal to 100,000 iterations for clustering as in SA with prolonging the search and termed as IISA  and GD (Abuhamdah, 2012), where for course timetabling is equal to 200,000 iterations as in SA and GD (Abuhamdah and Ayob, 2009). In addition, there is another parameter is used in AAC for the number of idle iterations (idle-iterations) which is equal to 600 for clustering as in (Abuhamdah, 2012) and 10 for timetabling as in . Table 4 shows the quality of the initial solution (or clusters partitions) obtained by Multi K-Means algorithm for each dataset using two calculations ways of the 6 datasets. Where, Table 5 shows the quality of the initial solution (or timetable) obtained by constructive heuristics Landa-Silva and Obit, 2008) for each dataset of the 11 datasets.
In order to investigate the performance of AC and AAC algorithms, Table 6 shows the comparison between AC and AAC algorithms using the between objects calculation. Tables 6-14, illustrate the best minimal distance quality (fmin), the average score (favg) and the standard deviation (σ) for the 20 runs. In each table, the best results (fmin) are presented in bold.
Results in Table 6 indicates that, AAC algorithm is able to produce good quality solution outperformed AC algorithm solutions in all datasets referring to the best minimal distance fmin, the average score favg and the standard deviation σ (except in B.L.D, P.I.D and L.C datasets, AC algorithm obtained better standard deviation than AAC algorithm, in addition to the average score in L.C dataset). Table 7 shows the comparison between AAC, IISA and GD algorithms using the between objects calculation for the six medical clustering datasets. Table 7 shows that, AAC algorithm also is outperformed IISA and GD algorithms in all datasets referring to the best minimal distance fmin, the average score favg and the standard deviation σ (where the standard deviation for GD is not known). Table 8 shows the comparison between AAC approach and other local hybrid meta-heuristic searches in the literature using the between objects calculation.
According to Table 8, AAC produces high quality solutions that outperform the IISA, MGD (Abuhamdah, 2012), ISA-MGD and AGD (Abuhamdah et al., 2014) algorithms in all datasets with respect to fmin and favg, except in the B.C dataset, where MGD obtains the same result as AAC. Figure 3 shows a 3D scatter graph for the Multi K-Means, AC and AAC algorithms over the H.S dataset using the between objects calculation. The H.S dataset has two clusters, represented by two colors (red and green). Figure 3a shows that the initial minimal distance obtained by Multi K-Means is 2463.972, while in Fig. 3b the best minimal distance obtained by AAC is 947.64. Because the best minimal distance obtained by AAC (947.64) differs only slightly from that of AC (987.68), the difference would be hard to see in Fig. 3, so the AC graph is not included. Moreover, the AC and AAC algorithms are investigated using the between centers calculation for the six medical clustering datasets, as in Tables 9, 10 and 11. Table 9 compares the AC and AAC algorithms; Table 10 compares the AAC, IISA and GD algorithms; and Table 11 compares AAC with other local hybrid meta-heuristic searches in the literature.

Partitions in between Objects Calculation
Tables 9, 10 and 11 show the comparisons using the between centers calculation. Table 9 indicates that AAC outperforms AC in all datasets with respect to fmin and favg, except that both obtain the same fmin in the L.C dataset and the favg of AC is better than that of AAC in the B.L.D and L.C datasets; the σ of AC is better than that of AAC in all datasets except B.L.D. Table 10 shows that AAC also outperforms IISA and GD in all datasets with respect to fmin, favg and σ, with the following exceptions: in the L.C dataset MGD obtains the same fmin, in the H.S and T.D datasets the σ of IISA is better than that of AAC, and the σ of GD is not known. The comparison with the other approaches in the literature (Table 11) shows that AAC produces high quality solutions that outperform the IISA, MGD (Abuhamdah, 2012), ISA-MGD and AGD (Abuhamdah et al., 2014) algorithms in all datasets with respect to fmin and favg, except in the L.C dataset, where MGD and AGD obtain the same fmin.
Table 6. Results obtained by between objects calculation for AC and AAC algorithms out of 20 runs on six datasets
Table 7. Results obtained by between objects calculation for AAC, IISA and GD (Abuhamdah, 2012) algorithms
Table 10. Results obtained by between centers calculation for AAC, IISA and GD (Abuhamdah, 2012) algorithms out of 20 runs on six datasets
Figure 4a shows that the initial minimal distance obtained by Multi K-Means is 2463.972, while in Fig. 4b the best minimal distance obtained by AAC is 947.64. Also, because the best minimal distance obtained by AAC (2703.11) differs only slightly from that of AC (2721.36), the difference would be hard to see in Fig. 4, so the AC graph is not included.
Furthermore, the AC and AAC algorithms are investigated using the eleven university course timetabling problems, as in Tables 12 to 14. Table 12 compares the AC and AAC algorithms; Table 13 compares the AAC, SA and GD algorithms; and Table 14 compares AAC with other local hybrid meta-heuristic searches in the literature.

According to the results in Table 12, AAC outperforms AC in all medium and large datasets, while both obtain the same best result in all the small datasets; in addition, the standard deviation of AAC is better than that of AC in all the small datasets and in the medium 2 dataset. Table 12 also shows that the average scores of AAC are better than those of AC. The comparison in Table 13 indicates that AAC outperforms SA and GD in all datasets, except that they obtain the same best result in the small 5 dataset and GD obtains the same best result as AAC in the small 1, 2 and 4 datasets. Table 13 also shows that SA and AAC obtain the same σ, while the σ of SA is better than that of AAC in the Medium 5 dataset and the σ of GD is better than that of AAC in the small 4 dataset. Table 14 shows that AAC produces good quality solutions that are equivalent to those of some approaches (A1, A2, etc.) in the small datasets and comparable with those of the other approaches in the literature (A1, A2, ..., A26) in the other datasets. The best results for the Medium 1, Medium 2 and Medium 3 datasets are obtained by Abuhamdah et al. (2013), the best result for Medium 4 is obtained by Turabieh and Abdullah (2009), and the best results for the Medium 5 and large datasets are obtained by Turabieh et al. (2010).
However, all these results are obtained with population-based algorithms, which makes the percentage deviation (∆ (%)) of the AAC algorithm, between 0.56 and 1.16, acceptable.
Table 13. Results for AAC, SA (Abuhamdah and Ayob, 2009) and GD (Abuhamdah, 2012) algorithms out of 11 runs on the eleven course timetabling datasets
For example, in an individual comparison on the Medium 4 dataset, AAC ranks 9th over all 26 approaches, with a percentage deviation of 0.96. All the approaches that outperform AAC on this dataset benefit from their structure (population-based search, hybridization or intelligent neighborhood selection) and employ many neighborhood structures, whereas AAC employs only two. Specifically, A3 is a hybridized great deluge mechanism; A15, A18 (Al-Betar et al., 2010), A19 (Turabieh et al., 2010), A20 (Jaradat and Ayob, 2010) and A26 (Abuhamdah et al., 2013) are population-based algorithms, some of them hybrid; A24 is also a hybridization with an adaptive mechanism; and A25 (Abuhamdah and Ayob, 2010) employs hybridization and systemic neighborhood selection. In the Medium 2 dataset, however, AAC ranks 4th overall, with a percentage deviation of 0.56. There, AAC is outperformed by A19 (Turabieh et al., 2010), A25 and A26 (Abuhamdah et al., 2013): A19 and A26 are population-based approaches with hybridization that employ many neighborhood structures, and A25 is a hybridization with intelligent neighborhood selection. This suggests, as future work, employing more neighborhood structures with intelligent selection, which may help AAC produce better solutions and outperform other approaches.
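The rank and deviation figures discussed above can be illustrated with a short sketch. The text does not give the formula for ∆ (%), so the definition used here, (f − f_best) / f_best, is an assumption (including whether the result is scaled by 100). The function names and the penalty values are also hypothetical and are not taken from Table 14.

```python
def relative_deviation(score, best_known):
    """Relative gap between a score and the best-known score (lower is better).

    Assumed definition: (score - best_known) / best_known.
    """
    if best_known <= 0:
        raise ValueError("best_known must be positive")
    return (score - best_known) / best_known

def rank_of(name, results):
    """1-based rank of `name` in `results` (a name -> penalty dict); ties share a rank."""
    target = results[name]
    return 1 + sum(1 for value in results.values() if value < target)

# Hypothetical penalties for four approaches, for illustration only.
results = {"AAC": 150, "X1": 100, "X2": 120, "X3": 180}
best_known = min(results.values())
print(rank_of("AAC", results), relative_deviation(results["AAC"], best_known))
```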
Figure 5 shows box and whisker plots that summarize the results of the 11 runs on the Socha benchmark datasets. (In the clustering problems the differences in minimal distance are very large, so the behaviour of AAC is easier to understand in the timetabling problem.)
In Fig. 5, the results for the small datasets are obtained in between 283 and 1,299 seconds (for more details see Table 15), whereas the times for the medium and large datasets range from 19,152 to 35,744 seconds.
In all datasets, the median is slightly closer to the best than to the worst of these runs. This indicates that the algorithm is stable and consistent most of the time and may produce very good quality solutions. The results also show that AAC is capable of producing feasible solutions for all datasets, with quality comparable to the best-known results in the literature.
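One way to make the box plot observation concrete is to compare the distance from the median to the best run with the distance from the median to the worst run. The following is a minimal sketch under that reading; the helper name `median_leans_to_best` and the penalty values are hypothetical.

```python
import statistics

def median_leans_to_best(penalties):
    """True if the median is closer to the best (minimum) than to the worst (maximum)."""
    best, worst = min(penalties), max(penalties)
    median = statistics.median(penalties)
    return (median - best) < (worst - median)

# Hypothetical penalties from 11 runs on one dataset, for illustration only.
runs = [96, 98, 99, 100, 101, 102, 105, 108, 112, 118, 125]
print(median_leans_to_best(runs))
```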
A1: (Abdullah et al., 2005). A2: . A3: . A4: . A5: (Ejaz and Javed, 2007). A6: (Asmuni et al., 2005). A7: (Abdullah and Turabieh, 2008). A8: . A9: . A10: (Landa-Silva and Obit, 2008). A11: (Socha et al., 2002). A12: (Burke et al., 2003). A13: (Socha et al., 2003). A14: (Al-Betar et al., 2008). A15: . A16: (Landa-Silva and Obit, 2009). A17: . A18: (Al-Betar et al., 2010). A19: (Turabieh et al., 2010). A20: (Jaradat and Ayob, 2010). A21: (Shaker and Abdullah, 2010). A22: (Abuhamdah and Ayob, 2009). A23: . A24: . A25: . A26: (Abuhamdah et al., 2013). The values for AAC and A1-A26 are the minimum penalty cost obtained by each approach.
As an example, Fig. 6 illustrates the behaviour of AAC on the medium 1 dataset. It plots the solution quality (penalty) against the number of iterations, with the iteration axis reduced to 200 iterations so that the performance of AAC is easier to observe. The slope of the curve in Fig. 6 shows that the penalty cost improves as the number of iterations increases. At the beginning of the search, the penalty cost is reduced quickly when worse solutions are accepted, which reflects a flexible acceptance criterion (i.e., AC). In addition, accepting worse solutions allows AAC to escape from local optima, and the amount by which an accepted solution may be worse becomes smaller from the beginning to the end of the curve, which may help the search produce better quality solutions. To give more insight into how AAC produces good quality solutions, the following table presents a statistical analysis of applying AAC to the Socha benchmark datasets. The statistics are based on the following performance indicators: the best score (fmin), the total number of iteration moves needed to reach the best solution (Iterations) and the total CPU time needed to find the best solution fmin (Time/s).
If the best solution is hit multiple times across the independent runs (11 runs each for the small, medium and large instances), the values listed in the table are the averages over these multiple best hits (see Table 15). The average scores (favg) and the standard deviation σ for the AAC algorithm are shown in Table 12.
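The averaging over multiple best hits described above can be sketched as follows. The record layout (score, iterations, seconds), the function name `best_hit_summary` and the sample values are hypothetical and are not taken from Table 15.

```python
from statistics import mean

def best_hit_summary(runs):
    """Summarize the runs that reached the best score.

    runs -- list of (score, iterations, seconds) tuples, one per independent run
    Returns (fmin, mean iterations, mean seconds) over the runs whose score equals fmin.
    """
    fmin = min(score for score, _, _ in runs)
    hits = [(iters, secs) for score, iters, secs in runs if score == fmin]
    mean_iterations = mean([iters for iters, _ in hits])
    mean_seconds = mean([secs for _, secs in hits])
    return fmin, mean_iterations, mean_seconds

# Hypothetical records: two of three runs reach the best score of 0.
records = [(0, 1000, 300.0), (1, 900, 280.0), (0, 1400, 500.0)]
print(best_hit_summary(records))
```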

Conclusion and Discussion
This work has proposed a non-parametric Acceptance Criterion (AC) that does not rely on user-defined parameters, motivated by the weakness of local search algorithms with respect to parameter tuning. Another limitation of many local search based algorithms lies in exploring the search space (the diversification strategy). This motivated strengthening the diversification in AC by employing an idea similar to the adaptive mechanism previously proposed in the ARDA algorithm; the resulting algorithm is termed AAC.
To evaluate their performance, AC and AAC are tested on two optimization problem domains: six medical clustering benchmark datasets and eleven benchmark datasets for the university course timetabling problem. The performance of AC and AAC is compared with SA, GD and other approaches in the literature. The results indicate that AC is a good acceptance criterion, as it outperforms SA and GD in some datasets in both domains. The results also show that AAC outperforms AC, SA and GD in most datasets in both problem domains, outperforms other methods in the literature on most clustering datasets, and obtains results comparable with other methods in the literature for the course timetabling problem; the results are generally statistically significant compared to those methods. We can therefore conclude that AAC is more capable than AC of intensifying and diversifying the search. The limitation of AC is that its diversification needs to be increased, whereas the limitation of AAC is that its diversification needs to be controlled intelligently, either with a better diversification mechanism or by hybridizing it with a good mechanism. A limitation of both AC and AAC lies in neighborhood selection, which needs to be done intelligently; the effect of increasing the number of neighborhood structures on result quality also needs more investigation.

Ethics
This article is original and contains unpublished material. The corresponding author confirms that all of the other authors have read and approved the manuscript and that no ethical issues are involved.