Scheduling Jobs through Gap Filling and Optimization Techniques in Computational Grid

Due to the heterogeneity and complexity of grid computing, classical algorithms may not be able to handle dynamic jobs properly. In the dynamic mode, incoming jobs reach the scheduler arbitrarily. Scheduling the jobs with a simple policy alone therefore degrades the performance of the scheduler, and a policy that can handle this dynamicity efficiently is indispensable. This paper presents the Swift Gap (SG) mechanism, which builds on the hybridization of the Best Gap mechanism with Tabu Search (BGT). In addition, a new decision rule based on completion time is included in the resulting mechanism. This rule yields a significant improvement in Quality of Service (QoS), especially in slowdown, tardiness, waiting time and response time. An evaluation of Swift Gap is also provided, in which Swift Gap outperforms BGT, Conservative Backfilling (CONS) and the Extensible Argonne Scheduling System (EASY).


Introduction
In the computational grid, scheduling can be static or dynamic. Scheduling in the static mode is simple and straightforward: all jobs arrive at the system at a known time, and the resources are available for the whole scheduling process. In the dynamic mode, by contrast, jobs arrive at different times, and the scheduler does not know a job's arrival time until the job reaches the system. Moreover, the availability of resources is not guaranteed; candidate resources frequently leave or join the system for various reasons (such as resource failure).
The static mode is predictable for the scheduler and easy to implement. A traditional scheduling algorithm (such as FCFS) (Henderson, 1995) can perform well for a small grid computing system. However, when thousands of jobs arriving at different times are waiting to be allocated to many available resources, a more advanced policy that can deal with these circumstances is required.
Priority-based algorithms have been widely applied for scheduling purposes (Dakkak et al., 2006). The main issue with priority-based algorithms is maintaining balanced performance across different metrics.
These algorithms can perform very well on specific metrics while performing poorly on others. For instance, Shortest Job First (SJF) (Davis and Patterson, 1975) has low flow time and good resource utilization, but high makespan. Longest Job First (LJF) (Abraham et al., 2000) and Minimum Time to Due Date (MTTD) (Rasooli et al., 2008) have low makespan; conversely, LJF suffers from high flow time, high tardiness and low resource utilization.
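To make the SJF/LJF trade-off concrete, the toy sketch below (not part of the original study) schedules the same four jobs on two identical machines in SJF order and in LJF order using greedy list scheduling, assuming all jobs are available at time 0. The helper names are hypothetical, and total flow time is taken as the sum of completion times.

```python
import heapq

def list_schedule(runtimes, machines):
    """Greedy list scheduling: each job, in the given order, goes to the
    machine that becomes free first. All jobs are released at time 0.
    Returns (makespan, total flow time)."""
    free_at = [(0, m) for m in range(machines)]  # (time machine is free, machine id)
    heapq.heapify(free_at)
    completions = []
    for rt in runtimes:
        start, m = heapq.heappop(free_at)
        end = start + rt
        completions.append(end)
        heapq.heappush(free_at, (end, m))
    return max(completions), sum(completions)

def sjf(runtimes, machines):
    return list_schedule(sorted(runtimes), machines)

def ljf(runtimes, machines):
    return list_schedule(sorted(runtimes, reverse=True), machines)

jobs = [3, 1, 1, 1]
print("SJF:", sjf(jobs, 2))  # SJF: (4, 8)
print("LJF:", ljf(jobs, 2))  # LJF: (3, 9)
```

On this instance SJF gives the lower total flow time and LJF the lower makespan, matching the pattern described above; on other instances the difference can be larger, smaller or zero.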
To avoid these defects, several studies have combined optimization algorithms such as Ant Colony (Dorigo and Gambardella, 1997), Random Search (Solis and Wets, 1981) and Tabu Search (Glover and Laguna, 2013) with priority algorithms. These approaches achieve better results and strike a balance among the objective functions related to the end user. In addition, backfilling techniques have been used to further exploit the available resources; Earliest Gap (EG) and Conservative Backfilling are examples. Unlike the Earliest Gap policy (Klusácek and Rudová, 2008), which has to work alongside a base scheduling algorithm, Best Gap is able to operate alone. Best Gap also inherits from EG the ability to schedule newly arriving jobs without building a new schedule from scratch, and therefore saves computational time compared to the previous mechanisms.
Even though Best Gap can build the schedule incrementally from the new and existing jobs in the queue, scheduling thousands of jobs on numerous machines (resources) is still not a smooth process, and it consumes significant time at the expense of QoS. Thus, a meta-heuristic mechanism that can optimize the schedule is needed in an environment such as the grid. This paper presents an efficient integration of Best Gap and Tabu Search. Moreover, a new decision rule based on the Completion Time Weight is included. The resulting mechanism is named Swift Gap. Swift Gap improves the scheduling process by minimizing the objective functions with respect to slowdown, tardiness, waiting time and response time, and thereby yields a remarkable improvement in QoS.
The rest of the paper is organized as follows: Section 2 introduces related work. Section 3 describes the Swift Gap structure. Section 4 explains the applied approach and the simulation settings. Section 5 presents the evaluation process and the analysis of the results. Finally, Section 6 concludes the paper.

Related Work
The backfilling technique is an approach that avoids the fragmentation caused by small gaps among the jobs in the queue (or schedule). These small gaps reduce resource utilization by increasing idle CPU time. EASY (Lifka, 1995) was the first mechanism developed to tackle this issue. EASY uses First Come First Serve (FCFS) as its scheduling policy. The main idea is to move a small job that fits into an existing gap without delaying the first job at the top of the queue.
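The EASY rule can be sketched as follows. This is a minimal sketch, assuming rigid jobs that each need a fixed number of nodes and runtime estimates that act as upper bounds; `Running`, `shadow_and_extra` and `easy_can_backfill` are hypothetical names. The head job's reservation is the shadow time, and a waiting job may backfill if it finishes before then or uses only the extra nodes left over at that time.

```python
from dataclasses import dataclass

@dataclass
class Running:
    nodes: int
    end: float  # expected end time, from the user's runtime estimate

def shadow_and_extra(total_nodes, running, head_nodes, now):
    """Shadow time: earliest time the head job can start.
    Extra nodes: nodes still free at the shadow time once the head job starts."""
    free = total_nodes - sum(r.nodes for r in running)
    if free >= head_nodes:
        return now, free - head_nodes
    for r in sorted(running, key=lambda r: r.end):
        free += r.nodes
        if free >= head_nodes:
            return r.end, free - head_nodes
    raise ValueError("head job requests more nodes than the machine has")

def easy_can_backfill(nodes, estimate, now, free_now, shadow, extra):
    """A waiting job may jump ahead if it fits now and cannot delay the head job."""
    if nodes > free_now:
        return False
    return now + estimate <= shadow or nodes <= extra

running = [Running(nodes=4, end=10), Running(nodes=2, end=20)]
shadow, extra = shadow_and_extra(10, running, head_nodes=6, now=0)  # (10, 2)
free_now = 10 - 6
print(easy_can_backfill(3, 5, 0, free_now, shadow, extra))   # True: ends before the shadow time
print(easy_can_backfill(3, 15, 0, free_now, shadow, extra))  # False: would delay the head job
print(easy_can_backfill(2, 15, 0, free_now, shadow, extra))  # True: uses only extra nodes
```

When several jobs are backfilled in one pass, `free_now` and `extra` would have to be reduced by the nodes each accepted job takes; this sketch checks a single candidate.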
Some researchers have combined other priority rules (such as SJF) with the backfilling technique. In (Tsafrir et al., 2007), Shortest Job First with Backfilling (SJFBF) was introduced. The main idea is to use the EASY scheme to guarantee that no backfilled job delays the first job at the top of the queue, while SJF is used to order the jobs that will be backfilled.
In (Klusácek and Rudová, 2008), Earliest Gap-Earliest Deadline First (EG-EDF) was introduced, followed by a Tabu Search algorithm for further optimization. This strategy performs better than the previous ones due to its evaluation steps. When a short job arrives and a gap is detected in the schedule, EG moves the job into that gap; if no gap is detected, the EDF policy is applied alone. When the dispatching rule has to decide which machine the job should be allocated to, Tabu Search is used to evaluate the possible job moves. Before a job is moved, an evaluation based on the makespan and the number of delayed jobs is applied, which helps to reduce the number of moves. If the proposed move is better than the current one, the job is moved; otherwise it stays in its current position.
Other studies have implemented optimization search with or without a priority algorithm. Somasundaram and Radhakrishnan (2009) implemented SJF with weighted random search; this mechanism achieved low residing time but suffers from high makespan. An integration of Ant Colony and the Max-Min system to reduce the total computational time was proposed by (Nasira and Ku-Mahamudb, 2009).

Swift Gap Description
The Swift Gap structure is based on CONS (Alem and Feitelson, 2001). In EASY, backfilling is aggressive (i.e., EASY only checks whether backfilling will delay the first job in the queue). This can lead to long waiting times for the other jobs in the queue. In the conservative approach, small jobs are backfilled only if they cause no delay for any job ahead of them in the schedule. This is made possible by runtime estimates: the runtime estimate provided by the user is crucial for predicting when each job will start and finish. The system can thus anticipate the running time of the scheduled jobs, and backfilling can be performed safely. EASY also uses runtime estimates; however, because its aggressive approach aims to utilize the machine(s) more, EASY considers only the delay of the first job in the queue when backfilling.
The Swift Gap structure maintains two kinds of data. The first is the list of queued jobs in the system together with their expected start times; the second maintains information about the expected resource usage in the future. To allocate a newly arriving job, an anchor point has to be detected. The anchor point is the point at which enough resources are available to run the backfilled job from start to finish without affecting any job ahead in the schedule. This information is updated continuously to include or exclude resources in the next backfilling cycle.
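A minimal sketch of the resource-usage structure and the anchor-point search follows, assuming rigid jobs and exact runtime estimates. The `Profile` class and its methods are hypothetical names for illustration, not the paper's code. Placing jobs one by one at their anchor points, without moving any earlier reservation, is the essence of conservative backfilling; the returned start times play the role of the first structure (queued jobs with expected start times).

```python
import bisect

class Profile:
    """Expected free processors over time (a step function).

    points[i] = (t_i, free_i): free_i processors are expected to be free on
    [t_i, t_{i+1}); the last segment extends indefinitely.
    """

    def __init__(self, total, now=0):
        self.total = total
        self.points = [(now, total)]

    def _split(self, t):
        """Make sure a breakpoint exists at time t (t >= now); return its index."""
        times = [p[0] for p in self.points]
        i = bisect.bisect_right(times, t) - 1
        if self.points[i][0] != t:
            self.points.insert(i + 1, (t, self.points[i][1]))
            i += 1
        return i

    def anchor_point(self, nodes, duration):
        """Earliest time at which `nodes` processors stay free for `duration`."""
        if nodes > self.total:
            raise ValueError("job needs more processors than exist")
        for i, (t, _) in enumerate(self.points):
            end = t + duration
            if all(free >= nodes for s, free in self.points[i:] if s < end):
                return t
        raise AssertionError("unreachable: the last segment is fully free")

    def place(self, nodes, duration):
        """Reserve the job at its anchor point without moving earlier reservations."""
        start = self.anchor_point(nodes, duration)
        i = self._split(start)
        j = self._split(start + duration)  # inserted after i, so i stays valid
        for k in range(i, j):
            t, free = self.points[k]
            self.points[k] = (t, free - nodes)
        return start

p = Profile(total=4)
print([p.place(3, 10), p.place(2, 5), p.place(1, 8), p.place(1, 12)])  # [0, 10, 0, 8]
```

The third job (1 processor, 8 time units) is backfilled at time 0 because it finishes before the second job's reservation at time 10, so no earlier job is delayed.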
In reality, the runtime estimates provided by users are far from accurate. An underestimated runtime causes the job to be terminated, whereas an overestimated runtime causes long waiting times. To address this, the runtime estimates are extended to the maximum limit, and the schedule is then compressed to adapt to the current situation. The compression function also preserves the original schedule after a backfilling decision has been taken. This matters when a job terminates suddenly: otherwise, the job ahead of a future backfilled job could take its place, forcing the backfilled job to wait longer before it can be backfilled again. The compression function therefore spares the scheduler from redoing all backfilling calculations by keeping the schedule as it was before the sudden termination of the running job. The only change is in the start and finish times of the jobs: after compression, jobs can only start and finish earlier. Figure 1 illustrates the compression function.
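The paper does not spell out the compression procedure, so the sketch below follows the usual conservative-backfilling approach as an assumption: after an early termination, queued jobs are re-inserted in order of their current reservations, each at its earliest feasible start. Processing jobs in this order means no job can start later than before, because every job re-inserted ahead of it can only have moved earlier. Job tuples and function names are hypothetical, and runtime estimates are used as durations.

```python
def usage(t, placed):
    """Processors busy at time t; placed holds (start, nodes, duration)."""
    return sum(n for s, n, d in placed if s <= t < s + d)

def fits(start, nodes, dur, placed, total):
    """Usage only rises at job starts, so it suffices to check `start`
    and every placed start that falls inside [start, start + dur)."""
    checkpoints = [start] + [s for s, _, _ in placed if start < s < start + dur]
    return all(usage(t, placed) + nodes <= total for t in checkpoints)

def earliest_start(nodes, dur, placed, total, now):
    """Usage only drops at job ends, so try `now` and every later end time."""
    candidates = sorted({now} | {s + d for s, _, d in placed if s + d > now})
    for t in candidates:
        if fits(t, nodes, dur, placed, total):
            return t
    raise ValueError("job needs more processors than exist")

def compress(queue, running, total, now):
    """Re-place queued jobs in order of their old planned starts.
    queue: (job_id, nodes, estimate, old_start); running: (start, nodes, duration)."""
    placed = list(running)
    new_starts = {}
    for job_id, nodes, est, old_start in sorted(queue, key=lambda j: j[3]):
        start = earliest_start(nodes, est, placed, total, now)
        placed.append((start, nodes, est))
        new_starts[job_id] = start  # never later than old_start
    return new_starts

queue = [("A", 2, 5, 8), ("B", 2, 5, 10), ("C", 4, 3, 15)]
print(compress(queue, running=[(0, 2, 8)], total=4, now=4))  # {'A': 4, 'B': 8, 'C': 13}
```

Here a 2-processor job planned to run until time 10 terminated at time 4 and no longer appears in `running`. Job A moves from 8 to 4, B from 10 to 8 and C from 15 to 13, while their relative order is preserved.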

Swift Gap Approach
Our proposed algorithm, Swift Gap (Dakkak et al., 2016), adopts the decision rules of the Best Gap policy. In addition, Swift Gap includes a new decision rule for job movement in the schedule based on completion time (compl time) (Gomoluch and Schroeder, 2003). This new decision rule has shown a significant improvement in most QoS criteria for the end user. This paper focuses on four metrics: slowdown, tardiness, waiting time and response time.

A. Hybridization
There are two levels of hybridization: high level and low level. At the high level, the hybridized algorithms are loosely coupled, whereas at the low level, their structures are strongly coupled. Loosely coupled means that once the first algorithm finishes its execution based on certain condition(s), the second algorithm starts in order to optimize the solution obtained by the first (Xhafa et al., 2011). In this study, the hybridization is loosely coupled: both hybridized algorithms have their own independent flow and commands, with no interaction between their structures. Swift Gap has two stopping conditions. The first relates to creating the initial schedule: when all newly arriving jobs have been tested to find the best gap, the initial schedule is complete and the optimization stage starts. The second applies to the optimization stage, which stops once the required number of iterations is reached or the time limit expires. Figure 2 shows the hybridization scheme for Swift Gap with the completion time rule included.

B. Swift Gap Working Steps
Scheduling in Swift Gap starts from the gaps available in the schedule; the existence of gap(s) is sensed by measuring the fragmentation among the jobs in the schedule. When a gap is detected, the size of the job is measured to determine whether the job can be moved into it. If the size of the job is smaller than or equal to the detected gap, the job is eligible for the evaluation stage described below. For further enhancement, the initial schedule is then improved through an optimization mechanism, by moving suitable jobs among the clusters rather than only within a computing cluster. The optimized solution is subject to the same evaluation stage applied to the initial solution. Moreover, the iterative nature of the optimization mechanism enables Swift Gap to search progressively for a better solution to the scheduling problem.
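Gap sensing can be illustrated with a small sketch, assuming each machine holds a list of (start, end) intervals for its scheduled jobs and that a job's "size" is its estimated runtime. The function names are hypothetical, and the open-ended idle time after a machine's last job is not treated as a gap.

```python
def find_gaps(schedule, now=0):
    """schedule: machine -> list of (start, end) intervals of scheduled jobs.
    Returns idle intervals between jobs as (machine, gap_start, gap_length)."""
    gaps = []
    for machine, jobs in schedule.items():
        t = now
        for start, end in sorted(jobs):
            if start > t:
                gaps.append((machine, t, start - t))
            t = max(t, end)
    return gaps

def eligible_gaps(schedule, runtime, now=0):
    """Gaps the job fits into (size <= gap); these go on to the evaluation stage."""
    return [(m, s, length) for m, s, length in find_gaps(schedule, now) if runtime <= length]

schedule = {"m1": [(0, 4), (6, 10)], "m2": [(0, 2), (7, 9)]}
print(find_gaps(schedule))         # [('m1', 4, 2), ('m2', 2, 5)]
print(eligible_gaps(schedule, 3))  # [('m2', 2, 5)]
```

With a 3-unit job, only the 5-unit gap on m2 is eligible; each eligible gap would then be passed to the evaluation stage.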
When Swift Gap has to decide, in the evaluation stage, whether to move a job or keep it where it is, it chooses one of two options using the weight_function. The selection is based on the preferred value of the calculated metric (compl time) for decision (A) or (B), using the formula ((Ametric-Bmetric)/Ametric). Since minimizing these metrics is the goal, if the result of this formula is > 0.0 (i.e., Bmetric is smaller), decision (B) is selected. Otherwise (a result of zero or less), the job is scheduled according to (A). Division by zero is prevented in the code. This evaluation of the weight_function for job movement determines which position leads to better performance metrics.
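A minimal sketch of this decision rule follows, assuming the metric is a positive total completion time and that the division-by-zero guard simply keeps the previous decision (A); the paper does not say how its guard behaves, and `choose` is a hypothetical helper.

```python
def weight_function(a_metric, b_metric):
    """Relative improvement of decision B over decision A: (A - B) / A.
    Assumed guard: with A == 0 the weight is 0.0, so A is kept."""
    if a_metric == 0:
        return 0.0
    return (a_metric - b_metric) / a_metric

def choose(a_metric, b_metric):
    """Pick B only when the weight is strictly positive; otherwise keep A."""
    return "B" if weight_function(a_metric, b_metric) > 0.0 else "A"

print(choose(100.0, 90.0))   # B  (weight 0.1)
print(choose(100.0, 110.0))  # A  (weight -0.1)
print(choose(100.0, 100.0))  # A  (weight 0.0)
```

For positive A, a positive weight is equivalent to B < A; dividing by A expresses the weight as a relative improvement rather than an absolute one.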
In Swift Gap (lines 1-2), the resources are created in the system. In lines 3-5, the algorithm searches the schedule for a gap to move the job into; if there is no gap, it keeps searching until it finds one. The job is moved based on the completion time rule (lines 6-7), and the move is evaluated using the weight_function; this completes the initial scheduling (lines 7-12). The optimization search algorithm then starts by adding the jobs in the schedule to its list in order to optimize them (lines 13-14). The jobs are optimized by manipulating each job's position among the clusters (line 15). Each job movement is evaluated with respect to the completion time weight: if weight_function > 0.0, the current move (B) is selected; otherwise, the move is rejected and the previous move (A) is kept (lines 16-21). Finally, the optimized order of the jobs in the algorithm's list is sent to the allocated machines and the event ends (line 22).
The weight_function takes the scheduling decision based on completion time (line 1). After the previous and current metrics have been calculated, it discriminates between two scheduling decisions (A or B). Decision (A) represents the previous move, whereas decision (B) represents the current move.
Because minimizing time metrics in the grid is highly desirable, the formula ((Ametric-Bmetric)/Ametric) determines which move is better, (A) or (B). If ((Ametric-Bmetric)/Ametric) computed over the sum of all jobs is greater than zero, the value of (A) is greater than that of (B) (lines 2-6). The current move, represented by (B), is therefore applied (line 7); otherwise, the previous move, represented by (A), is kept instead of (B) (lines 8-9). The event ends at line 10. Figure 3 presents the algorithm for Swift Gap and the weight_function.
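The optimization stage, together with its two stopping conditions, can be sketched as a tabu-restricted local search. This is a simplified illustration under stated assumptions, not the paper's implementation: each cluster is modeled as a single processor running its jobs shortest first, the metric is the total completion time over all jobs, a move reassigns one job to another cluster, and recently moved jobs are tabu for `tenure` iterations. Following the acceptance rule above, a move is applied only if weight_function > 0.0; otherwise the previous schedule (A) is kept and the search stops. Unlike a classical tabu search, which may also accept worsening moves to escape local optima, this sketch never does. All names are hypothetical.

```python
import time
from collections import deque

def cluster_cost(runtimes):
    """Total completion time of one cluster running its jobs one at a time, shortest first."""
    total, clock = 0, 0
    for r in sorted(runtimes):
        clock += r
        total += clock
    return total

def total_cost(assign, runtime):
    """Sum of completion times over all clusters; assign maps job -> cluster."""
    clusters = {}
    for job, c in assign.items():
        clusters.setdefault(c, []).append(runtime[job])
    return sum(cluster_cost(rs) for rs in clusters.values())

def weight_function(a_metric, b_metric):
    return 0.0 if a_metric == 0 else (a_metric - b_metric) / a_metric

def optimize(assign, runtime, n_clusters, max_iters=100, time_limit=1.0, tenure=2):
    assign = dict(assign)
    tabu = deque(maxlen=tenure)                 # recently moved jobs
    deadline = time.monotonic() + time_limit
    for _ in range(max_iters):                  # stopping condition: iteration budget
        if time.monotonic() > deadline:         # stopping condition: time budget
            break
        a_metric = total_cost(assign, runtime)  # previous decision (A)
        candidates = []
        for job in sorted(assign):
            if job in tabu:
                continue
            for c in range(n_clusters):
                if c != assign[job]:
                    trial = dict(assign)
                    trial[job] = c
                    candidates.append((total_cost(trial, runtime), job, c))
        if not candidates:
            break
        b_metric, job, c = min(candidates)      # best non-tabu move (B)
        if weight_function(a_metric, b_metric) > 0.0:
            assign[job] = c                     # accept B
            tabu.append(job)
        else:
            break                               # keep A
    return assign, total_cost(assign, runtime)

runtime = {"j1": 5, "j2": 1, "j3": 1, "j4": 1}
initial = {job: 0 for job in runtime}           # initial schedule: all jobs on cluster 0
print(total_cost(initial, runtime))             # 14
print(optimize(initial, runtime, n_clusters=2)) # ({'j1': 1, 'j2': 1, 'j3': 0, 'j4': 0}, 10)
```

Starting with all four jobs on cluster 0 (total completion time 14), the search moves j1 and then j2 to cluster 1 and stops at a total of 10, since no remaining non-tabu move has a positive weight.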

Simulation and Discussion
The experiment was conducted using the Alea Simulator (Klusáček and Rudová, 2010), which has been well received by many researchers (Dakkak et al., 2015). The resource considered in our simulation is the CPU. Two datasets are used in this experiment (Zewura and Wagap). The results report the actual values for both datasets for each number of jobs: 3000, 5000, 7000, 9000, 10000, 15000 and 17500. The experiment was run on an Intel Core i7-4770 CPU with 8 GB of RAM under Windows 7.
Each simulation was repeated 20 times. Figures 4-7 present the simulation results for slowdown, tardiness, waiting time and response time, respectively, for both datasets. Slowdown is a dimensionless ratio, whereas tardiness, waiting time and response time are measured in seconds.
Four different criteria are used to evaluate the quality of the schedule produced by Swift Gap. Slowdown indicates how many times the job was delayed (a ratio). Tardiness is the delay of the job relative to a given due date. Waiting time is the time the job has to wait in the schedule before it starts to be processed. Response time is the job's running time plus its waiting time. Table 1 lists the parameters and configuration of the simulation. In Fig. 4, the completion time rule gives Swift Gap a lower slowdown than the other mechanisms. According to (Gomoluch and Schroeder, 2003), reducing the completion time also reduces the start time. Since the start time appears in the denominator, the total slowdown is reduced as well.
In Fig. 5, tardiness decreases as the completion time is minimized: since the due date is a fixed parameter of the workload, decreasing the completion time also reduces the total tardiness. In Fig. 6, the waiting time of Swift Gap is lower than that of BGT, CONS and EASY. The waiting time is the difference between the start time and the submission time, where the start time is when the job starts to be processed and the submission time is when the job is submitted to the system. This improvement is due to the much earlier start times in Swift Gap: according to (Gomoluch and Schroeder, 2003), reducing the completion time also reduces the start time. Finally, Fig. 7 shows that the response time of Swift Gap is lower than that of the other simulated algorithms. The response time is the sum of the running time and the waiting time; since the running time (Tr) of each job is fixed and the waiting time (Tw) has already decreased as discussed above, the expected response time is reduced as well.
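The relationships used in this discussion can be checked numerically. The sketch assumes the standard definitions waiting = start - submit, response = waiting + runtime and tardiness = max(0, completion - due); for slowdown it uses the common ratio response / runtime, which may differ from the formula used in the paper. The `Job` class is hypothetical. With runtime and due date fixed, starting a job earlier lowers all four values.

```python
from dataclasses import dataclass

@dataclass
class Job:
    submit: float
    start: float
    runtime: float
    due: float

    @property
    def completion(self):
        return self.start + self.runtime

    @property
    def waiting(self):
        return self.start - self.submit

    @property
    def response(self):
        return self.waiting + self.runtime

    @property
    def tardiness(self):
        return max(0.0, self.completion - self.due)

    @property
    def slowdown(self):  # assumed definition: response / runtime
        return self.response / self.runtime

def metrics(job):
    return {"waiting": job.waiting, "response": job.response,
            "tardiness": job.tardiness, "slowdown": job.slowdown}

late = Job(submit=0, start=40, runtime=20, due=50)
early = Job(submit=0, start=10, runtime=20, due=50)
print(metrics(late))   # {'waiting': 40, 'response': 60, 'tardiness': 10, 'slowdown': 3.0}
print(metrics(early))  # {'waiting': 10, 'response': 30, 'tardiness': 0.0, 'slowdown': 1.5}
```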
The figures also show variation in how the curves produced by the simulated algorithms interact for the zewura and wagap workloads. This is because the zewura workload has fewer resources than the wagap workload. The completion time rule in Swift Gap has a stronger effect as the number of jobs increases in the wagap workload, owing to its large number of resources; in other words, the influence of completion time is smaller when the number of resources is large relative to the number of jobs. Tables 2 to 4 present the results. Swift Gap outperforms BGT, CONS and EASY in all objective functions for both workloads. Although the influence of the completion time rule is not obvious in the wagap workload when the number of jobs is small, the improvement becomes very conspicuous as the number of jobs grows, even more so than in zewura. In the zewura workload, by contrast, the effect of the completion time rule can be observed from the start of the simulation. This dissimilar behavior of Swift Gap, and of the other simulated algorithms, between the workloads can be explained by the properties of each workload; as discussed earlier, the number of resources and the size of the jobs are among the possible reasons.
Finally, the behavior of all objective functions in the simulated dynamic case is not stable (nonlinear curves). This is due to the different arrival times and different waiting times of the jobs; thus, the curves oscillate regardless of the number of jobs.

Conclusion
In this study, the Swift Gap mechanism is presented. Swift Gap exploits the features of Best Gap and Tabu Search. In addition, a completion time rule has been included in the mechanism, and it has proven effective in minimizing the objective functions. The simulation was conducted using the Alea Simulator, with experiments on two datasets and large numbers of jobs.
The simulations show that Swift Gap reduces slowdown, tardiness, waiting time and response time thanks to the completion time rule. Moreover, in the presented evaluation, Swift Gap outperforms Best Gap with Tabu Search (BGT), CONS and EASY in all simulated performance metrics.