Novel Adaptive Job Scheduling Algorithm on Heterogeneous Grid Resources

Abstract: A grid provides an infrastructure for sharing geographically distributed heterogeneous resources to process many applications and is mainly used to solve scientific problems that require long computation times. Problem statement: A grid is a dynamic environment in which resources may join or leave at any time and jobs arrive at different intervals. To meet the demands of this dynamic environment, maximize resource utilization and minimize the makespan, an effective grid scheduling technique is needed. Approach: We propose a grid architecture consisting of a collection of clusters, each with multiple worker nodes. We propose a new scheduling algorithm, the Novel Adaptive Decentralized Job Scheduling Algorithm (NADJSA), which applies both Divisible Load Theory (DLT) and the Least Cost Method (LCM) and also considers user demands. Results: The proposed Novel Adaptive Decentralized Job Scheduling Algorithm is compared with the Decentralized Hybrid Job Scheduling Algorithm. Conclusion: The proposed Novel Adaptive Decentralized Job Scheduling Algorithm minimizes the makespan, improves resource utilization, satisfies user demands and is well suited to the grid environment.


INTRODUCTION
A grid system consists of geographically distributed heterogeneous resources that belong to various administrative domains. The dynamic and heterogeneous nature of the grid environment makes scheduling a challenging problem; in general, job scheduling in a heterogeneous grid environment is NP-hard (Manavalasundaram and Duraiswamy, 2012).
Divisible loads are classified as arbitrarily divisible loads or modularly divisible loads. Modularly divisible loads are divided into predefined modules, between which precedence relations may exist. Arbitrarily divisible loads can be divided into partitions of arbitrary length; no precedence relations exist between the partitions, so these independent partitions can be processed on more than one processor (Shokripour and Othman, 2009a).
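The difference between the two kinds of divisible load can be illustrated with a small sketch. The function names and values are hypothetical; the modular case uses Python's standard `graphlib.TopologicalSorter` only to show that predefined modules must respect their precedence relations, whereas arbitrary partitions need no ordering at all.

```python
from graphlib import TopologicalSorter


def split_arbitrarily(load, fractions):
    """Cut an arbitrarily divisible load into pieces of any chosen sizes.

    The pieces are independent (no precedence relations), so each one
    can be processed on a different processor.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    return [load * f for f in fractions]


def modular_order(precedence):
    """Return a valid execution order for a modularly divisible load.

    `precedence` maps each predefined module to the set of modules it
    depends on; a module may start only after all its predecessors finish.
    """
    return list(TopologicalSorter(precedence).static_order())


pieces = split_arbitrarily(100.0, [0.5, 0.3, 0.2])
order = modular_order({"m2": {"m1"}, "m3": {"m1", "m2"}})
```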
The Least Cost Method (LCM) allocates a job to the available resource in the grid with the minimum processing cost.
The aim of the scheduling algorithm is to minimize the processing time of a job. This is done by dividing each job into sub jobs and allocating the sub jobs to worker nodes in different clusters of a decentralized grid environment.
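Why splitting helps can be shown with a minimal sketch. It assumes a sub job's processing time is its size divided by the node's speed, ignores communication time, and places each sub job on the node that would finish it earliest; that greedy rule and the node speeds are illustrative choices, not the paper's method.

```python
def split_equal(size, parts):
    """Divide a job of `size` load units into `parts` equal sub jobs."""
    return [size / parts] * parts


def makespan(sub_jobs, speeds):
    """Place each sub job on the worker node that would finish it earliest.

    speeds[n] is the processing speed of worker node n (load units per
    second); the nodes may belong to different clusters. Communication
    time is ignored. Returns the time at which the last sub job finishes.
    """
    ready = [0.0] * len(speeds)  # time at which each node becomes free
    for load in sub_jobs:
        finish = [ready[n] + load / speeds[n] for n in range(len(speeds))]
        best = min(range(len(speeds)), key=finish.__getitem__)
        ready[best] = finish[best]
    return max(ready)


speeds = [4.0, 2.0, 2.0, 1.0]                    # hypothetical worker nodes
whole = makespan([100.0], speeds)                # job kept in one piece
split = makespan(split_equal(100.0, 5), speeds)  # five equal sub jobs
```

Here keeping the job whole on the fastest node takes 25 time units, while the five sub jobs spread over four nodes finish at 15.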
The proposed Novel Adaptive Decentralized Job Scheduling Algorithm (NADJSA) employs DLT and LCM to allocate jobs efficiently to the available resources of a decentralized grid with minimum makespan and minimum processing cost.

Related research:
Divisible load theory (DLT) is a powerful tool for the efficient scheduling of computing loads. It emerged specifically for scheduling parallel loads that are divisible among processors and links. DLT is a linear mathematical model in which computing loads can be partitioned arbitrarily and executed in any order, and it provides optimal processing of the computational loads (Shokripour and Othman, 2009b).
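The simplest DLT case can be sketched in a few lines. It relies on the standard DLT optimality principle that, at the optimum, all processors stop computing at the same time, and it assumes communication over the links is negligible, which full DLT models do not. The unit processing times are made-up values.

```python
def dlt_fractions(unit_times):
    """Share of the load given to each processor under the DLT optimality principle.

    unit_times[i] is the time processor i needs to compute one unit of load.
    Assumption: communication time is negligible, so processor i finishes at
    alpha_i * W * unit_times[i]. Requiring every processor to finish at the
    same time makes alpha_i proportional to 1 / unit_times[i].
    """
    inverse = [1.0 / w for w in unit_times]
    total = sum(inverse)
    return [x / total for x in inverse]


def finish_times(load, unit_times):
    """Finish time of every processor when `load` is split by dlt_fractions."""
    fractions = dlt_fractions(unit_times)
    return [a * load * w for a, w in zip(fractions, unit_times)]


fractions = dlt_fractions([1.0, 2.0, 4.0])   # 4/7, 2/7 and 1/7 of the load
times = finish_times(70.0, [1.0, 2.0, 4.0])  # every processor finishes at 40
```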
In LCM, arbitrarily divisible independent processing loads are allocated to the resource with the least allocation cost. In Shah et al. (2010a), the job is divided into tasks of equal size and each task is allocated to the processor with the least allocation cost. If more than one processor has the same least allocation cost, the processor with the maximum number of available processing units is selected. The task to be scheduled on this processor is chosen by processing time: the task with the maximum processing time is selected and scheduled on the processor. In Shah et al. (2010b), the processor and the job with the least allocation cost are selected. If more than one processor has the same least allocation cost, the next least allocation cost is selected for both the processor and the job. Among the processors with the least allocation cost, the one with more available processing units is selected and the job with the maximum workload is allocated to it. The next least allocation cost is then selected for the job with the minimum workload.
Each worker node has a different processing power. The grid environment is dynamic: worker nodes can join or leave the grid at any time, and each worker node has its own availability time, the time at which it is available in the grid. The worker nodes within a cluster are interconnected through a local area network, and the clusters are connected through a wide area network. The Grid Information Centre (GIC) maintains the CPU and memory utilization of all the nodes in the grid; the coordinator node of each cluster provides this information to the GIC periodically (Suri and Singh, 2010).
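The allocation rule described for Shah et al. (2010a) can be sketched as follows. This is an illustration, not the authors' code: it assumes each task occupies one processing unit of the processor it is placed on, and the processor costs, unit counts and task times are hypothetical inputs known in advance.

```python
def lcm_allocate(processors, task_times):
    """Least-cost allocation with the tie-break described for Shah et al. (2010a).

    processors: dict name -> (allocation_cost, available_units)
    task_times: dict task -> processing time
    Assumption: each task occupies one processing unit of its processor.
    Returns a dict task -> processor.
    """
    units = {p: u for p, (_, u) in processors.items()}
    pending = dict(task_times)
    plan = {}
    while pending:
        candidates = [p for p in processors if units[p] > 0]
        if not candidates:
            break  # no capacity left; remaining tasks stay unscheduled
        # least allocation cost first; ties go to the processor with more free units
        proc = min(candidates, key=lambda p: (processors[p][0], -units[p]))
        # the task with the maximum processing time goes to that processor
        task = max(pending, key=pending.get)
        plan[task] = proc
        units[proc] -= 1
        del pending[task]
    return plan


processors = {"P1": (3, 1), "P2": (3, 2), "P3": (5, 4)}  # (cost, free units)
tasks = {"t1": 4.0, "t2": 9.0, "t3": 6.0, "t4": 1.0}
plan = lcm_allocate(processors, tasks)
```

In the second round P1 and P2 tie on cost and on free units, so the sketch simply keeps the first one listed; the source does not say how such a double tie is resolved.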
Each user Ui owns a cluster Ci. The set of users is denoted as U = {U1, U2, ..., Uo}. The set of all jobs submitted by the users of the clusters is denoted as J = {J1, J2, ..., Jk}. Each job is split into sub jobs as Ji = {SJi1, SJi2, ..., SJil}.
In a decentralized dynamic grid environment, job scheduling can be formulated as a linear programming transportation problem. An efficient approach is therefore needed to schedule jobs originating in any cluster onto any other cluster at minimum transportation cost. Scheduling must also consider parameters such as the makespan, the processing cost, the availability time of the worker nodes, the deadline of each job, the transportation cost and the communication time needed to transfer a job submitted in one cluster to another cluster for processing.
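The least cost method is also the classic heuristic for finding an initial feasible solution of a transportation problem: repeatedly take the cheapest remaining cell and ship as much as the remaining supply and demand allow. A minimal sketch, treating source clusters as supply (load waiting to be scheduled) and destination clusters as demand (load they can absorb); the balanced-problem assumption and all figures are illustrative, not taken from the paper.

```python
def least_cost_method(supply, demand, cost):
    """Initial feasible allocation of a balanced transportation problem by LCM.

    supply[i]: load offered by source cluster i
    demand[j]: load that destination cluster j can absorb
    cost[i][j]: cost of moving one unit of load from i to j
    Returns (allocation matrix, total cost).
    """
    if sum(supply) != sum(demand):
        raise ValueError("this sketch assumes a balanced problem")
    s, d = list(supply), list(demand)
    alloc = [[0] * len(d) for _ in s]
    # visit cells from cheapest to most expensive; exhausted rows or columns get 0
    cells = sorted((cost[i][j], i, j) for i in range(len(s)) for j in range(len(d)))
    for _, i, j in cells:
        qty = min(s[i], d[j])
        if qty > 0:
            alloc[i][j] = qty
            s[i] -= qty
            d[j] -= qty
    total = sum(alloc[i][j] * cost[i][j] for i in range(len(s)) for j in range(len(d)))
    return alloc, total


supply = [20, 30]        # load waiting in two source clusters (made-up)
demand = [10, 25, 15]    # load three destination clusters can absorb (made-up)
unit_cost = [[4, 6, 2],
             [5, 3, 7]]
alloc, total = least_cost_method(supply, demand, unit_cost)
```

For these figures the sketch first ships 15 units over the cost-2 route and ends with a total cost of 150.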

Existing research:
A divisible job Ji is divided into at most five sub jobs. Let k be the number of jobs and q the number of partitions of a job (Bhaskaran, 2011a; 2011b). The user submits the job to the coordinator node (CN). The CN contacts the GIC, obtains the memory and CPU utilization of each worker node in the grid and allocates each sub job to the node with the minimum processing time and processing cost.
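The information flow between coordinator nodes and the GIC can be made concrete with a minimal sketch. The class, its method names and the choice to keep only the most recent report per node are hypothetical; the paper does not specify an interface. The final line shows one possible query, picking the node with the lowest CPU utilization.

```python
import time
from dataclasses import dataclass


@dataclass
class NodeStatus:
    cluster: str
    cpu_util: float     # fraction of CPU in use, 0.0 to 1.0
    mem_util: float     # fraction of memory in use, 0.0 to 1.0
    reported_at: float  # time of the coordinator's report


class GridInformationCentre:
    """Hypothetical GIC keeping the latest report for every worker node."""

    def __init__(self):
        self._nodes = {}

    def report(self, cluster, node, cpu_util, mem_util, now=None):
        # called periodically by the coordinator node of each cluster
        stamp = time.time() if now is None else now
        self._nodes[node] = NodeStatus(cluster, cpu_util, mem_util, stamp)

    def leave(self, node):
        # worker nodes may leave the grid at any time
        self._nodes.pop(node, None)

    def snapshot(self):
        # what a coordinator node receives when it contacts the GIC
        return dict(self._nodes)


gic = GridInformationCentre()
gic.report("C1", "wn1", cpu_util=0.20, mem_util=0.50, now=0.0)
gic.report("C2", "wn7", cpu_util=0.90, mem_util=0.30, now=0.0)
gic.report("C2", "wn8", cpu_util=0.40, mem_util=0.10, now=0.0)
gic.leave("wn8")
view = gic.snapshot()
idlest = min(view, key=lambda n: view[n].cpu_util)  # lowest CPU utilization
```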

Decentralized Hybrid Job Scheduling Algorithm (DHJSA):
Step 1: If there is any completion time information from a coordinator node (CN) or worker node (WN), update the information at the GIC or CN.
Step 2: If J is empty, go to Step 9.
Step 3: If a job Ji has completed the execution of all its sub jobs and was migrated to another cluster, dispatch the job along with its results to the cluster where it originated and remove it from the job set J.
Step 4: If a new job arrives at the CN of any cluster Ci, partition the job into a maximum of 5 equal partitions and add it to the job set J.
Step 5: Among all the clusters, find the worker node with the minimum processing time and allocation cost. The processing time is:
Step 6: The CN at cluster Ci then dispatches the sub job to the worker node WNmin.
Step 7: Repeat Steps 5-6 until all sub jobs are scheduled.
Step 8: Repeat Steps 4-7 until the job set is empty.
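Steps 4 to 7 of DHJSA for a single job can be sketched as follows. The processing-time formula of Step 5 is not reproduced in this text, so the sketch substitutes its own assumption: a node's processing time is its queued work plus sub-job size divided by speed, and equal processing times are broken by allocation cost. It also always uses the maximum of five partitions. `WorkerNode`, `cost_rate` and all numeric values are hypothetical.

```python
from dataclasses import dataclass

MAX_PARTITIONS = 5  # Step 4: at most five equal partitions


@dataclass
class WorkerNode:
    name: str
    cluster: str
    speed: float          # load units per second (assumed)
    cost_rate: float      # allocation cost per second of use (assumed)
    busy_until: float = 0.0


def dhjsa_schedule(job_size, nodes):
    """Steps 4-7 of DHJSA for one job; returns the chosen node per sub job."""
    sub_job = job_size / MAX_PARTITIONS

    def time_then_cost(n):
        # ASSUMPTION: processing time = queued work + sub_job / speed;
        # equal processing times are broken by allocation cost
        run = sub_job / n.speed
        return n.busy_until + run, n.cost_rate * run

    placement = []
    for _ in range(MAX_PARTITIONS):               # Step 7: until all are scheduled
        best = min(nodes, key=time_then_cost)     # Step 5: WNmin across all clusters
        best.busy_until += sub_job / best.speed   # Step 6: dispatch to WNmin
        placement.append(best.name)
    return placement


nodes = [WorkerNode("a", "C1", speed=2.0, cost_rate=1.0),
         WorkerNode("b", "C2", speed=1.0, cost_rate=0.4)]
placement = dhjsa_schedule(10.0, nodes)
```

With these two nodes, the slower but cheaper node b wins the tie in the second round, so the sub jobs go to a, b, a, a, b and the last one finishes at time 4.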
Novel Adaptive Decentralized Job Scheduling Algorithm (NADJSA): When a user submits a job, the job is arbitrarily partitioned into at most five sub jobs. The CN receives information from the GIC and selects a worker node for scheduling from among all the clusters, considering the following parameters: processing cost, processing time, transportation cost, availability time of the worker node and deadline of the job.

Novel Adaptive Decentralized Job Scheduling Algorithm (NADJSA) is as follows:
Step 1: If there is any completion time information from a CN or WN, update the information at the GIC or CN.
Step 2: If J is empty, go to Step 12.
Step 3: If a job Ji has completed the execution of all its sub jobs and was migrated to another cluster, dispatch the job along with its results to the cluster where it originated and remove it from the job set J.
Step 4: The initial processing time is calculated as:
Step 5: If a new job arrives at the CN of any cluster Ci, partition the job into a maximum of 5 equal partitions and add it to the job set J.
Step 6: Among all the clusters, find the worker node with the minimum processing time, transfer time and allocation cost, and check that the worker node is available before allocating the sub job to it. The processing time is:
Step 7: The total processing time of the job is calculated as:
Step 8: The CN at cluster Ci then dispatches the sub job to the worker node WNmin.
Step 9: Repeat Steps 6-8 until all sub jobs are scheduled.
Step 10: If the total processing time of the job is less than its deadline, set hit count = hit count + 1; otherwise, set miss count = miss count + 1.
Step 11: Repeat Steps 5-10 until the job set is empty.
Step 12: Calculate the processing cost.
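Steps 5 to 12 of NADJSA for a single job, together with the Step 10 hit/miss tally, can be sketched as follows. The formulas for the initial, per-node and total processing time are not reproduced in this text, so the sketch substitutes its own assumptions, marked in the comments: processing time is sub-job size divided by node speed, a sub job sent to another cluster first spends size / wan_rate on the WAN, a node cannot start before its availability time, the total processing time is the finish of the last sub job minus the job's arrival, and processing cost is cost_rate times run time. `WorkerNode`, `wan_rate`, `cost_rate` and all numeric values are hypothetical.

```python
from dataclasses import dataclass

MAX_PARTITIONS = 5  # a job is split into at most five sub jobs


@dataclass
class WorkerNode:
    name: str
    cluster: str
    speed: float            # load units per second (assumed)
    cost_rate: float        # processing cost per second of use (assumed)
    available_from: float   # availability time: when the node is in the grid
    busy_until: float = 0.0


def schedule_job(size, origin, nodes, wan_rate, arrival=0.0):
    """Steps 5-9 and 12 for one job; returns (total_time, cost, placement)."""
    sub_job = size / MAX_PARTITIONS
    latest, cost, placement = arrival, 0.0, []

    def finish_and_cost(n):
        # ASSUMPTION: no transfer inside the origin cluster, WAN transfer otherwise
        transfer = 0.0 if n.cluster == origin else sub_job / wan_rate
        # the node must be available and idle, and the sub job must have arrived
        start = max(arrival + transfer, n.available_from, n.busy_until)
        run = sub_job / n.speed
        return start + run, n.cost_rate * run

    for _ in range(MAX_PARTITIONS):                # Step 9
        best = min(nodes, key=finish_and_cost)     # Step 6
        finish, c = finish_and_cost(best)
        best.busy_until = finish                   # Step 8: dispatch to WNmin
        latest, cost = max(latest, finish), cost + c
        placement.append(best.name)
    # ASSUMPTION for Step 7: total time = last sub-job finish - arrival
    return latest - arrival, cost, placement


def tally(outcomes):
    """Step 10 over (total_processing_time, deadline) pairs: returns (hit, miss)."""
    hit = sum(1 for t, d in outcomes if t < d)
    return hit, len(outcomes) - hit


nodes = [WorkerNode("a", "C1", speed=1.0, cost_rate=1.0, available_from=0.0),
         WorkerNode("b", "C2", speed=4.0, cost_rate=2.0, available_from=0.0),
         WorkerNode("c", "C2", speed=8.0, cost_rate=1.0, available_from=100.0)]
total, cost, placement = schedule_job(20.0, "C1", nodes, wan_rate=10.0)
hit, miss = tally([(total, 6.0), (total, 4.0)])
```

Node c is the fastest but only joins the grid at time 100, so the five sub jobs go to b and the local node a; the job finishes at about 4.4 with a processing cost of 12, a hit against a deadline of 6 and a miss against a deadline of 4.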

RESULTS
We compare the proposed Novel Adaptive Decentralized Job Scheduling Algorithm with the existing Decentralized Hybrid Job Scheduling Algorithm using the simulation parameters of Suri and Singh (2010), which are listed in Table 1.

DISCUSSION
The performance of the job scheduling algorithms is evaluated using three parameters: total processing time, total processing cost and number of jobs; the algorithms are compared by varying the number of jobs. Tables 2 and 3 show the processing time and processing cost obtained by the Decentralized Hybrid Job Scheduling Algorithm and by the proposed Novel Adaptive Decentralized Job Scheduling Algorithm.
Fig. 1, which plots Table 2, shows that the proposed NADJSA achieves a shorter makespan than DHJSA.
Fig. 2, which plots Table 3, shows that the proposed NADJSA achieves a lower processing cost than DHJSA. NADJSA also takes the deadline of each job into account when allocating it to a worker node. Tables 4 and 5 compare the two algorithms by miss and hit count: the miss count is the number of jobs completed after the user's deadline, and the hit count is the number of jobs completed within it.
The miss and hit counts are plotted in Figs. 3 and 4. The simulation results show that the proposed NADJSA achieves a higher hit rate and a lower miss rate than DHJSA.

CONCLUSION
In this study, we presented an efficient Novel Adaptive Decentralized Job Scheduling Algorithm for a decentralized grid environment. The proposed algorithm aims at minimum cost in terms of processing time, processing cost and transfer cost, and it achieves a lower makespan and processing cost than the Decentralized Hybrid Job Scheduling Algorithm. The results show that the proposed Novel Adaptive Decentralized Job Scheduling Algorithm reduces the makespan and processing cost, satisfies user demands, improves resource utilization and balances the load across the grid environment.