Cost Effective Expa-Max-Min Scientific Workflow Allocation and Load Balancing Strategy in Cloud Computing

: The rise in demand for cloud resources (network, hardware and software) requires cost effective scientific workflow scheduling algorithm to reduce cost and balance load of all jobs evenly for a better system throughput. Getting multiple scientific workflows scheduled with a reduced makespan and cost in a dynamic cloud computing environment is an attractive research area which needs more attention. Scheduling multiple workflows with the standard Max-Min algorithm is a challenge because of the high priority given to task with maximum execution time first. To overcome this challenge, we proposed a new mechanism call Expanded Max-Min (Expa-Max-Min) algorithm to effectively give equal opportunity to both cloudlets with maximum and minimum execution time to be scheduled for a reduce cost and time. Expa-Max-Min algorithm first calculates the completion time of all the cloudlets in the cloudletList to find cloudlets with minimum and maximum execution time, then it sorts and queue the cloudlets in two queues based on their execution times. The algorithm first select a cloudlet from the cloudletList in the maximum execution time queue and assign it to a resource that produces minimum completion time, while executing cloudlets in the minimum execution time queue concurrently. The experimented results demonstrats that our proposed algorithm, Expa-Max-Min algorithm, is able to produce good quality solutions in terms of minimising average cost and makespan and able to balance loads than Max-Min and Min-Min algorithms


Introduction
The evolution of cloud computing has taken place in various phases which include grid computing, utility computing, software as a service and now cloud computing (Sharma and Pariha, 2014). Cloud computing is classified as a type of grid computing, which is aimed at allocating resources properly to address the quality of service and reliability problems facing information dissemination and data storage (Sharma and Pariha, 2014;Hamdaqa and Tahvildari, 2012). NIST defined cloud computing as "a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources like, networks, servers, storage, applications and other services" (Mell and Grance, 2011). Cloud technology has a broader network range that enables cloud resources to be available anywhere in the world by the cloud service providers to consumers on demand. Consumers or users all over the world can access these resources through the use of their digital devices such as laptops, smart phones, tablets, personal computers etc., on pay-as-you-go model or other models through cloud standard mechanisms (Mell and Grance, 2011).
Though cloud computing has gain a lot of successes since its inception, but scientific workflow allocation with the standard Max-Min algorithm in cloud is still a major issue in research because it is unable to select and assign both cloudlets with maximum execution time and minimum execution time concurrently. Scientific workflow allocation is described as a method used to get resources channeled to cloud users at a reduced cost and time. In scheduling, a workflow task has to be selected and submitted to the resource manager and then, the submitted workflow task will be put in a queue if there are other workflow tasks with higher priority than it, until it is time for it to be scheduled (Saraswathi et al., 2015) as displayed in Fig. 1. A workflow task may be required to wait in a queue for a long or short period of time depending on some scheduling factors which include system capacity to execute, task priority, system load and requested resources availability. The focus of this paper is on how to achieve the control of workload, makespan and cost on cloud resources by ensuring that all the workloads on cloud resources are distributed properly to allow free flow of cloudlets on all the network nodes and to guarantee that, all the resources are allocated to cloud users at a lower cost. This will optimise the use of scientific workflows in cloud computing to ensure that multiple resources are made available to cloud users at a reduced cost and time. To achieve this, we proposed a mechanism called Expa-Max-Min algorithm as a way to ensure better load balancing, makespan and cost of executing cloudlets on cloud resources in cloud computing environment. The algorithm takes the advantages of the standard Max-Min algorithm by selecting and assigning cloudlets that requires minimum executing time in the second queue of the resources alongside the original Max-Min steps as well in the first queue of the resource.
We incorporated the proposed algorithm (Expa-Max-Min) into a simulation environment called cloudsim. Cloudsim is a simulation toolkit that allows for the modeling and simulation of all cloud applications and systems for comparison of result in cloud computing environment (Calheiros et al., 2011). The algorithm is implemented to guarantee that all workloads on cloud resources are distributed evenly on all the available network nodes to avoid traffic during cloudlets distribution and also to reduce the high priority in the standard Max-Min algorithm where cloudlets with maximum execution time must finish execution before executing those with minimum execution time. Expa-Max-Min algorithm first calculates the completion time of all the cloudlets in the cloudletList to find cloudlets with minimum and maximum execution time, then it sorts and queue the cloudlets in two queues based on their execution times. The algorithm first selects a cloudlet from the cloudletList in the maximum execution time queue and assigns it to a resource that produces minimum execution time, followed by cloudlet in the minimum execution time queue. The proposed algorithm is able to boost up cloud scheduling processes by simultaneously selecting and assigning cloudlets with both maximum and minimum execution time concurrently at a reduced cost, makespan and balancing loads fairly.

Load Balancing
The increasing demand for cloud applications coupled with the increased in web traffic daily, where millions of cloud users are queued and demanding for cloud services on different servers has made load balancing an interesting topic to be investigated. Load balancing is a method used in cloud and in distributed computing to mediate access to client requests to servers by distributing cloudlets evenly across multiple servers, (Samarsinh and Deshpande, 2014). The purpose of balancing load on cloud is to ensure that workloads on cloud resources are distributed evenly on all the available network nodes to avoid traffic during cloudlets distribution. Also, load balancing in cloud helps cloud users and enterprises to manage their application effectively for higher performance at a lower cost. Figure  2 displays the processes of load balancing in Cloud.

Scientific Workflows
Scientific Workflows allocation and Scheduling is a procedure adapted to manage the execution of interdependent workflow tasks on a distributed pool of resources. This procedure ensures that appropriate resources are identified and distributed on all the available VMs for a successful execution of all tasks to meet the demands of cloud users (Prathibha et al., 2014). The three types of scientific workflows used in this research are: • Laser Interferometer Gravitational Wave Observatory (LIGO) (Silva et al., 2014), also known as Inspiral Workflow is a data-intensive workflow designed to monitor, detect and capture gravitational waves that is produced through various activities in the earth. The main goal of LIGO Inspiral is to identify and give measurement of gravitational waves forecasted by general relativity (Einstein's theory of gravity), in which gravity is described as due to the curvature of the fabric of time and space (Berriman et al., 2004;Mehta and Gideon, 2014) • CyberShake: This is a fraction of Southern California Earthquake Center's (SCEC) (Graves et al., 2011), which uses 3D wave detector to determine source and wave that allows it to compute Probabilistic Seismic Hazard curves for geographic sites in order to simulate, broadcast and to forecast the movement of ground for the production of accurate and reliable environmental estimates rather than relying on the use of empiricalbased ground motion methods (Silva et al., 2014;Cpedia, 2016) • SIPHT: This is a scientific workflow application from the bioinformatics project at Harvard, which is suitable for conducting a broader search for nontranslated RNAs (sRNAs) for the control of several processes like secretion or virulence in bacteria (Mehta and Gideon, 2014;Livny, 2016). Figure 3 displays a summarized Scientific Workflows.

Related Work
The focus of scientific workflow scheduling in cloud computing is to ensure that multiple resources are made available to cloud users at their door steps. This is to ensure that, cloud resources are allocated fairly to cloud users at a reduced cost and time. Scientific workflows have been lengthily used by many researchers as a means of exploring into the parallel distribution of cloud computing applications on distributed systems such as cluster, grids and clouds (Talia, 2013;Gannon et al., 2007;Silva et al., 2014). Scientific workflow allocation and Scheduling is defined in (Prathibha et al., 2014) as a process use to map, manage and execute interdependent workflow cloudlets effectively on a distributed pool of cloud resources to ensure that every available cloud resources is distributed efficiently and fairly on all network notes.
There are many scheduling algorithms that exist in distributed computing, most of which are applicable in the cloud environment, but their performance is questionable in relation to cost, makespan and load balancing, (Thomas et al., 2014). These algorithms include Max-Min, Kaur and Luthra, 2014), Min-Min (Liu et al., 2013), Game Theoretic (Duan et al., 2014;Wei et al., 2010), FCFS (Marphatia et al., 2013), Round Robin (Agarwal and Rastogi, 2014). Though these algorithms have brought some improvement on resource scheduling in cloud computing, but they require more improvement to transform how goods and services are sold and bought in today's competitive environment (Wei et al., 2010).
Max-Min is the most commonly used scheduling algorithm that assigns tasks to resources based on the priority of the tasks. It gives high priority to tasks with high completing time by selecting and assigning them to resources first before considering any other task (Brar and Rao, 2015). Max-Min has been considered as the most proactive algorithm in scheduling because of its ability to assign multiple tasks to a resource at a reduced time and cost. Much work has been done by various researches in a quest to improve the performance of Max-Min scheduling algorithm. For example, Li et al., (2014) proposed for an improved Max-Min algorithm for cloud computing. The idea behind the improved algorithm was to allow task with maximum execution time to be assigned to resources that produces minimum completion time rather than the traditional Max-Min algorithm that select and assigns task with maximum completion time to resource that produces minimum execution time. Though the results of their study suggest a reduction in makespan, however the study was not  (Kaur and Ghumman, 2014), hybrid improved Max-Min Ant algorithms for load balancing and makespan optimisation by selecting and assigning tasks based on their execution time was proposed. Assigning tasks based on execution time reduces the load imbalance, but increases the finishing time (makespan).
Dehkordi and Bardsiri (2015), a compounding algorithm built in the conditions of both Min-Min and Sufferage algorithms called Task-Aware Scheduling Algorithm (TASA) was proposed. The aim of the proposed algorithm was to map tasks with odd number on a machine using Min-Min strategy and tasks with even number on machine using Sufferage strategy. The algorithm was able to achieve higher performance by reducing response time of scheduling task to a resource. However, more emphasis was placed on the number of tasks available to be scheduled instead of placing emphasis on both tasks and resources. Moreover, the parameters for benchmarking was only limited to makespan.
Also, Job scheduling approach in cloud computing based on genetic algorithm on load balancing of virtual machines was proposed in (Hu et al., 2010). The paper took advantage of generic algorithm and suggested for a historical data and current data state to be computed in advance to determine the solution that will have influence on the current Virtual Machine and the resources to be deployed on every physical node to allow for the selection of any deployment mode that will have least influence on the system to be selected. The authors implemented the algorithm on open-source Virtual Machine management platform called OpenNebula (2010) to analyse the load balancing effect and the migration cost on the systems. The advantages of the algorithm were that; it achieves the best result of balancing load fairly and reduces the migration cost of scheduling. Unfortunately, the delay in migrating tasks to VM which may itself waste the processing bandwidth was not address.
Moreover (Bhoi and Ramanuj, 2013;Santhosh and Manjaiah, 2014) improved on Max-Min algorithm. Bhoi and Ramanuj (2013), an Enhanced Max-Min algorithm was proposed. The mandate of the proposed algorithm was to assign tasks with an average execution time to resources that can best give minimum completion time. Santhosh and Manjaiah (2014) proposed an improved task scheduling algorithm based on Max-Min algorithm which selects and assigns tasks that are greater than average execution time to a resource that gives minimum completion time. Though all these algorithms have improved the performance of the standard Max-Min algorithm, nevertheless, the issues of task priority of scientific workflow and scalability were left out, which is the more reason why this research needs to be conducted. Table 1 displays abbreviations used in this research.

Existing Scheduling Algorithms in Cloud Computing
There are many scheduling algorithms in cloud computing environment, but for the purpose of comparison, we will discuss Max-Min and Min-Min algorithms.

Max-Min Algorithm
This algorithm is designed to schedule task in cloud by considering the completion time of every task in the meta-task. In Max-Min algorithm, priority is placed on tasks that have maximum completion time. The algorithm selects and assigns tasks with maximum completion time to resource that produces its minimum execution time. Max-Min algorithm first computes the completion time (CT ij ) = et ij + rt j ; (for each tasks in all VMs) to find tasks with maximum completion time. Then it starts by assigning those tasks with maximum completion time to resources that gives minimum execution time first, (El-Kenawy et al., 2012;Kanani and Maniyar, 2015). Figures 4 and 5 present the Pseudo code and flowchart of Max-Min algorithm.

Standard Max-Min Algorithm
For all submitted tasks in the meta-task T i For all resource Rj Calculate completion time (Ct ij )=et ij +rt j ; (for each task in all VMs) Find a task with maximum completing time Assign the task to a resource that gives minimum completion time End if Update the meta-tasks Update ready time (rt j ) of the selected Rj Update ct ij for all T i End while Fig. 4: Pseudo code of Max-Min algorithm

Min-Min Algorithm
Min-Min scheduling algorithm was built on a concept of selecting and assigning task with minimum completion time to resource that has the capability to execute them within a shorter period of time. This algorithm has two steps to consider when scheduling task. The first step is to compute all the available resource on metatask to find the completion time (CT ij ) = et ij + r tj ; (for each tasks in all VMs) and the second step involve selecting task with minimum expected completion time to a corresponding resource that can execute it faster, (Sharma and Tyagi, 2017;Etminani and Naghibzadeh, 2007;Anousha and Ahmadi, 2013). The pseudo code and the flowchart of Min-Min algorithm are displayed in Fig. 6 and 7.

Standard Min-Min Algorithm
For all submitted tasks in the meta-task T i For all resource R j Calculate completion time (Ct ij )=et ij +rt j ; (for each task in all VMs) Find a task with minimum completing time Assign the task to a resource that gives minimum completion time End if Update the meta-tasks Update ready time (rt j ) of the selected R j Update ct ij for all T i End while The main challenges faced by the standard Max-Min algorithm are the production of high makespan, imbalanced loads and the high priority it gives in favour of cloudlets with maximum completion time in the expense of cloudlets with minimum completion time. To solve these, we proposed a method called Expanded Max-Min (Expa-Max-Min) algorithm. It is an expansion of the standard Max-Min algorithm that reduces the priority given to workflow cloudlets with maximum execution time. Expa-Max-Min allows both workflow cloudlets with maximum and minimum execution time to be selected and assigned to a resource that can execute it within a shorter period of time. Expa-Max-Min algorithm first calculates the completion time of all the cloudlets in the cloudletList to find cloudlets with minimum and maximum execution time, then it sorts and queue the cloudlets in two queues based on their execution times. The first queue will be for cloudlets with maximum execution time and the second queue will be for cloudlets with minimum execution time. First the algorithm will select and assign a cloudlet from maximum cloudlets (MaxClt) queue to a resource that produces its minimum completion time and then followed by the minimum cloudlets (MinClt) queue or else it will continue with the maximum cloudlets. The pseudo code and the flowchart of Exp-Max-Min algorithm are presented in Fig. 8

Results and Analysis
Our proposed algorithm, Expa-Max-Min, is implemented using cloudsim, the simulation is run on Intel® core i3, 500GB HDD and 4GB RAM on 64 bit windows 8 operating system. We considered three workflows such as Inspiral, CyberShake and Sipht. The processing speed of each workflow cloudlet is measured in Million Instructions Per Second (MIPS). Table 2 gives a detailed setting of the simulator parameters whereas Table 3-5 gives scenarios involved in using each of the three (3) workflows selected in this research.  Inspi-median Inspiral with median workflows load 50 3 Inspi-large Inspiral with large workflows load 100 4 Inspi-heavy Inspiral with heavy workflows load 1000 Cyber-median CyberShake with median workflows load 50 3 Cyber-large CyberShake with large workflows load 100 4 Cyber-heavy CyberShake with heavy workflows load 1000 Sipht-median Sipht with median workflows load 60 3 Sipht-large Sipht with large workflows load 100 4 Sipht-heavy Sipht with heavy workflows load 1000

Performance Measures
There are various standard performance measurement used in cloud computing to measure, compare and benchmark the performance of every scheduling algorithm. In this research, we used three performance measures such as:

Load Balancing Metrics
Load balancing is the process by which cloudlets are distributed evenly in the cloud computer environment to avoid traffic and allow free flow of cloud resource on all network nodes. Maipan-uku et al. (2016;Cao et al., 2005), load balancing is calculated as in Equations 1 and 2: The deviation (d) is defined as: The results are analysed based on the four scenarios given in Table 3, 4 and 5, with different algorithms like Expa-Max-Min, Max-Min and Min-Min. Figure 10 presented results on the performance of the evaluated proposed Expa-Max-Min, Max-Min and Min-Min algorithms with regards to load balancing using Inspiral. The proposed Expa-Max-Min algorithm gives remarkable results in terms of load balancing efficiency at all the four scenarios as compared to Max-Min and Min-Min. This is because Expa-Max-Min was able to do away with cloudlets priority by allowing both cloudlets with maximum and minimum execution time to be scheduled concurrently. Figure 11 and 12 also illustrate results of load balancing using CyberShake and Sipht for each of the evaluated algorithms. Based on the implemented results, it has been observed that, at every execution of cloudlet, the load balancing of the proposed algorithm remains more balanced than the standard Max-Min and Min-Min algorithms, this is because the proposed algorithm was able to assign all the cloudlets to the correct resources properly. Moreover, the comparison also shows that, Min-Min algorithm has produced good values of load balancing in Fig. 11 and 12 at all the four scenarios than Max-Min algorithm.

Makespan Metrics
Makespan is the total amount of time that cloudlets spend during execution. Figures 13-15 present results in respect of makespan performances produced by different algorithms using different workflows such as Inspiral, CyberShake and Sipht respectively. Results in Fig. 13 shows that Expa-Max-Min algorithm is outperforming the standard Max-Min and Min-Min by producing lower values of makespan in every execution of cloudlet at all the four scenarios. In Fig. 14 and 15, Expa-Max-Min was able to schedule all the resources on all the available Virtual Machines at a reduced makespan in all the scenarios except at scenario 4 of Sipht (Sipht-heavy) that Min-Min algorithm performed slightly better than the proposed algorithm, this is because Min-Min was able to assign all the cloudlets at that stage properly.

Cost Metrics
Cost is referred to as the amount needed to be spent on scheduling a cloudlet in a cloud computing environment. The cost takes in to consideration of the Job size, cost of network, allocation size of the job, cost of executing a cloudlet, bandwidth, resource scheduling speed and cost of a data center as simplified in Table 6. In (Kamarajapandian and Chitra, 2016), the cost of scheduling in cloud is calculating using Equation 3 to 7: i. Network Transmission Time   to boost up cloud scheduling processes by simultaneously selecting and assigning cloudlets with both maximum and minimum execution time concurrently at a reduced cost. On the other hand, Max-Min, one of the benchmarked algorithms has also performed better than the standard Min-Min algorithm at almost all the three workflows used except at CyberShake that Min-Min performed better than Max-Min algorithm.

Conclusion
In this study, we identified three major challenges in the standard Max-Min algorithm, which includes high cost of scheduling workflows, high makespan and high priority given to workflows with maximum Execution time. In order to tackle these problems, we presented a state of the arc algorithm known as an Expanded Max-Min (Expa-Max-Min) Algorithm.
Our algorithm (Expa-Max-Min) was designed and simulated on cloud simulation platform called cloudsim for the purpose of minimising the makespan value, cost of scheduling scientific workflow and to balance load effectively in a dynamic cloud computing environment.
The proposed algorithm is able to produce good quality solutions for all the cases, schedule both workflow cloudlets with maximum and minimum execution time concurrently without giving priority to workflow cloudlets with maximum execution time as always in the case of the standard Max-Min algorithm. To further prove the effectiveness of the proposed Expa-Max-Min algorithm, we experimented the results on a cloudsim simulator. The simulation results shows that, our proposed algorithm has the ability to balance load fairly, producing better makespan and average executional cost when compared with the standard Max-Min and Min-Min algorithms.