Application of Graph Theory Concepts in Computer Networks and its Suitability for the Resource Provisioning Issues in Cloud Computing: A Review

Cloud computing, a web service provisioning model, provides immense benefits over traditional IT service environments with the help of virtualization technology. As cloud computing is not yet a fully mature paradigm, it poses many open issues. A key research problem in cloud computing is efficient resource provisioning, which is difficult because of the cloud's complex and distributed architecture. Graph-based representations of complex networks provide simpler views, and graph-theoretical techniques provide simpler solutions for many issues inherent in networks. Hence, this paper begins by exploring graph theory applications in computer networks, with a specific focus on graph theory applications in cloud computing. It then examines the basic resource provisioning problems that arise in cloud computing environments and presents some conceptual graph-theoretical suggestions to address them.


Introduction
The next step in the evolution of distributed computing is cloud computing. It inherits existing distributed computing models such as grid computing and utility computing, and adds virtualization. Large-scale processing and storage of data have been greatly simplified by the advent of cost-effective cloud computing solutions. The Cloud Data Center (CDC) (Wu and Buyya, 2015) is very complex, with resources distributed globally, which leads to several issues. The authors of (Smith et al., 2009; Marston et al., 2011; Rimal et al., 2009; Shahzad, 2014) addressed several key and fundamental cloud computing issues such as resource provisioning, security, privacy, energy and interoperability, although this list is not exhaustive. These issues look different from the perspective of the cloud service provider and from that of the cloud service consumer. While cloud computing provides opportunities to migrate IT business services online, these key issues need to be resolved before it is accepted as a successful business model. This article identifies opportunities for graph theory (Bondy and Murty, 1976) based solutions to the resource provisioning issues inherent in cloud computing. It starts with graph theory applications in various areas of computer networks and then explores their suitability for addressing the resource provisioning issues of the cloud. Graph theory is a part of discrete mathematics, and graphs are a useful structure for modeling relationships between objects. Graph theory finds applications mainly in network modeling, biology, electrical networks, computational algorithms and scheduling. Graph-theoretical techniques are widely used in computer science, especially for modeling and routing in networks. Representing a problem as a graph can provide a different point of view and make the problem much simpler. It also provides tools for solving the problem and a set of techniques for analysing graphs.
This work has two main parts. The first part gives an overview of graph theory applications in computer networks. As a cloud data center is a set of interconnected systems, graph-theoretical solutions for computer networks can be applied to the cloud, with suitable modifications, to address its issues. The second part gives an overview of graph theory applications in the cloud. The main idea behind this work is to determine the scope for applying graph theory to resource provisioning issues in the cloud.

Security
The authors of (Denz and Taylor, 2013; Ward and Barker, 2014; Mahajan et al., 2011; Subashini and Kavitha, 2011; Zissis and Lekkas, 2012) discussed that, because user and corporate information resides on third-party systems, no one can guarantee how secure the data are. The data are prone to leakage and attack. Security is a primary issue that all cloud service providers must handle to retain their business in the market; they should take steps to protect data and its privacy. The five most representative security and privacy attributes are confidentiality, integrity, availability, accountability and privacy-preservability.

Cost
The authors of (Sahni and Vidyarthi, 2015; Tsai and Liao, 2015; Waibel et al., 2016; Hadji et al., 2016) noted that the public cloud offers pay-per-use pricing, which can provide low-cost options for short-term projects. For long-term use, however, enterprise IT organizations may be better off making a capital investment in additional hardware and software. Enterprises need to conduct a break-even analysis to determine whether a public or a private cloud would be more cost-effective for them. Providers are interested in customer satisfaction and in generating revenue from their services, whereas consumers are interested in cost-effective solutions. To balance these two views, cost-effective cloud solutions need to be developed.

Reliability
The authors of (Vishwanath and Nagappan, 2010; Poola et al., 2014; Di and Wang, 2013; Zheng et al., 2012) stated that, because of high system complexity and distributed structure, even carefully engineered data centers are subject to a large number of failures. Fault-tolerant systems should be built to address reliability concerns. Because of the abstract nature of the cloud environment, there arises a need to develop new fault-tolerant approaches or to extend traditional ones. VM migration and server consolidation are major threats to fault tolerance, as they incur service downtime.

Interoperability
The authors of (Govindrajan and Lakshmanan, 2010; Loutas et al., 2011) stated that interoperability of heterogeneous cloud platforms is difficult because they use distinct hypervisor and VM technologies. The platforms also use various security standards and management interfaces. Multiple vendors with different product standards pose challenges for interoperability. Cloud adoption will stall if there is no good way of integrating data and applications across clouds; hence, a unified cloud interface and open standards need to be developed.

Energy
The authors of (Beloglazov et al., 2011; Guo and Fang, 2013; Dayarathna et al., 2016; Beloglazov and Buyya, 2010; Lee and Zomaya, 2010) stated that increasing demand for computational power leads to the setting up of large-scale data centers, whose power consumption is enormous. Hence, energy-efficient hardware and intelligent resource management techniques are required. This enormous power consumption also increases carbon dioxide (CO2) emissions, contributing to the greenhouse effect. Hence, a number of practices need to be applied to achieve energy efficiency, such as improving applications' algorithms, using energy-efficient hardware and adopting energy-efficient resource management strategies in virtualized data centers.
Of all the key issues mentioned above, this work focuses on the resource provisioning issue and the application of graph theory to it.

Resource Provisioning in Cloud
In cloud computing, resource provisioning (Meng et al., 2010; Buyya et al., 2011; Vecchiola et al., 2012; Javadi et al., 2012; Warneke and Kao, 2011) is the allocation of a cloud data center's resources to a user. When a cloud data center accepts requests for hardware or software resources, it must create and provision them as virtualized resources. Provisioning covers the monitoring, dynamic selection/scheduling, deployment/placement and management/load balancing of software and hardware resources to satisfy the Service Level Agreement (SLA). The SLA is an agreement between the cloud service provider and the user that guarantees Quality of Service (QoS). Provisioning can be done in several different ways. In particular, this work addresses the following aspects of resource provisioning from a CDC:
• Efficient monitoring for provisioning CDC resources
• Optimal VM placement and migration in CDC for energy-efficient resource provisioning
• Proper locating of CDCs and allocation of CDCs to the source of requests
• Clustering distributed CDCs for faster server provisioning
• Uniform assignment of clients to CDC servers
• Traffic-aware VM migration to load balance cloud servers

Efficient Monitoring for Provisioning CDC Resources
Task scheduling and load balancing are complicated in a cloud computing environment because of its abstract architecture, dynamic behaviour and resource heterogeneity. Resources must be monitored before scheduling and load balancing are performed.

Optimal VM Placement and Migration in CDC for Energy-Efficient Resource Provisioning
Keeping many physical machines (PMs) and VMs running in the data center consumes more energy, leading to higher operating costs. Energy can therefore be saved by identifying the least-loaded PMs, migrating their load to other PMs and then shutting them down. Energy may be better conserved through optimal placement of VMs on PMs and through VM migrations, so that energy consumption is kept at a desirable level.

Proper Locating of CDCs and Allocation of CDCs to the Source of Requests
Requests for CDC services can come from different parts of the world. The term source of requests (or client) denotes a user who makes requests to cloud data center services. The distance between the cloud data center and the source of requests is a major factor influencing the quality of service in terms of response time and latency. Cloud data center allocation is therefore one of the major issues in cloud computing: an efficient allocation of cloud data centers to the sources of requests may improve the quality of service.

Clustering Distributed CDCs for Faster Server Provisioning
Cloud data centers are normally distributed across the world to increase the availability of services through redundancy mechanisms such as remote mirroring and replication. They are distributed mainly for disaster recovery. Clustering cloud data centers that are deployed region-wise can provide rapid responses.

Uniform Assignment of Clients to CDC Servers
In a distributed cloud data center environment, load balancing techniques direct each request to the closest cloud data center or to the one with the most available capacity to serve the request. A variety of algorithms are used to perform load balancing, but a trade-off always exists between choosing the closest cloud data center and balancing the load across cloud data centers. Sometimes the cloud data center closest to the user is overloaded, in which case the requests are routed to a more distant cloud data center that is capable of handling them. Hence, there arises a need for an approach that handles this trade-off by considering proximity and load at the same time.

Traffic-Aware VM Migration to Load Balance Cloud Servers
Upon receiving the load information, the cloud broker must invoke a load balancing procedure to distribute the load uniformly across the hosts in the CDC. This can be done by migrating some of the VMs from overloaded hosts to underloaded hosts, typically considering only server-side constraints. Network-side constraints also need to be considered to enhance the performance of CDCs.

Graph Theory Applications in Computer Networks
This section summarizes some of the works that applied graph theory in various types of computer networks. Table 1 lists some of the possible graph theory applications in various types of networks. Since the cloud is a kind of network, these traditional graph-theoretical techniques can be analyzed for their suitability to address resource provisioning issues in the cloud, and they form the basis for the conceptual suggestions presented later in this article.

Graph Theory Applications in Cloud Computing
This section provides a summary of works that applied graph theory in the cloud.
The authors of (Çatalyürek et al., 2011) proposed a heuristic based on hypergraphs and hypergraph partitioning for optimizing scientific workflow execution. The authors of (Dai et al., 2009) modeled the reliability of cloud services. Their paper models network failure, overflow failure, timeout failure, resource missing failure, hardware failure, software failure and database failure using Markov models, queuing theory and graph theory. The authors of (Verbelen et al., 2013) investigated how to optimally deploy software applications on the infrastructure offered in the cloud by minimizing network usage, especially in the context of mobile computing. They designed and evaluated graph partitioning algorithms that allocate software components to machines in the cloud while minimizing the required bandwidth. The authors of (Li et al., 2012) proposed CAM, a cloud platform that provides a resource scheduler designed specifically for hosting MapReduce applications in the cloud. CAM uses a flow-network-based algorithm that is able to optimize MapReduce performance under the specified constraints. The authors of (Binz et al., 2012) proposed Enterprise Topology Graphs (ETG) as a formal model to describe an enterprise topology. The authors of (Bansal et al., 2011) represented the physical network of the cloud as a graph, with the objective of minimizing congestion. The authors of (Chan et al., 2009) formulated a computing cloud as a kind of graph, in which a computing resource such as a service or an intellectual property access right is an attribute of a graph node and the use of the resource is a predicate on an edge of the graph. They also proposed modeling a cloud computation as a set of paths in a subgraph of the cloud such that every edge contains a predicate that evaluates to true. Finally, they presented a set of algorithms to compose cloud computations and model-based testing criteria to test cloud applications.
The authors of (Peng et al., 2015) constructed a graph-theoretic method for VM clustering by energy minimization. They transformed the deployment of a VM cluster into a maximum-flow minimum-cut problem and used the resulting cut to form the VM cluster. The authors of (Zegzhda and Nikolsky, 2014) described a formal, graph-theory-based security model for virtual machine hypervisors in cloud systems. From Table 2, it is observed that only very limited literature is available on graph theory applications in the cloud and that most of these works overlooked its application to resource provisioning. This opens many opportunities to apply graph-theoretical techniques to resource provisioning issues in the cloud.
The following section presents some conceptual suggestions for applying graph theory to resource provisioning issues in cloud computing.

Some Conceptual Suggestions for the Suitability of Graph Theory Techniques for Addressing Resource Provisioning in Cloud
This section considers some of the graph-theoretical techniques from Table 1 and, for each of the following aspects of resource provisioning in the cloud, suggests graph-theoretical models and techniques suited to addressing it.

How to Perform Efficient Monitoring for Provisioning CDC Resources?
Monitoring the complex, distributed cloud architecture to track resource status is a challenging task. Monitoring establishes the availability of resources before provisioning is performed. Generally, a central monitor keeps track of the resources in the cloud (Galstad, 2016).
The monitor has to query all the resources in the network periodically to get their availability status, which increases network and message overhead. To resolve this, this work suggests representing the cloud as a graph and constructing a minimum dominating set, which can then be used for monitoring resource status: only the dominating nodes report to the monitor, each on behalf of itself and its neighbors. Compared to traditional monitoring methods, this structure can reduce both the number of update messages sent to the monitor and the update time.
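As an illustration only, the dominating set idea can be sketched as follows. Finding a true minimum dominating set is NP-hard, so this sketch swaps in the standard greedy approximation; the function name, the adjacency-dictionary representation and the server names are assumptions made for the example and are not taken from any cited work.

```python
def greedy_dominating_set(adj):
    """Greedy approximation of a minimum dominating set.

    adj maps each node to the set of its neighbours (undirected graph).
    The result is a dominating set, but not necessarily a minimum one.
    """
    undominated = set(adj)
    dominators = set()
    while undominated:
        # Choose the node whose closed neighbourhood covers the most
        # still-undominated nodes; sorted() makes tie-breaking deterministic.
        best = max(sorted(adj), key=lambda v: len(({v} | adj[v]) & undominated))
        dominators.add(best)
        undominated -= {best} | adj[best]
    return dominators


# Hypothetical six-server cloud network.
cloud = {
    "s1": {"s2", "s3", "s4"},
    "s2": {"s1"},
    "s3": {"s1"},
    "s4": {"s1", "s5"},
    "s5": {"s4", "s6"},
    "s6": {"s5"},
}
monitors = greedy_dominating_set(cloud)  # {"s1", "s5"}
```

In this toy network the central monitor would exchange status messages with two nodes instead of six, since every other server is adjacent to one of them.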

How to Perform Optimal VM Placement and Migration in CDC for Energy-Efficient Resource Provisioning?
As VM placement has a direct impact on the performance of the CDC, this work suggests optimizing VM placement and VM migration for energy-efficient resource provisioning. For optimal VM placement, it suggests constructing a KD-tree of cloud servers so that VMs can be placed quickly on best-fit cloud servers. A KD-tree is a tree for storing values or objects in a multidimensional space, and it is a useful structure when a search involves a multidimensional key, which is known as associative search. KD-trees perform well when the number of dimensions is small, as it is here: the resource dimensions of cloud servers and VMs, namely CPU, RAM, storage and bandwidth, can all be represented in a KD-tree. Moreover, this structure helps to identify the least-loaded PMs easily, so that the VMs running on them can be migrated to other cloud servers and the PMs powered off to conserve energy. The authors of (Mandal and Khilar, 2013) constructed a one-dimensional binary search tree, which considers only a single resource dimension, CPU.
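The following is a minimal sketch of how a KD-tree over server capacities might support a best-fit lookup. It is not the method of any cited work: the names `Node`, `build_kdtree` and `best_fit`, the sample servers, and the definition of "best fit" as the smallest total leftover capacity (which assumes the four dimensions are expressed in comparable, normalized units) are all assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Node:
    point: Tuple[float, ...]  # free (CPU, RAM, storage, bandwidth)
    name: str
    axis: int                 # dimension this node splits on
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build_kdtree(servers, depth=0):
    """servers: list of (name, capacity_tuple); splits on the median."""
    if not servers:
        return None
    axis = depth % len(servers[0][1])
    servers = sorted(servers, key=lambda s: s[1][axis])
    mid = len(servers) // 2
    name, point = servers[mid]
    return Node(point, name, axis,
                build_kdtree(servers[:mid], depth + 1),
                build_kdtree(servers[mid + 1:], depth + 1))


def best_fit(node, demand, best=None):
    """Return (slack, name) of the feasible server with least slack, or None."""
    if node is None:
        return best
    if all(c >= d for c, d in zip(node.point, demand)):
        slack = sum(c - d for c, d in zip(node.point, demand))
        if best is None or (slack, node.name) < best:
            best = (slack, node.name)
    # Every point in the left subtree is <= the split value on this axis,
    # so the subtree can be skipped when the split value is already too small.
    if node.point[node.axis] >= demand[node.axis]:
        best = best_fit(node.left, demand, best)
    return best_fit(node.right, demand, best)


# Hypothetical free capacities: (CPU cores, RAM GB, storage GB, bandwidth Gbps).
tree = build_kdtree([
    ("pm1", (4, 8, 100, 1)),
    ("pm2", (16, 64, 500, 10)),
    ("pm3", (8, 16, 200, 1)),
    ("pm4", (2, 4, 50, 1)),
])
placement = best_fit(tree, (4, 8, 80, 1))  # (20, "pm1")
```

The pruning step is what the tree buys over a linear scan: whole subtrees of servers that are too small in the splitting dimension are never visited.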

How to Ensure Proper Locating of CDCs and Allocation of CDCs to the Source of Requests?
This work suggests modeling the cloud data center network as a graph and using the facility location problem to place CDCs at appropriate locations to serve the sources of requests. Limiting the distance between cloud data centers and the sources of requests leads to faster service provisioning. The authors of (Doyle et al., 2013) proposed assigning each source of requests to the closest cloud data center to reduce carbon emissions, but they modeled the cloud data centers as a complete graph, which is unrealistic. They modeled both the networking and the computational components of the infrastructure as a graph and proposed a system that uses Voronoi partitions to determine how requests should be routed to the appropriate data center, based on the relative priorities of the cloud operator for latency purposes. The authors of (Bar-Ilan et al., 1992) provided solutions for facility location problems. They considered distributing the clients among centers in as balanced a way as possible, but they overlooked the distance between clients and centers, which is also essential for faster service provisioning.
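Once candidate CDC locations are fixed, the assignment of each source of requests to its nearest CDC on a general (not complete) graph amounts to a graph Voronoi partition. One possible sketch uses a multi-source Dijkstra search; the function name, the graph layout and the latency weights below are assumptions for illustration and do not reproduce the system of Doyle et al.

```python
import heapq


def nearest_cdc(graph, cdcs):
    """Assign every node to its closest CDC by shortest-path latency.

    graph: {node: {neighbour: latency}} with non-negative latencies.
    cdcs:  iterable of nodes that host a cloud data center.
    Returns {node: (distance, cdc)}, i.e. a Voronoi partition of the graph.
    """
    best = {}
    # Seed the queue with every CDC at distance 0 (multi-source Dijkstra).
    heap = [(0, c, c) for c in sorted(cdcs)]
    heapq.heapify(heap)
    while heap:
        dist, node, cdc = heapq.heappop(heap)
        if node in best:
            continue  # already settled with a smaller or equal distance
        best[node] = (dist, cdc)
        for nbr, latency in graph[node].items():
            if nbr not in best:
                heapq.heappush(heap, (dist + latency, nbr, cdc))
    return best


# Hypothetical network: CDCs at A and E, request sources at B, C and D.
net = {
    "A": {"B": 2, "C": 5},
    "B": {"A": 2, "D": 4},
    "C": {"A": 5, "D": 1},
    "D": {"B": 4, "C": 1, "E": 3},
    "E": {"D": 3},
}
partition = nearest_cdc(net, {"A", "E"})
# B is served by A (latency 2); C and D are served by E (latencies 4 and 3).
```

This sketch optimizes distance only; a fuller treatment of the facility location problem would also choose where to place the CDCs and, as discussed for Bar-Ilan et al., balance the load among them.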

How to Cluster Distributed CDCs for Faster Server Provisioning?
This work suggests modeling the cloud environment as a graph and clustering distributed CDCs to enable faster service provisioning to clients. Many works have used graph-based K-Means (Galluccio et al., 2012) and K-Spanning tree (Zhou et al., 2011) clustering methods to cluster a network. In all these approaches, the number of clusters K must be known in advance, but fixing K beforehand is impractical for distributed CDCs. Hence, this work suggests using dominating-set-based clustering, in which each dominating node acts as a cluster head and its neighbors attach to it, which makes service provisioning fast. The number of clusters (K) is then determined by the size of the graph rather than being fixed in advance, which leads to a non-constrained clustering algorithm.
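A minimal sketch of the clustering step, assuming a dominating set of cluster heads has already been computed by some means: each remaining CDC joins one adjacent head. The function name, the rule of joining the currently smallest adjacent cluster, and the region labels are assumptions made for this example.

```python
def cluster_by_heads(adj, heads):
    """Form one cluster per dominating node (cluster head).

    adj:   {node: set of neighbours} for the CDC graph.
    heads: a dominating set of that graph.
    Returns {head: [members]}, each list starting with the head itself.
    """
    heads = set(heads)
    clusters = {h: [h] for h in sorted(heads)}
    for node in sorted(adj):
        if node in heads:
            continue
        candidates = sorted(adj[node] & heads)
        if not candidates:
            raise ValueError(f"{node} is not dominated by any cluster head")
        # Assumed tie-breaking rule: join the smallest adjacent cluster.
        head = min(candidates, key=lambda h: len(clusters[h]))
        clusters[head].append(node)
    return clusters


# Hypothetical CDC graph with two cluster heads.
cdc_graph = {
    "eu1": {"eu2", "eu3", "us1"},
    "eu2": {"eu1"},
    "eu3": {"eu1"},
    "us1": {"eu1", "us2"},
    "us2": {"us1", "us3"},
    "us3": {"us2"},
}
clusters = cluster_by_heads(cdc_graph, {"eu1", "us2"})
# {"eu1": ["eu1", "eu2", "eu3"], "us2": ["us2", "us1", "us3"]}
```

The number of clusters equals the number of heads, so it falls out of the graph structure instead of being supplied as a parameter K.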

How to Ensure Uniform Assignment of Clients to CDC Servers?
This work suggests using the capacitated dominating set concept (Potluri and Singh, 2012) to assign clients to CDC servers. The capacitated dominating set problem is that of finding a dominating set with the additional constraint that the number of nodes dominated by each dominating node does not exceed that node's capacity. The capacity can be uniform across all nodes or variable. Hence, a uniform capacity can be assigned in a homogeneous CDC environment and a variable capacity in a heterogeneous CDC environment. For client assignment to CDCs, existing methods have used Round Robin (Radojević and Žagar, 2011) and its variants, which may not guarantee uniformity at all times.
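To make the capacity constraint concrete, here is a small greedy sketch of a capacitated dominating set. It is a simple heuristic written for illustration, not the algorithm of Potluri and Singh; the function name, the convention that a dominator does not count against its own capacity, and the sample servers and clients are assumptions.

```python
def greedy_capacitated_ds(adj, capacity):
    """Greedy heuristic for a capacitated dominating set.

    adj:      {node: set of neighbours}, no self-loops.
    capacity: {node: max number of neighbours the node may dominate}.
    Returns {dominator: [nodes it serves]}.
    """
    undominated = set(adj)
    assignment = {}

    def gain(v):
        if v in assignment:
            return -1  # already a dominator, cannot be picked again
        served = min(capacity[v], len(adj[v] & undominated))
        return served + (v in undominated)  # a dominator also covers itself

    while undominated:
        best = max(sorted(adj), key=gain)
        # Serve at most capacity[best] undominated neighbours.
        served = sorted(adj[best] & undominated)[:capacity[best]]
        assignment[best] = served
        undominated -= set(served) | {best}
    return assignment


# Hypothetical setup: two servers, each able to serve two of four clients.
clients = {"c1", "c2", "c3", "c4"}
topology = {"s1": set(clients), "s2": set(clients)}
topology.update({c: {"s1", "s2"} for c in clients})
caps = {"s1": 2, "s2": 2, "c1": 0, "c2": 0, "c3": 0, "c4": 0}
assignment = greedy_capacitated_ds(topology, caps)
# {"s1": ["c1", "c2"], "s2": ["c3", "c4"]}
```

With uniform capacities, as here, the clients are split evenly; in a heterogeneous CDC the `capacity` dictionary would simply hold different values per server.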

How to Efficiently Perform Traffic-Aware VM Migration to Load Balance Cloud Servers?
The network flow problem (Nakayama and Koide, 2013), in the form of the maximum flow and minimum cut problems, finds application in various fields, such as checking network connectivity and network reliability, to name a few.
This work suggests using the network flow problem by converting the cloud into a flow network and performing traffic-aware live VM migration to load balance the cloud servers. Most of the existing benchmark VM migration approaches (Khanna et al., 2006; Wood et al., 2007) consider only host-side resource constraints, with little consideration of network-side constraints. A good VM migration algorithm that considers both server-side and network-side constraints can greatly improve the performance of CDCs. This can be done in both live and cold migration scenarios.
In the network flow problem, a set of paths in a graph G is edge-disjoint if each edge in G appears in at most one path. In the cloud, some number k of edge-disjoint paths can be computed between an overloaded and an underloaded server, and each VM can then be migrated from the overloaded server to the underloaded server along one of these k paths, which allows fast and traffic-aware VM migrations.
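The edge-disjoint path computation can be sketched with a unit-capacity maximum flow, here using breadth-first augmenting paths (the Edmonds-Karp scheme). The function name, the host and switch names, and the choice to give every link capacity 1 are assumptions for illustration; a traffic-aware version would presumably weight links by their available bandwidth.

```python
from collections import deque


def edge_disjoint_paths(edges, source, sink):
    """Return a maximum set of edge-disjoint paths from source to sink.

    edges: iterable of undirected links (u, v), each given unit capacity.
    """
    if source == sink:
        raise ValueError("source and sink must differ")
    cap, adj = {}, {}
    for u, v in edges:
        cap[(u, v)] = cap.get((u, v), 0) + 1
        cap[(v, u)] = cap.get((v, u), 0) + 1
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    original = dict(cap)

    while True:
        # Breadth-first search for an augmenting path in the residual graph.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v in sorted(adj.get(u, ())):
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            break
        v = sink
        while parent[v] is not None:  # push one unit of flow along the path
            u = parent[v]
            cap[(u, v)] -= 1
            cap[(v, u)] += 1
            v = u

    # Net flow on each arc, then peel off one path per unit of flow.
    flow = {a: original[a] - cap[a] for a in cap if original[a] > cap[a]}
    paths = []
    while True:
        path, u = [source], source
        while u != sink:
            nxt = next((v for v in sorted(adj.get(u, ()))
                        if flow.get((u, v), 0) > 0), None)
            if nxt is None:
                break
            flow[(u, nxt)] -= 1
            path.append(nxt)
            u = nxt
        if u != sink:
            return paths
        paths.append(path)


# Hypothetical fabric: overloaded host h1, underloaded host h4.
links = [("h1", "sw1"), ("h1", "sw2"), ("sw1", "sw3"),
         ("sw2", "sw3"), ("sw1", "h4"), ("sw3", "h4")]
routes = edge_disjoint_paths(links, "h1", "h4")
# [["h1", "sw1", "h4"], ["h1", "sw2", "sw3", "h4"]]
```

With k = 2 routes available, two VMs could be migrated concurrently without sharing any link, which is the property the suggestion above relies on.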

Conclusion and Future Work
This study analyzed the application of graph theory concepts in computer networks and their suitability for addressing resource provisioning issues in the cloud. In the future, we would like to analyze the scope of graph theory in other research areas of cloud computing.