Mathematical Model of the Relationship between BGP Convergence Delay and Network Topologies

BGP is a path vector inter-Autonomous System (AS) routing protocol that succeeded EGP in order to eliminate EGP's shortcomings in flexibility and scalability and to act as a true routing protocol. BGP handles the scalability problem using Classless Inter-Domain Routing (CIDR). It overcomes the inefficiency of EGP by collecting all possible route information to a destination and running a decision process to select the route to use and to advertise to its peers. Recently, BGP has encountered several problems, such as routing table growth, load balancing problems, BGP hijacking, transit-AS problems and increasing convergence delay. Convergence delay is the time between the start of the best-path selection process and the moment the routers settle. Convergence delay has recently become an issue for the Internet and other large networks because it has been increasing, which causes instability in the network. Instability leads to lost packets, delayed delivery, loss of connectivity and long end-to-end delay in the Internet, as well as added overhead on BGP routers. The goal of this research is to study the behavior of different network topologies in terms of BGP convergence delay and to define a mathematical model of the relationship between convergence delay and the number of nodes. Simulation results show that the Mesh topology has the highest convergence delay. The study of the relationship between convergence delay and the number of nodes leads to mathematical equations, some of which represent a linear relationship while others represent a compound relationship.


Introduction
The Internet is a global network of interconnected computers and network devices that allows users to exchange and share information across the world.
The Internet is divided into large regions called Autonomous Systems (ASes). An AS can be a single network or a group of networks controlled by a single technical administration, and the networks in the same AS usually share similar or common routing policies. An AS has the following characteristics (Griffin and Premore, 2001):
• Each AS is assigned a globally unique number called the Autonomous System Number (ASN).
• ASes use an exterior gateway protocol to communicate and exchange information with each other, and an interior gateway protocol inside each AS.
In 1989, the Border Gateway Protocol (BGP) was introduced to replace the Exterior Gateway Protocol (EGP) as the interdomain routing protocol. Since then, BGP has gone through a series of enhancements and modifications, and several versions have been released. BGP-2 was defined in RFC 1163 in 1990, and one year later it was updated to BGP-3 by RFC 1267. BGP-4, the only version in use today, was defined in RFC 1771 in 1995. BGP-4+, defined in RFC 2283, is an enhanced version that addresses issues such as IPv6, prefix advertisement, restart capability, improving recovery times and reducing the effect of software and equipment failures on IP routing, and it is supported by most network equipment manufacturers. Academic and industry groups continue to enhance BGP in cooperation with the IETF RFC process, to overcome the challenges and problems that BGP faces or may face (Quoitin and Uhlig, 2005).
BGP is the routing protocol used by ASes in the Internet. When BGP runs between two different ASes it is called External BGP (E-BGP), and when it runs inside a single AS it is called Internal BGP (I-BGP), as shown in Fig. 1. As mentioned previously, BGP allows routers to communicate and exchange network reachability information. BGP is based on an asynchronous, distributed, preferred-path vector algorithm and is described as a path vector protocol. BGP uses TCP as its transport protocol and listens on TCP port 179. BGP messages are carried inside TCP segments, and this reliable transport layer protocol provides acknowledgement, sequencing, fragmentation and retransmission (Dugatkin, 2008). When two BGP speakers connect, they exchange messages to open the session and confirm its parameters. Once they agree, they exchange their entire BGP routing tables, and afterwards they send only incremental updates as the routing tables change. Because BGP does not require periodic refreshes of the entire table, each speaker keeps the routing information received from its peers for the whole duration of the connection (Rekhter and Li, 1995).

BGP Operation
Two routers that need to exchange routing information establish a BGP session and are then called BGP peers. During this session the peers exchange four types of messages (Griffin and Premore, 2001):

Open Message
It is the first message exchanged when opening a session with the target router. It includes the AS Number (ASN) and IP address of the sending router (Griffin and Premore, 2001).

Update Message
It carries Network Layer Reachability Information (NLRI), that is, the list of newly usable routes, together with the routes that are no longer available (withdrawn routes) (Griffin and Premore, 2001). An UPDATE message always contains the fixed-size BGP header and can optionally include other fields (Rekhter and Li, 1995).

Notification Message
Before the peers disconnect and drop the session, a final message is exchanged. It contains information about the conditions that led to terminating the TCP connection and about the mechanism used to close it (Griffin and Premore, 2001).

Keep Alive Message
This message is sent periodically to a neighbor to inform it that the connection is still available and the path has not changed (Spirent Communications, 2002). A KEEPALIVE message consists of only a message header and has a length of 19 octets (Rekhter and Li, 1995).
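To make the message format concrete, the following Python sketch builds and parses the fixed 19-octet BGP message header (a 16-octet marker of all ones, a 2-octet length covering the whole message, and a 1-octet type) as specified in RFC 1771 and its successor RFC 4271. A KEEPALIVE is this header alone with type 4. The function names are illustrative, and only the header is handled, not the bodies of OPEN, UPDATE or NOTIFICATION messages.

```python
import struct

BGP_MARKER = b"\xff" * 16  # 16-octet marker, all bits set
MSG_TYPES = {1: "OPEN", 2: "UPDATE", 3: "NOTIFICATION", 4: "KEEPALIVE"}
HEADER_LEN = 19            # marker (16) + length (2) + type (1)


def bgp_message(msg_type: int, body: bytes = b"") -> bytes:
    """Prefix `body` with a BGP header; the length field counts the whole message."""
    length = HEADER_LEN + len(body)
    return BGP_MARKER + struct.pack("!HB", length, msg_type) + body


def parse_header(data: bytes):
    """Return (total_length, type_name) from the first 19 octets of a message."""
    if len(data) < HEADER_LEN or data[:16] != BGP_MARKER:
        raise ValueError("not a BGP message")
    length, msg_type = struct.unpack("!HB", data[16:HEADER_LEN])
    return length, MSG_TYPES.get(msg_type, "UNKNOWN")


keepalive = bgp_message(4)          # header only, no body
print(len(keepalive))               # 19
print(parse_header(keepalive))      # (19, 'KEEPALIVE')
```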

BGP Route Advertisement
When the TCP connection is established, BGP speakers exchange their full routing information; each router may receive multiple advertisements from different sources for the same destination. Routers filter these advertisements, select the single best path for each destination and advertise it to their neighbors. If a destination becomes unreachable through the selected path, routers advertise the change to all BGP neighbors through an UPDATE message (Dugatkin, 2008).
Routes are stored in the Routing Information Bases (RIBs), which consist of three parts: the Adj-RIBs-In, the Loc-RIB and the Adj-RIBs-Out. Routes received from other BGP speakers are stored in the Adj-RIBs-In; routes used by the local BGP speaker must be present in the Loc-RIB, and the next hop for each of these routes must be present in the local BGP speaker's forwarding information base; routes to be advertised to other BGP speakers must be present in the Adj-RIBs-Out (Rekhter and Li, 1995).
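The flow between the three RIBs can be sketched as follows. This is a hypothetical, minimal Python model, not a real BGP implementation: the class and method names are assumptions, the decision process is reduced to "shortest loop-free AS path", and the forwarding information base and next hops are not modelled.

```python
class BgpSpeaker:
    """Hypothetical sketch of the Adj-RIBs-In -> Loc-RIB -> Adj-RIBs-Out flow."""

    def __init__(self, asn, peers):
        self.asn = asn
        self.adj_rib_in = {p: {} for p in peers}   # prefix -> AS path, per peer
        self.loc_rib = {}                          # prefix -> selected AS path
        self.adj_rib_out = {p: {} for p in peers}  # prefix -> path to advertise, per peer

    def receive(self, peer, prefix, as_path):
        """Store an advertisement (as_path is None for a withdrawal), then re-select."""
        if as_path is None:
            self.adj_rib_in[peer].pop(prefix, None)
        else:
            self.adj_rib_in[peer][prefix] = tuple(as_path)
        self._decide(prefix)

    def _decide(self, prefix):
        # Ignore paths that already contain our own ASN (loop prevention).
        candidates = [rib[prefix] for rib in self.adj_rib_in.values()
                      if prefix in rib and self.asn not in rib[prefix]]
        if candidates:
            best = min(candidates, key=len)  # simplified decision: shortest AS path
            self.loc_rib[prefix] = best
            for out in self.adj_rib_out.values():
                out[prefix] = (self.asn,) + best  # prepend own ASN before advertising
        else:
            self.loc_rib.pop(prefix, None)
            for out in self.adj_rib_out.values():
                out.pop(prefix, None)


s = BgpSpeaker(65001, ["A", "B"])
s.receive("A", "10.0.0.0/8", [65002, 65003])
s.receive("B", "10.0.0.0/8", [65004])
print(s.loc_rib["10.0.0.0/8"])           # (65004,)
print(s.adj_rib_out["A"]["10.0.0.0/8"])  # (65001, 65004)
```

Withdrawing the route from peer B makes the speaker fall back to the longer path learned from peer A, which is the kind of re-selection that happens during convergence.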

BGP Route Path Selection
A path is the recorded sequence of Autonomous System numbers through which the reachability information has passed. BGP uses a Path Vector (PV) algorithm to select the best path to a destination. PV provides information about the properties of the path to a destination, but it does not define criteria for selecting a path; it only standardizes the result of route selection among the routers. To avoid loops, BGP routers ignore any routing advertisement that contains their own ASN (Dugatkin, 2008).
By default, BGP selects the route with the shortest AS path, that is, the route that has traversed the lowest number of ASes. However, the shortest path is not always the best or the fastest path to a destination. A single AS hop may span many router hops, and because BGP does not know the underlying network topology, it will still select that AS because its AS path appears to be the shortest. In addition, BGP is unaware of network performance, such as metrics, congestion, packet loss, delay and jitter, so it may select a path that suffers from one of these problems. To avoid these two weaknesses, policies can be configured on the routes to modify the default behavior, so that route selection is based on the best-performing path rather than the shortest one (Griffin and Premore, 2001).
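The selection rules described above (drop looped advertisements, let policy override the default, otherwise prefer the shortest AS path) can be sketched in Python. The policy knob is assumed to be LOCAL_PREF with an assumed default of 100; the full BGP-4 decision process in RFC 4271 has further tie-breakers (ORIGIN, MED, eBGP over iBGP, interior cost, BGP Identifier) that are omitted here. `Route` and `select_best` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Route:
    peer: str
    as_path: Tuple[int, ...]
    local_pref: int = 100  # policy preference; 100 is an assumed default


def select_best(routes: Sequence[Route], my_asn: int) -> Optional[Route]:
    # 1. Loop prevention: ignore advertisements whose AS path contains our ASN.
    loop_free = [r for r in routes if my_asn not in r.as_path]
    if not loop_free:
        return None
    # 2. Policy first (higher local_pref wins), then the default rule:
    #    fewest ASes traversed. min() keeps the first route on a full tie.
    return min(loop_free, key=lambda r: (-r.local_pref, len(r.as_path)))


routes = [Route("A", (2, 3)), Route("B", (4,)), Route("C", (1, 5))]
print(select_best(routes, my_asn=1).peer)  # B: shortest loop-free path
routes.append(Route("D", (6, 7, 8), local_pref=200))
print(select_best(routes, my_asn=1).peer)  # D: policy overrides path length
```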

BGP Convergence
BGP routers keep updating their routing tables whenever they receive updates about changes in the network. The time between the start of the best-path selection process and the moment the routers settle is called convergence. Convergence therefore takes place whenever the routing table is unsettled for any reason. There are two cases in which convergence occurs in BGP: first, when a router builds its routing table after initialization; second, when a router updates its routing table after changes occur in the network and have been advertised (Dugatkin, 2008).
Changes in the network topology can occur for many reasons, such as a physical link failure, a router reboot, or the addition or deletion of a network prefix. All of these can cause instability in the network: the affected AS withdraws its previous announcement, searches for and selects the best path (it may receive updates from several peers, so it must examine them and compute its best path), and then re-announces it. This whole process is called the convergence process in BGP (Griffin and Premore, 2001).
Several studies have examined BGP convergence behavior and categorize BGP routing events into four basic types (Beichuan et al., 2004):
• Tdown: a previously reachable destination is withdrawn.
• Tup: a previously unreachable destination is announced.
• Tlong: an existing path is replaced by a longer one.
• Tshort: an existing path is replaced by a shorter one.
It was observed that Tup and Tshort events typically converge in a relatively short time, whereas Tdown and Tlong events can trigger path exploration and take several minutes or more to converge (Beichuan et al., 2004).
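The four event types can be expressed as a small classification function. This Python sketch is a hypothetical helper that assumes "longer" and "shorter" are measured by AS-path length and represents an unreachable destination as `None`; a change between two paths of equal length is not one of the four basic types and is returned as `None`.

```python
from typing import Optional, Tuple

Path = Optional[Tuple[int, ...]]  # AS path, or None when the destination is unreachable


def classify(old_path: Path, new_path: Path) -> Optional[str]:
    """Label a routing change with one of the four basic event types."""
    if old_path is not None and new_path is None:
        return "Tdown"   # reachable destination withdrawn
    if old_path is None and new_path is not None:
        return "Tup"     # unreachable destination announced
    if old_path is None and new_path is None:
        return None      # nothing changed
    if len(new_path) > len(old_path):
        return "Tlong"   # replaced by a longer path
    if len(new_path) < len(old_path):
        return "Tshort"  # replaced by a shorter path
    return None          # equal length: not one of the four basic types


print(classify((1, 2), None))    # Tdown
print(classify((1,), (2, 3)))    # Tlong
```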
Today, BGP convergence has become a problem for the Internet, and it may become a larger problem as the Internet continues to grow. Ideally, BGP would quickly adapt to changes and converge on a new set of stable routes. However, it has been observed that in many cases BGP routers explore a large number of possible routes before converging on a new stable route. Labovitz et al. (2001) found that inter-domain path fail-over in the Internet now takes 3 min on average, and that a non-trivial percentage of fail-overs trigger routing table oscillations lasting up to 15 min.
During the convergence period, routes can keep changing, which can result in lost packets, delayed delivery, loss of connectivity and long end-to-end delay in the Internet, as well as added overhead on BGP routers (Pei et al., 2002).
Network failures, especially in the Internet, occur frequently, either as planned maintenance or as unexpected events. BGP can recover from a failure by converging to a new set of valid paths to the destination. However, the routing adjustment can take a long time because of several factors that cause convergence delay, such as update messages and the exploration of alternative paths. Consequently, the time during which a destination is unreachable because of convergence delay can be longer than the connectivity loss caused by the failure itself.
Many factors affect BGP convergence and cause convergence delay. The main factor is the Minimum Route Advertisement Interval (MRAI), the minimum time a router must wait between successive route advertisements to a neighbor: after a node sends an advertisement, it must wait 30 sec (the default MRAI value) before sending a new advertisement to the same neighbor (Amit et al., 2006).
The MRAI timer starts once the router sends an update for a destination, and the router can send another update for the same destination only after the MRAI timer expires. A study of the effect of MRAI on convergence time after a fault reported a linear increase at large scale (Amit et al., 2006).
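The timer behavior described above can be sketched as a per-(peer, prefix) rate limiter. This is a simplified, hypothetical Python model: the class and method names are assumptions, the 30 s default is the value cited in the text, and the jitter that RFC 4271 applies to the timer is not modelled. Updates that arrive while the timer is running are held, and only the newest one is sent when the timer expires, which is how MRAI suppresses transient updates at the cost of extra delay.

```python
class MraiTimer:
    """Hypothetical MRAI rate limiter keyed by (peer, prefix); simplified sketch."""

    def __init__(self, mrai: float = 30.0):
        self.mrai = mrai        # seconds between updates for the same key
        self.next_allowed = {}  # key -> earliest time the next update may be sent
        self.pending = {}       # key -> newest update held back by the timer

    def submit(self, key, update, now: float):
        """Return the update if it may be sent now; otherwise hold it (newest wins)."""
        if now >= self.next_allowed.get(key, float("-inf")):
            self.next_allowed[key] = now + self.mrai
            return update
        self.pending[key] = update  # any older held update is replaced
        return None

    def on_expiry(self, key, now: float):
        """Call when the timer for `key` fires; release the held update, if any."""
        if key in self.pending and now >= self.next_allowed[key]:
            self.next_allowed[key] = now + self.mrai
            return self.pending.pop(key)
        return None


t = MraiTimer()
key = ("peer-B", "10.0.0.0/8")
print(t.submit(key, "path (2, 3)", now=0))         # sent immediately
print(t.submit(key, "path (2, 4, 3)", now=5))      # None: held
print(t.submit(key, "path (2, 5, 4, 3)", now=12))  # None: replaces the held update
print(t.on_expiry(key, now=30))                    # path (2, 5, 4, 3)
```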
Another factor that could contribute to increase BGP convergence delay is BGP path exploration behavior. During path exploration, the network may explore a large number of routes before arriving at a stable state. To limit path exploration, identifying the invalid and transient update messages and avoiding their delivery is the key (Wang et al., 2008).
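Path exploration can be reproduced with a small simulation. The Python sketch below is a hypothetical and heavily simplified model, not the QualNet model used in this study: updates are exchanged in synchronous rounds, every node is its own AS, each node prefers the shortest loop-free AS path (lowest path as tie-break), and there is no MRAI or processing delay. It announces one prefix from node 0, lets the network settle, then withdraws it (a Tdown event) and counts the rounds and update messages (announcements and withdrawals) until no node changes its best path.

```python
def clique(n):
    """Full mesh: every node is adjacent to every other node."""
    return {v: [u for u in range(n) if u != v] for v in range(n)}


def line(n):
    """Chain 0-1-...-(n-1): a degenerate tree with no alternative paths."""
    return {v: [u for u in (v - 1, v + 1) if 0 <= u < n] for v in range(n)}


def best_path(v, rib_in):
    """Shortest loop-free AS path learned from any neighbour (lowest path breaks ties)."""
    candidates = [p for p in rib_in[v].values() if p is not None and v not in p]
    return min(candidates, key=lambda p: (len(p), p)) if candidates else None


def converge(graph, rib_in, best, originated, dirty):
    """Exchange updates in synchronous rounds until no best path changes.
    Returns (rounds, update_messages)."""
    rounds = messages = 0
    while dirty:
        rounds += 1
        outbox = []
        for v in sorted(dirty):  # nodes whose best path changed last round
            advert = None if best[v] is None else (v,) + best[v]  # prepend own AS
            outbox.extend((v, u, advert) for u in graph[v])
        messages += len(outbox)
        for v, u, advert in outbox:
            rib_in[u][v] = advert  # replaces whatever v announced before
        dirty = set()
        for u in {u for _, u, _ in outbox}:
            new = () if u in originated else best_path(u, rib_in)
            if new != best[u]:
                best[u] = new
                dirty.add(u)
    return rounds, messages


def tdown(graph, origin=0):
    """Announce a prefix from `origin`, let the network settle, then withdraw it."""
    rib_in = {v: {} for v in graph}
    best = {v: None for v in graph}
    originated = {origin}
    best[origin] = ()  # empty path: the prefix is local to the origin
    converge(graph, rib_in, best, originated, {origin})
    originated.clear()
    best[origin] = None  # Tdown: the destination disappears
    return converge(graph, rib_in, best, originated, {origin})


print(tdown(clique(3)))  # (3, 10)
print(tdown(line(3)))    # (3, 4)
```

Both 3-node networks settle in 3 rounds, but in the full mesh nodes 1 and 2 first switch to stale paths through each other before discovering that no valid route remains, so the mesh needs 10 update messages against 4 for the chain.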
Other factors can also cause BGP convergence delay: the size of the network, the size of the failure, the average node degree, the degree distribution, processing overhead, routing policies and network topology (Amit et al., 2006).

Network Topologies
Network topology refers to the layout, shape or structure of connected devices and cable installation. This shape does not necessarily correspond to the actual physical layout of the devices on the network (Mitchell, 2009).
The choice of topology depends on (Brown, 1996): the type and number of equipment used, the planned applications and data transfer rates, the required response times and cost.

BGP Convergence Delay Related Work
Many studies have recently addressed BGP convergence delay, approaching it from different angles in order to understand, improve or resolve the BGP convergence problem. Labovitz et al. (2001) examine the latency of Internet path failure, fail-over and repair due to the convergence properties of inter-domain routing. Pei et al. (2002) show that BGP can take hundreds of seconds to converge after a failure, and that the delay increases for large-scale failures. They examined MRAI and processing overhead at the router during convergence and found that MRAI significantly affects how the convergence delay varies as a function of the size of the failure. They proposed new schemes to reduce processing overhead at BGP routers and to tune the MRAI during large failures, which reduce the convergence delay.
A paper on improving BGP convergence by reducing route convergence time and minimizing the number of route changes is presented in (Pei et al., 2002); it introduces a new mechanism for improving the convergence properties of path vector routing algorithms. Ricardo et al. (2009) presented measurement results that identify BGP slow-convergence events across the entire global routing table. Their data show that the severity of path exploration and slow convergence varies depending on where prefixes originate, and based on this they developed a path preference inference method based on path usage time. Wang et al. (2008) analyze the upper bound of BGP convergence delay for four basic network topologies: Clique, Binary Tree, Ring and Focused topologies.
Another paper that sheds light on the relationship between BGP convergence and network topology is (Craig and Ahba, 2001), which examines the role of inter-domain topology and routing policy in delayed Internet routing convergence.
Deshpande and Sikdar (2004) characterize the impact of topology and of BGP's message handling procedure on its convergence time.
Several papers consider the relationship between network topology and BGP convergence delay, yet their number is small compared with papers on other aspects of BGP convergence delay. Current studies focus on BGP convergence behavior in order to reduce convergence time and to better understand the protocol. However, they do not take "real" topologies into account, and even when they do, more than one factor is studied at the same time, which makes it hard to tell which factor has the greater impact. Hassan et al. (2016) propose a Border Gateway Protocol based Path-Vector (BGP-PV) mechanism that chooses the path from the vector provided by the Path-Vector and then splits and routes the traffic accordingly. Their simulation results show that the proposed mechanism can significantly reduce the average end-to-end delay by improving network throughput. Godfrey et al. (2015) study how route selection schemes can avoid route changes. Modifying route selection implies a tradeoff between stability, deviation from operators' preferred routes and availability of routes. The paper develops algorithms to lower-bound the feasible points in these tradeoff spaces. It also proposes a new approach, Stable Route Selection (SRS), which uses flexibility in route selection to improve stability without sacrificing availability and with a controlled amount of deviation. Mai and Du (2013) study the behavior of a large-scale VPN. A simple analysis shows that slow route convergence and slow route table transfer are the main causes of the disadvantages of VPN technology: slow route convergence increases the time needed to exchange update routes between routers, while slow route table transfer leads to lower utilization of the provider backbone.
Alzate and Reyes (2012) evaluate the behavior of some of the proposals made to reduce convergence time, such as ghost-flushing and EPIC, in medium to large Waxman topology networks of 50 to 400 nodes. Wang (2011) analyzes the network convergence problem, describes a method for accelerating BGP network convergence and applies it to an example. Experimental results show that the improved Minimum Route Advertisement Interval (MRAI) not only guarantees network stability and robustness but also accelerates routing convergence.

System Specification and Design
The convergence delay is measured for 4 topologies (Tree, Ring, Grid and Mesh), using the Cisco 7600-MSFC2-CLASSIC router.
The Cisco 7600-MSFC2-CLASSIC router (Cisco Data Sheet, 2017) is a high-performance router deployed at the network edge, where performance, IP services, redundancy and fault resiliency are critical. It supports a range of IP video and triple-play (voice, video and data) applications in both the residential and business services markets, and it meets requirements for redundancy, high availability and rack density.
BGP links are configured between the routers, and each router is placed in a different AS, so every router acts as an edge router. BGP parameters are left at their default values in all topologies, so that the only factor affecting convergence delay is the topology itself rather than the BGP parameters.
A Constant Bit Rate (CBR) application is used as the traffic generator; it is a UDP-based client-server application that sends data from a client to a server at a constant bit rate. The source forwards 10000 packets to the destination, each 512 bytes in size.
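For scale, the payload volume of one CBR run follows directly from the packet count and packet size above (UDP, IP and link-layer headers are ignored):

```latex
\[
10\,000 \times 512\ \text{bytes} = 5\,120\,000\ \text{bytes}
= 40\,960\,000\ \text{bits} \approx 41\ \text{Mbit}
\]
```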
In the presented scenarios, four topologies are considered because they are the most common in the Internet: Mesh, Ring, Tree and Grid.

Tree Topology
In a tree network, the nodes are connected to each other so that they form a tree-like structure. The selected tree has 7 nodes because, to form a balanced binary tree, the number of nodes must satisfy the following formula:
$n = 2^{h+1} - 1$
where h is the height of the tree and n is the maximum number of nodes. A tree of 3 levels (h = 2) therefore has 7 nodes in total. Fewer than 3 levels is too small to measure the convergence delay, while more than 3 levels makes the scenario complex; to keep the scenario simple and obtain better results, 3 levels of hierarchy are selected.

Mesh Topology
In a mesh network, each node is directly connected to all other nodes in the network. Seven nodes are configured in this topology so that the number of nodes is the same as in the Tree topology.

Ring Topology
Each node connects to exactly two other nodes, forming a single continuous pathway for signals through each node. Seven nodes are configured in this topology so that the number of nodes is the same as in the Tree and Mesh topologies.

Grid Topology
In a grid topology, each node is connected to up to two neighbors along each of one or more dimensions. The selected Grid has 9 nodes because, to form a Grid, the number of nodes must satisfy the following formula:
$n = h^2$
where h is the height of the Grid and n is the number of nodes.
A Grid of 3 levels (h = 3) therefore has 9 nodes in total. Fewer than 3 levels is too small to measure the convergence delay, while more than 3 levels makes the scenario complex; to keep the scenario simple and obtain better results, 3 levels of hierarchy are selected.
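The link counts of the four simulated topologies can be checked with a short Python sketch that builds each one as a set of undirected edges. The helper names and node numbering are illustrative assumptions; the Grid is assumed to be a square h x h lattice and the Tree a full binary tree, matching the node counts given above (7 Tree nodes with h = 2, 9 Grid nodes with h = 3).

```python
def mesh(n):
    """Every pair of nodes is directly linked: n(n-1)/2 links."""
    return {(i, j) for i in range(n) for j in range(i + 1, n)}


def ring(n):
    """Node i is linked to node i+1, and the last node back to node 0 (n >= 3)."""
    return {tuple(sorted((i, (i + 1) % n))) for i in range(n)}


def binary_tree(h):
    """Full binary tree of height h: 2**(h+1) - 1 nodes; node i has children 2i+1, 2i+2."""
    n = 2 ** (h + 1) - 1
    return {(i, c) for i in range(n) for c in (2 * i + 1, 2 * i + 2) if c < n}


def grid(h):
    """Square h x h grid: each node is linked to its right and lower neighbours."""
    edges = set()
    for r in range(h):
        for c in range(h):
            v = r * h + c
            if c + 1 < h:
                edges.add((v, v + 1))  # right neighbour
            if r + 1 < h:
                edges.add((v, v + h))  # neighbour in the next row
    return edges


links = {
    "Mesh": len(mesh(7)),
    "Grid": len(grid(3)),
    "Tree": len(binary_tree(2)),
    "Ring": len(ring(7)),
}
print(links)                          # {'Mesh': 21, 'Grid': 12, 'Tree': 6, 'Ring': 7}
print(len(mesh(8)) - len(mesh(7)))    # 7: the 8th mesh node adds 7 links
```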
Simulation Results
The simulation results are presented in statistical graphs that display hundreds of metrics. QualNet reports results for both client and server nodes: the client is the node that sends the traffic and the server is the node that receives it. Figure 2 shows that, without failure, the Mesh topology has the highest convergence delay, followed by the Grid, Tree and Ring topologies. This is due to the number of links: the more links, the higher the convergence delay, because the number of updates and the amount of path exploration increase. Figure 3 shows that the convergence delay in all five cases (the four topologies plus an extra Grid case) is lower than without failure. When a fault is introduced at a certain point in the network, the number of links decreases, and so does the number of routing updates each node receives, which means less time is spent on routing decisions. Figure 4 shows the relationship between convergence delay and number of nodes for the Mesh topology with a fault in one node: the more nodes, the higher the convergence delay. Fig. 5 represents this as a near-linear relationship. In a mesh topology each node is directly connected to all other nodes, so each new node adds one more link than the previously added node did: the n-th node adds n - 1 links (Table 1).
Then the following equation can be used to calculate the convergence delay Y for any number of nodes X in the Mesh topology:

Ring Topology
Figures 6 and 7 illustrate the convergence delay versus number of nodes for the Ring topology with a fault in one node. The convergence delay fluctuates when the number of nodes is between 4 and 13, and then becomes steady; it remains steady even as new nodes are added to the topology. In the normal case there are two paths to every destination node, but after a failure only one path, in a single direction, remains, so the number of hops changes. This affects the time needed to update the routing tables, and thus the convergence delay changes.

Tree Topology
Figures 8 and 9 represent the convergence delay for the Tree topology with a fault in one node. Figure 9 shows that the convergence delay rises sharply when the number of nodes is between 3 and 7, reaches its highest value at 7 nodes, and then remains steady however many nodes are added.
The following equation summarizes the above:
The relationship between convergence delay and number of nodes in the range 3 to 7 is linear and, after calculation, can be represented as follows:
In a tree topology there is exactly one route between any two nodes, so a node does not need to explore alternative routes. After a node fails, some nodes can no longer receive any packets from their neighbors, which means they are completely disconnected from the network. The convergence delay of the remaining nodes decreases slightly: because the tree structure contains no loops, only one path is maintained to each destination, and a fault at one AS does not affect the convergence delay of the other nodes, since the nodes cut off by the failed node are completely disconnected. In addition, the number of nodes decreases after the failure, which means fewer paths and fewer routing table updates.

Grid Topology
Figure 10 represents the convergence delay for the Grid topology with a fault in one node. Figure 11 shows that the relationship between convergence delay and number of nodes is close to linear and can be represented by the following equation:
$\mathrm{C.D.} = 0.162\,N + 0.448$
where C.D. is the convergence delay and N is the number of nodes. Producing a failure at a node in the Grid can form a ring topology, so the results are similar to those of the Ring topology.
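The linear and compound relationships mentioned in this study are the kind of curves obtained by least-squares fitting of convergence delay against the number of nodes. The Python sketch below shows one way to fit both forms; the compound form y = b0 * b1**x (the form that curve-estimation tools such as SPSS call "compound") is fitted as a straight line on ln(y), which assumes all delays are positive. The function names are hypothetical, and the data in the example are synthetic, not the measurements reported here.

```python
import math


def fit_linear(xs, ys):
    """Ordinary least squares for y = a*x + b. Returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def fit_compound(xs, ys):
    """Compound model y = b0 * b1**x, fitted as a line on ln(y). Returns (b0, b1)."""
    slope, intercept = fit_linear(xs, [math.log(y) for y in ys])
    return math.exp(intercept), math.exp(slope)


# Synthetic example only (not measured data): points lying exactly on y = 2x + 1.
nodes = [3, 5, 7, 9]
print(fit_linear(nodes, [2 * x + 1 for x in nodes]))  # (2.0, 1.0)
```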

Conclusion
In this research, the relationship between convergence delay and four types of network topologies, considered to be the most common and widely used, is studied. The QualNet simulator is used to build the scenarios for the topologies. The results show that each topology has its own convergence delay behavior because the topologies are structured differently. Another factor that makes the results differ is the number of links connecting the nodes.
The Mesh topology has the highest convergence delay, followed by the Grid, Tree and finally Ring topologies. There is a direct relationship between the number of links and the convergence delay: the more links, the higher the convergence delay. Consequently, Mesh has the highest convergence delay, as its topology requires each node to be connected to every other node by a direct link.
In addition, the effect of the number of nodes and the number of links on convergence delay is investigated for all of these topologies. These relationships are expressed as mathematical equations that allow the convergence delay of each topology to be calculated for any number of nodes.
Future work has three aims: first, applying this study to other types of topologies to cover the convergence delay behavior of all topology types; second, extending the study to include the effect of the MRAI timer on the topologies and their convergence delay; third, using the results to design a hybrid network topology with minimum convergence delay.