Network Traffic Model: A Case of BIUST Network

Network traffic modelling and analysis provide the average load, the bandwidth requirements and the use of the available bandwidth by different applications on a particular network, in addition to several other details of the network. This paper presents a mathematical model for real-world problems, using the network traffic of the Botswana International University of Science and Technology (BIUST) as a case study. The BIUST network is modelled through statistical analysis of traffic data, which involves the collection and interpretation of data through mathematical processes called stochastic processes. Based on the resulting Pareto-distribution model of packet traffic for the BIUST network, it was observed that about 20% of the users accounted for about 80% of the bandwidth consumed.


Introduction
Network traffic modelling is used as the basis for network applications and for the capacity planning of network systems. Given the impact of poor choices in this arena, the validity of the underlying models is of critical importance (Wilson, 2016). A wide range of mathematical models can be used to model network traffic, depending on the type of network being modelled. The factors used to evaluate a network are taken directly from the underlying traffic model.
Statistics is concerned with making inferences about the way the world is, based upon things we observe happening. Nature is complex, so the things we see hardly ever conform exactly to simple or elegant mathematical idealizations; the world is full of unpredictability, uncertainty and randomness. Probability is the language of uncertainty, so to understand statistics we must understand uncertainty, since probability and statistics work hand in hand (Tamhane and Dunlop, 2000). In this paper, the BIUST network is modelled through statistical analysis of traffic data, which involves the collection and interpretation of data through mathematical processes called stochastic processes.
A stochastic process is simply a probability process, that is, any process in nature whose evolution can be analysed successfully in terms of probability. A full discussion of the nature of probability would, on the empirical side, take us too far afield (and might sidetrack us into philosophy) and, on the mathematical side, require too much high-powered mathematics (Tamhane and Dunlop, 2000).
According to Tamhane and Dunlop (2000), a random variable associates a unique numerical value with each outcome in the sample space; that is, it is a real-valued function defined over a sample space. A random variable is denoted by a capital letter (e.g., X or Y), and a particular value taken by it is denoted by the corresponding lowercase letter (x or y). A random variable can be either discrete or continuous.
The remainder of this paper is organized as follows. Section 2 reviews related work. Section 3 briefly explains mathematical modelling, including the modelling process and the classification of mathematical models. Section 4 presents the data sets used to model the BIUST network, together with a packet distribution model for the network. Finally, Section 5 concludes the paper.

Related Work
Traffic models reflect our best knowledge of traffic behaviour. Recent studies of real telecommunications network traffic data have revealed that teletraffic exhibits self-similar (or fractal) properties over a wide range of time scales (Boxma and Cohen, 2000; Radev and Lokshina, 2009). The properties of self-similar telecommunications traffic are very different from those of traditional models based on Poisson, Markov-modulated Poisson and related processes (Giambene, 2005). Using traditional models in networks characterized by self-similar processes can lead to biased conclusions about the performance of the analysed networks (Radev and Lokshina, 2009; Jeong, 2002). Traditional models can over-estimate network performance (Faraj, 2000), leading to insufficient allocation of communication and data processing resources and hence to problems in ensuring QoS. Understanding the self-similar nature of teletraffic is therefore a vital issue.
Self-similar teletraffic is observed in both LAN and WAN settings. Aggregate network traffic that shows the self-similar (or long-range dependent) behaviour typical of measured Ethernet LAN traffic over a wide range of time scales can be modelled as a superposition of strictly independent alternating ON/OFF sources whose ON or OFF periods have heavy-tailed distributions with infinite variance (Kushner, 2001).
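To make the ON/OFF mechanism concrete, the following Python sketch superposes independent sources whose ON and OFF periods are Pareto-distributed and counts how many sources are ON in each time slot. It is an illustration under assumed parameters, not the model of the cited work: the shape α = 1.4 (finite mean, infinite variance), the minimum period of one slot, the 50 sources and the function names are all hypothetical choices.

```python
import random


def pareto_period(alpha: float, xmin: float, rng: random.Random) -> float:
    """Draw one heavy-tailed period length by inverse-transform sampling.

    For 1 < alpha < 2 the Pareto distribution has a finite mean but an
    infinite variance, the regime associated with self-similar aggregates.
    """
    u = 1.0 - rng.random()  # u lies in (0, 1], so the division is safe
    return xmin / u ** (1.0 / alpha)


def on_off_aggregate(n_sources: int, n_slots: int, alpha: float = 1.4,
                     xmin: float = 1.0, seed: int = 0) -> list[int]:
    """Count how many independent ON/OFF sources are ON in each time slot."""
    rng = random.Random(seed)
    load = [0] * n_slots
    for _ in range(n_sources):
        active = [0] * n_slots          # 1 where this source is ON
        t = 0.0
        on = rng.random() < 0.5         # random initial state
        while t < n_slots:
            length = pareto_period(alpha, xmin, rng)
            if on:
                # mark slots from floor(t) up to, but excluding, floor(t + length)
                for slot in range(int(t), min(int(t + length), n_slots)):
                    active[slot] = 1
            t += length
            on = not on                 # alternate between ON and OFF
        load = [a + b for a, b in zip(load, active)]
    return load


traffic = on_off_aggregate(n_sources=50, n_slots=1000)
print(max(traffic), sum(traffic) / len(traffic))
```

Plotting `traffic` at several aggregation levels (for example, summing over blocks of 10 or 100 slots) is one informal way to see that the burstiness does not smooth out as quickly as it would for Poisson arrivals.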
In ATM networks, self-similar traffic arriving at an ATM buffer results in a heavy-tailed buffer occupancy distribution, and the cell loss probability decreases with buffer size not exponentially, as in traditional Markovian models, but hyperbolically. Another manifestation of traffic self-similarity is in Internet traffic, where many characteristics of the WWW can be modelled using heavy-tailed distributions, including the distribution of traffic times, the distribution of user requests for documents and the distribution of WWW document sizes (Jeong, 2002).
In TCP/IP networks, the transfer of files or messages shows that the reliable transmission and flow control mechanisms of TCP serve to maintain the long-range dependent structure induced by heavy-tailed file size distributions (Bobbio et al., 2013). The relationship between self-similar traffic and network performance can be captured by performance measures such as packet loss rate, retransmission rate and queueing delay: increased self-similarity degrades performance, and queueing delay in particular increases dramatically as self-similarity increases.
The self-similarity observed in video traffic makes it possible to model Variable Bit Rate (VBR) video traffic using heavy-tailed distributions (Radev, 2005). The autocorrelation of a VBR video sequence decays hyperbolically and can be modelled using self-similar processes such as Fractional Autoregressive Integrated Moving-Average (F-ARIMA) and Fractional Gaussian Noise (FGN) (Radev and Lokshina, 2009; Ravid and Lokshina, 2007).
The impact of self-similar models on queueing performance is important, and the main trends in such findings concern (a) permission traffic modelling for high-speed networks, (b) efficient simulation of actual network traffic and (c) analysis of queueing models and protocols under realistic traffic scenarios (Mehdi, 2003). Traditional teletraffic models that assume independent arrivals, based on Poisson processes, Markov-modulated Poisson processes and related processes, cannot capture the self-similar nature of teletraffic (Hayes and Ganesh, 2004).
In another work, Radev and Lokshina (2009) note that the time series of self-similar processes exhibit burstiness over a wide range of time scales, so self-similarity can statistically describe wireless IP network traffic that is bursty on many time scales. Self-similar telecommunications traffic can be modelled and simulated with generators of synthetic self-similar sequences, which fall into two practical classes: sequential generators and fixed-length sequence generators. That study considers fixed-length sequence generators for simulating self-similar wireless IP network traffic (Radev and Lokshina, 2009).

Mathematical Modelling
We propose to model BIUST network traffic mathematically using an adopted model, the Pareto distribution, which is used for modelling real-world problems. The distribution originates from the observation that a high proportion of a population has a low income while only a few people have very high incomes (Erlina, 2011).

Overview of Mathematical Modelling
Mathematical modelling is generally understood as the process of applying mathematics to a real-world problem with a view to understanding that problem (Osterbo, 2003; Frost and Melamed, 1994). One could argue that mathematical modelling is the same as applied mathematics, where we also begin with a real-world problem and apply the necessary mathematics; however, once the result has been found, we no longer consider the original problem except perhaps to check whether our answer makes sense. This is not the case with mathematical modelling, where mathematics is used primarily to understand the real-world problem. The modelling process may or may not solve the problem entirely, but it will shed light on the situation being studied (Osterbo, 2003). Figure 1 shows the key phases in the modelling process (Osterbo, 2003). The adopted model used to model the network in this paper is based on the BIUST network.

Categories of Mathematical Models

Continuous-Time Source Models
Continuous-time source models are mostly concerned with a stochastic process that represents the time-varying source rate $X(t)$ or the set of packet arrival times $\{t_1, t_2, t_3, t_4, \ldots\}$. This class includes the uniform, gamma, exponential, beta and Pareto distributions (Wilson, 2016; Tamhane and Dunlop, 2000; Frost and Melamed, 1994; Yang and Petropalu, 2001; Chandrasekaran, 1994).

Uniform Distribution Model
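A uniform distribution arises in situations where all values are "equally likely" over an interval. Specifically, the Probability Density Function (PDF) of a uniform distribution is constant over an interval. A random variable X that has a uniform distribution over the interval [a, b] is denoted by X ∼ U[a, b]. Its PDF is given by

$$ f(x) = \begin{cases} \dfrac{1}{b-a}, & a \le x \le b \\ 0, & \text{otherwise} \end{cases} \tag{1} $$

The Cumulative Distribution Function (CDF) of a uniform random variable is given by

$$ F(x) = \begin{cases} 0, & x < a \\ \dfrac{x-a}{b-a}, & a \le x \le b \\ 1, & x > b \end{cases} \tag{2} $$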
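As a quick numerical check of the uniform formulas, the PDF and CDF can be transcribed directly into Python. This is a minimal sketch; the function names and the example interval [2, 6] are illustrative only.

```python
def uniform_pdf(x: float, a: float, b: float) -> float:
    """PDF of X ~ U[a, b]: constant 1/(b - a) inside the interval, 0 outside."""
    if a >= b:
        raise ValueError("require a < b")
    return 1.0 / (b - a) if a <= x <= b else 0.0


def uniform_cdf(x: float, a: float, b: float) -> float:
    """CDF of X ~ U[a, b]: rises linearly from 0 at a to 1 at b."""
    if a >= b:
        raise ValueError("require a < b")
    if x < a:
        return 0.0
    if x > b:
        return 1.0
    return (x - a) / (b - a)


print(uniform_pdf(3.0, 2.0, 6.0))  # 0.25
print(uniform_cdf(3.0, 2.0, 6.0))  # 0.25
```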

Gamma Distribution Model
A random variable X is said to have a gamma distribution with parameters γ > 0 and r > 0 if its PDF is given by

$$ f(x) = \frac{\gamma^{r} x^{r-1} e^{-\gamma x}}{\Gamma(r)}, \quad x > 0 \tag{3} $$

where the gamma function, Γ(r), is defined by

$$ \Gamma(r) = \int_{0}^{\infty} x^{r-1} e^{-x}\,dx \tag{4} $$

For positive integer values of r, it can be shown that Γ(r) = (r − 1)!. This special case of the gamma distribution, known as the Erlang distribution, is used in queueing theory to model waiting times. The shorthand notation X ∼ Gamma(γ, r) denotes that X has a gamma distribution with parameters γ and r.
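The gamma PDF and the factorial identity for integer r can be checked with Python's standard `math` module. In this sketch the rate γ is spelled `gam` to avoid confusion with `math.gamma`, which is the gamma function Γ; the example values are arbitrary.

```python
import math


def gamma_pdf(x: float, gam: float, r: float) -> float:
    """PDF of X ~ Gamma(gam, r) with rate gam > 0 and shape r > 0."""
    if x <= 0:
        return 0.0
    return gam ** r * x ** (r - 1) * math.exp(-gam * x) / math.gamma(r)


# For positive integer r, Gamma(r) = (r - 1)!  (the Erlang case)
for r in range(1, 8):
    assert math.isclose(math.gamma(r), math.factorial(r - 1))

# With r = 1 the gamma PDF reduces to the exponential PDF gam * exp(-gam * x)
print(math.isclose(gamma_pdf(2.0, 0.5, 1), 0.5 * math.exp(-1.0)))  # True
```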

Exponential Distribution Model
The exponential distribution is a continuous analogue of the geometric distribution; as such, it is an example of a continuous waiting-time distribution. The PDF of an exponential random variable X with parameter γ > 0 is

$$ f(x) = \gamma e^{-\gamma x}, \quad x \ge 0 \tag{5} $$

The CDF of an exponential random variable is given by

$$ F(x) = 1 - e^{-\gamma x}, \quad x \ge 0 \tag{6} $$
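The waiting-time interpretation rests on the memoryless property, P(X > s + t | X > s) = P(X > t), which follows from the CDF because the survival function is e^{−γx}. A small Python check, with a hypothetical rate γ = 0.5 written as `gam`:

```python
import math


def exp_survival(x: float, gam: float) -> float:
    """P(X > x) = exp(-gam * x) for x >= 0, and 1 for x < 0."""
    return math.exp(-gam * x) if x >= 0 else 1.0


def exp_cdf(x: float, gam: float) -> float:
    """CDF F(x) = 1 - exp(-gam * x) for x >= 0, and 0 for x < 0."""
    return 1.0 - exp_survival(x, gam)


def exp_pdf(x: float, gam: float) -> float:
    """PDF f(x) = gam * exp(-gam * x) for x >= 0."""
    return gam * math.exp(-gam * x) if x >= 0 else 0.0


gam, s, t = 0.5, 3.0, 2.0  # hypothetical rate and elapsed/extra times
# Memoryless property: having already waited s does not change the remaining wait
conditional = exp_survival(s + t, gam) / exp_survival(s, gam)
print(math.isclose(conditional, exp_survival(t, gam)))  # True
```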

Beta Distribution Model
The beta distribution provides a flexible way to model many types of measurements that have finite ranges. A random variable X has a beta distribution on the interval [0, 1] with parameters a > 0 and b > 0 (denoted by X ∼ Beta(a, b)) if its PDF is given by

$$ f(x) = \frac{x^{a-1}(1-x)^{b-1}}{B(a,b)}, \quad 0 \le x \le 1 \tag{7} $$

where B(a, b) is the beta function, defined by

$$ B(a,b) = \int_{0}^{1} x^{a-1}(1-x)^{b-1}\,dx = \frac{\Gamma(a)\,\Gamma(b)}{\Gamma(a+b)} \tag{8} $$

Note that a random variable having a finite range other than [0, 1] can always be transformed to the [0, 1] range.
The U[0, 1] distribution is a special case of the beta distribution when a = b = 1, in which case B(1, 1) = 1 and the above PDF reduces to

$$ f(x) = 1, \quad 0 \le x \le 1 \tag{9} $$
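The beta PDF, the gamma-function form of B(a, b) and the rescaling of a finite range onto [0, 1] can be sketched in Python as follows. The helper names are hypothetical; the endpoint check reflects that the density is unbounded at 0 when a < 1 (or at 1 when b < 1).

```python
import math


def beta_fn(a: float, b: float) -> float:
    """Beta function via the identity B(a, b) = Gamma(a) Gamma(b) / Gamma(a + b)."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)


def beta_pdf(x: float, a: float, b: float) -> float:
    """PDF of X ~ Beta(a, b); zero outside [0, 1]."""
    if not 0.0 <= x <= 1.0:
        return 0.0
    if (x == 0.0 and a < 1) or (x == 1.0 and b < 1):
        return math.inf  # the density is unbounded at this endpoint
    return x ** (a - 1) * (1 - x) ** (b - 1) / beta_fn(a, b)


def to_unit_interval(y: float, lo: float, hi: float) -> float:
    """Linearly map a measurement on the finite range [lo, hi] onto [0, 1]."""
    return (y - lo) / (hi - lo)


# Beta(1, 1) is the U[0, 1] distribution: the PDF equals 1 everywhere on [0, 1]
print([beta_pdf(x, 1, 1) for x in (0.0, 0.25, 0.9)])  # [1.0, 1.0, 1.0]
```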

Pareto Distribution Model
The Pareto distribution is a skewed, heavy-tailed distribution that is sometimes used to model the distribution of incomes. The law was developed by Vilfredo Pareto in 1897, who first included it in one of his works in which he attempted to prove that the distribution of income and wealth in society is not random and that a consistent pattern appears throughout history, in all parts of the world and in all societies (Alzaatreh and Famoye, 2012). The Pareto law was one of the most famous, but also most criticized, laws of income distribution.
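The PDF of a Pareto distribution is given by

$$ f(x) = \frac{a\,b^{a}}{x^{a+1}}, \quad x \ge b \tag{10} $$

and the CDF of a Pareto random variable is given by

$$ F(x) = 1 - \left(\frac{b}{x}\right)^{a}, \quad x \ge b \tag{11} $$

where a > 0 is the shape parameter and b > 0 is the location parameter (the smallest possible value of x).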
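The Pareto PDF, CDF and a sampler based on inverting the CDF can be written in a few lines of Python. This is a sketch under the standard form f(x) = a b^a / x^{a+1} for x ≥ b; the value b = 1 and the shapes a = 1, 2, 3 (matching the range explored later) are illustrative choices.

```python
import math
import random


def pareto_pdf(x: float, a: float, b: float) -> float:
    """Pareto PDF a * b**a / x**(a + 1) for x >= b, zero below b."""
    return a * b ** a / x ** (a + 1) if x >= b else 0.0


def pareto_cdf(x: float, a: float, b: float) -> float:
    """Pareto CDF 1 - (b / x)**a for x >= b, zero below b."""
    return 1.0 - (b / x) ** a if x >= b else 0.0


def pareto_sample(a: float, b: float, rng: random.Random) -> float:
    """Inverse-transform sampling: solve F(x) = u for x."""
    u = rng.random()                    # u in [0, 1), so 1 - u lies in (0, 1]
    return b / (1.0 - u) ** (1.0 / a)


# The tail P(X > x) = (b / x)**a decays polynomially, and faster for larger a
for a in (1.0, 2.0, 3.0):
    print(a, pareto_pdf(2.0, a, 1.0), 1.0 - pareto_cdf(10.0, a, 1.0))
```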

Bernoulli Distribution Model
A Bernoulli random variable is a random variable that can take only two values, say 0 and 1. The Bernoulli distribution is a useful model for dichotomous outcomes, for example the sex of a baby (male or female), the outcome of an experiment (success or failure) and the toss of a coin (heads or tails). An experiment with a dichotomous outcome is called a Bernoulli trial.
Suppose that an item drawn at random from a production process can be either defective or non-defective, and let p denote the fraction of defective items produced by the process. Then the probabilities of the possible outcomes for an item randomly drawn from this process are P(Defective) = p and P(Non-defective) = 1 − p. A Bernoulli random variable can be defined as X = 1 if the item is defective and X = 0 if it is non-defective, with the following distribution:

$$ P(X = x) = \begin{cases} p, & x = 1 \\ 1-p, & x = 0 \end{cases} \tag{12} $$
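For the defective-item example, the Bernoulli PMF and its first two moments can be computed directly; the defect fraction p = 0.05 below is hypothetical. Because X takes only the values 0 and 1, the mean is p and the variance is p(1 − p).

```python
def bernoulli_pmf(x: int, p: float) -> float:
    """P(X = x) for a Bernoulli(p) variable: p if x == 1, 1 - p if x == 0."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if x == 1:
        return p
    if x == 0:
        return 1.0 - p
    return 0.0  # any other value is impossible


p = 0.05  # hypothetical fraction of defective items
mean = sum(x * bernoulli_pmf(x, p) for x in (0, 1))            # E[X] = p
second_moment = sum(x ** 2 * bernoulli_pmf(x, p) for x in (0, 1))
variance = second_moment - mean ** 2                            # Var(X) = p(1 - p)
print(mean, variance)
```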

Binomial Distribution Model
In statistics and probability theory, the binomial distribution is a common discrete distribution for processes in which the same probability of success applies to each independently generated trial (Chandrasekaran, 1994; Alzaatreh and Famoye, 2012; Ascombe, 1949; Bevrani and Sharififar, 2014; Smith, 2015). Simply put, a binomial random variable is the sum of independent and identically distributed Bernoulli random variables.
Although it was first worked out in connection with games of chance, the binomial distribution is now used in data analysis in almost every field of human inquiry. According to Yang and Petropalu (2001), the binomial distribution is commonly used to model the number of available resources, the number of packets that reach the destination without loss and the number of bits in error in a packet. It applies to any fixed number (n) of repetitions of an independent process that yields a certain outcome with the same probability (p) on each repetition. For instance, it provides a formula for the probability of obtaining 10 sixes in 50 rolls of a die. In a proof published posthumously in 1713, the Swiss mathematician Jakob Bernoulli showed that the probability of k such outcomes in n repetitions equals the kth term (where k starts at 0) in the expansion of the binomial expression $(p + q)^n$, where $q = 1 - p$; hence the name binomial distribution. In the die example, the probability of turning up any given number on each roll is 1/6, so the probability of turning up 10 sixes in 50 rolls equals the 10th term (starting from the 0th term) in the expansion of $(5/6 + 1/6)^{50}$.

The binomial distribution was also used by the statistician Ronald Fisher to examine Mendel's pea experiments: under Mendel's laws of inheritance, the number of yellow peas in one experiment should follow a binomial distribution with n = 8,023 and p = 3/4, giving an expected count of np ≈ 6,017. Fisher found remarkable agreement between this number and Mendel's data, which showed 6,022 yellow peas out of 8,023. One would expect the figure to be close, but a figure that close should occur only about once in 10 times. In addition, Fisher found that all seven results in Mendel's pea experiments were extremely close to the expected values, even in one instance where Mendel's calculations contained an error. Fisher's analysis sparked a long controversy that remains unresolved today (Ascombe, 1949; Bevrani and Sharififar, 2014).
The general form of the Probability Mass Function (PMF) of a binomial random variable X with parameters n and p (denoted by X ∼ Bin(n, p)) is derived as follows. The probability of obtaining x successes and n − x failures in a particular order (e.g., the first x trials resulting in successes and the last n − x trials resulting in failures) is $p^{x}(1-p)^{n-x}$, because the trials are independent. There are a total of $\binom{n}{x}$ ways of distributing x successes and n − x failures among n trials. Therefore, the PMF is given by

$$ P(X = x) = \binom{n}{x} p^{x} (1-p)^{n-x}, \quad x = 0, 1, \ldots, n \tag{13} $$
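The binomial PMF can be evaluated exactly with `math.comb`. The sketch below reproduces the die example from the text (10 sixes in 50 rolls) and checks that the probabilities over x = 0, …, n sum to 1; the function name is illustrative.

```python
from math import comb


def binom_pmf(x: int, n: int, p: float) -> float:
    """PMF of X ~ Bin(n, p): comb(n, x) * p**x * (1 - p)**(n - x)."""
    if not 0 <= x <= n:
        return 0.0
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


# Probability of exactly 10 sixes in 50 rolls of a fair die
print(round(binom_pmf(10, 50, 1 / 6), 6))  # 0.115586

# The PMF sums to 1 over x = 0, 1, ..., n
total = sum(binom_pmf(x, 50, 1 / 6) for x in range(51))
print(total)
```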

Poisson Distribution Model
The Poisson distribution is one of the most important and widely used statistical distributions, partly because the underlying Poisson process is memoryless. The Poisson distribution is a limiting form of the binomial distribution. It is often used to describe the pattern of random point-like events in one, two and three dimensions or, more generally, to provide the model of randomness against which an observed event pattern in time or space may be compared. If events occur randomly and independently, at a constant rate (in time) or with a constant density (in space), then the number of these events per unit time or per unit area follows a Poisson distribution, and the pattern of events is described as a Poisson process. The distribution of the lengths of the intervals between events (or waiting times) in a one-dimensional (1D) Poisson process is the exponential distribution (Smith, 2015; Zou, 2016).
Siméon-Denis Poisson first applied the Poisson distribution in 1830 to describe the number of times a gambler would win a rarely won game of chance in a large number of tries (Erlina, 2011). The Poisson distribution is mostly used to model call arrivals, the number of tasks in a system, the number of requests to a server, the number of failed components and message lengths (Yang and Petropalu, 2001; Chandrasekaran, 1994). Recall from (13) that the binomial PMF is $\binom{n}{x}p^{x}(1-p)^{n-x}$. When n → ∞ and p → 0 in such a way that np approaches a positive constant γ, the limiting binomial PMF can be shown to be

$$ P(X = x) = \frac{e^{-\gamma}\gamma^{x}}{x!}, \quad x = 0, 1, 2, \ldots \tag{14} $$

which is the PMF of the Poisson distribution.
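The limiting argument can be checked numerically: holding np = γ fixed while n grows, the largest pointwise gap between the binomial and Poisson PMFs shrinks. In this sketch γ = 2 is a hypothetical mean (for example, requests per second) and only x = 0, …, 10 is compared.

```python
from math import comb, exp, factorial


def binom_pmf(x: int, n: int, p: float) -> float:
    """Binomial PMF, Equation (13)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


def poisson_pmf(x: int, gam: float) -> float:
    """Poisson PMF: exp(-gam) * gam**x / x!."""
    return exp(-gam) * gam ** x / factorial(x)


def max_gap(n: int, gam: float, x_max: int = 10) -> float:
    """Largest |Bin(n, gam/n) - Poisson(gam)| over x = 0..x_max, with np = gam fixed."""
    p = gam / n
    return max(abs(binom_pmf(x, n, p) - poisson_pmf(x, gam))
               for x in range(x_max + 1))


gam = 2.0  # hypothetical mean number of events per unit time
for n in (10, 100, 10_000):
    print(n, max_gap(n, gam))  # the gap shrinks as n grows and p = gam/n shrinks
```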

Hypergeometric Distribution Model
The hypergeometric distribution is, essentially, a special variant of the binomial. It applies when sampling is performed from a finite population without replacement, which makes the trials dependent on each other. While the binomial assumes n independent trials of an experiment with a fixed probability p that is the same for each trial, the hypergeometric deals with the situation in which the population size N from which items are sampled is relatively small (< 100) and sampling occurs without replacement, so the probabilities do not stay the same. A simple analogy is drawing balls from a bag or urn containing a mix of red and black balls. The binomial applies if each ball is replaced after every random draw, whilst the hypergeometric deals with the situation where the balls are not replaced, so each subsequent ball is drawn from a slightly different overall mix of red and black. Clearly, if there are very many balls in the urn and we do not draw too many, there is effectively no difference between the two distributions.
Two key assumptions underlie the binomial distribution:
• the Bernoulli trials are independent;
• each Bernoulli trial has the same probability of success.
These assumptions are valid when a random sample is drawn from an infinite or very large population of items, a fraction of which has a specific attribute. When the population is finite, the assumptions remain valid if each randomly sampled item is returned to the population before the next draw; this is called sampling with replacement. In practice, however, we generally sample without replacement. When the population is finite, sampling without replacement creates dependence among the successive Bernoulli trials, and the probability of success changes as successive items are drawn. We now derive the hypergeometric distribution.
Let N be the size of the population, of which M items have a specific attribute. We randomly sample n items from the population without replacement. The number of ways to draw x items from the M items with the attribute and n − x items from the remaining N − M items without the attribute is $\binom{M}{x}\binom{N-M}{n-x}$. Dividing this by the number of ways to sample n items from the population of size N without any restriction, $\binom{N}{n}$, gives

$$ P(X = x) = \frac{\binom{M}{x}\binom{N-M}{n-x}}{\binom{N}{n}}, \quad \max(0,\, n-(N-M)) \le x \le \min(n,\, M) \tag{15} $$

This is referred to as the hypergeometric distribution with parameters N, M and n. If the sampling fraction n/N is small (≤ 0.10), this distribution is well approximated by the binomial distribution (13) with parameters n and p = M/N.
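The effect of the sampling fraction n/N on the binomial approximation can be illustrated with the urn analogy. The sketch below compares the two PMFs for two hypothetical urns, both 40% red, from which n = 10 balls are drawn; the names and numbers are illustrative only.

```python
from math import comb


def hypergeom_pmf(x: int, N: int, M: int, n: int) -> float:
    """P(X = x) when n items are drawn without replacement from N, M of which have the attribute."""
    if x < max(0, n - (N - M)) or x > min(n, M):
        return 0.0
    return comb(M, x) * comb(N - M, n - x) / comb(N, n)


def binom_pmf(x: int, n: int, p: float) -> float:
    """Binomial PMF, used here as the with-replacement approximation."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


def max_gap(N: int, M: int, n: int) -> float:
    """Largest pointwise difference between the hypergeometric PMF and Bin(n, M/N)."""
    p = M / N
    return max(abs(hypergeom_pmf(x, N, M, n) - binom_pmf(x, n, p))
               for x in range(n + 1))


print(max_gap(1000, 400, 10))  # sampling fraction 0.01: very close
print(max_gap(20, 8, 10))      # sampling fraction 0.5: noticeably different
```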

Multinomial Distribution Model
The binomial distribution applies when we have a fixed number of independent Bernoulli trials with constant probabilities p and 1 − p for the two outcomes. It is useful in a large number of applications in ecology. In some situations, however, there are more than two possible outcomes in each trial; e.g., a respondent's ethnic group may be classified as Batawana, Bangwato, Babirwa, Batswapong, Bakgatla, Bakgalagadi or other. For such trials we need a generalization of the binomial distribution to model the frequencies of the different outcomes. Consider a fixed number n of trials, where each trial can result in one of k ≥ 2 outcomes and the probabilities of the outcomes, $p_1, p_2, \ldots, p_k$, are the same from trial to trial, with $p_1 + p_2 + \cdots + p_k = 1$. Let $X_i$ be the number of trials resulting in outcome i, so that $X_1 + X_2 + \cdots + X_k = n$. The joint distribution of $(X_1, X_2, \ldots, X_k)$ is called the multinomial distribution and is given by

$$ P(X_1 = x_1, \ldots, X_k = x_k) = \frac{n!}{x_1!\,x_2!\cdots x_k!}\,p_1^{x_1}p_2^{x_2}\cdots p_k^{x_k} \tag{16} $$

where $x_i \ge 0$ for all i and $x_1 + x_2 + \cdots + x_k = n$. This formula can be derived using the same argument that was used in deriving the formula for the binomial distribution. Specifically, $p_1^{x_1}p_2^{x_2}\cdots p_k^{x_k}$ gives the probability that outcome 1 occurs $x_1$ times, outcome 2 occurs $x_2$ times, etc., in a specified order, and $n!/(x_1!\,x_2!\cdots x_k!)$ counts the number of such orders.
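The two factors of the multinomial PMF (the number of orderings and the probability of one ordering) map directly onto code. This is an illustrative sketch; with k = 2 it reduces to the binomial PMF.

```python
from math import factorial, prod


def multinomial_pmf(counts: list[int], probs: list[float]) -> float:
    """P(X_1 = x_1, ..., X_k = x_k) for n = sum(counts) independent trials."""
    if len(counts) != len(probs):
        raise ValueError("counts and probs must have the same length")
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    n = sum(counts)
    # number of orderings: n! / (x_1! x_2! ... x_k!)
    orderings = factorial(n) // prod(factorial(x) for x in counts)
    # probability of one specific ordering: p_1**x_1 * ... * p_k**x_k
    return orderings * prod(p ** x for p, x in zip(probs, counts))


print(multinomial_pmf([3, 2], [0.5, 0.5]))           # k = 2: binomial, 0.3125
print(multinomial_pmf([2, 1, 1], [0.5, 0.25, 0.25]))  # 0.1875
```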

Geometric Distribution Model
The geometric distribution models the number of independent and identically distributed (i.i.d.) Bernoulli trials needed to obtain the first success. It is an example of a discrete waiting-time distribution, i.e., the distribution of the discrete time to an event; here the number of required trials is the random variable of interest. As an example, consider playing a slot machine until hitting the jackpot, and let p denote the probability of hitting the jackpot on any attempt. The sample space is $\{S, FS, FFS, FFFS, \ldots\}$, where S denotes a "success" and F denotes a "failure". Let X be the number of trials required to hit the jackpot. Assuming that the attempts are independent and p remains constant, the probability of hitting the jackpot on the xth attempt is

$$ P(X = x) = (1-p)^{x-1}\,p, \quad x = 1, 2, \ldots \tag{17} $$

This is a geometric distribution with parameter p. Its CDF is given by

$$ F(x) = P(X \le x) = 1 - (1-p)^{x}, \quad x = 1, 2, \ldots \tag{18} $$
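For the slot-machine example, the geometric PMF, its CDF and the mean number of attempts 1/p can be checked in Python. The jackpot probability p = 0.01 is hypothetical, and the mean is approximated by a truncated sum whose neglected tail is negligible.

```python
def geometric_pmf(x: int, p: float) -> float:
    """P(first success on trial x) = (1 - p)**(x - 1) * p, for x = 1, 2, ..."""
    return (1 - p) ** (x - 1) * p if x >= 1 else 0.0


def geometric_cdf(x: int, p: float) -> float:
    """P(X <= x) = 1 - (1 - p)**x: at least one success in the first x trials."""
    return 1 - (1 - p) ** x if x >= 1 else 0.0


p = 0.01  # hypothetical probability of hitting the jackpot on one attempt
# The CDF equals the running sum of the PMF
assert abs(geometric_cdf(50, p) - sum(geometric_pmf(k, p) for k in range(1, 51))) < 1e-12
# Mean number of attempts is 1/p (truncated sum; the omitted tail is tiny)
mean = sum(k * geometric_pmf(k, p) for k in range(1, 5001))
print(round(mean, 6))
```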

Experimental Results and Discussions
BIUST Network Packet Intervals

Live Packet Capture in the Morning
The data were collected from the live BIUST network in the morning, between 10:00:23 and 10:15:23 on 06-07-2015. The data are presented in Table 1.

Live Packet Capture over a Working Day
The data were collected from the live BIUST network over a working day, for a period of 8 hours, on 10-09-2015. The data are presented in Table 2. The data monitoring and capture were reported in (Solomon et al., 2016).

Sizes and Frequency of Occurrence of the Packets
The first two columns of Table 1 show the values in the log file obtained from Wireshark, and the remaining columns are derived from them. The first line shows 0 packets with sizes in the range 0-19 bytes. Using the concepts of intervals, class limits and class midpoints from statistical theory, the fourth column of Table 1, the Average Packet Length ($APL_i$), is obtained. The APL for interval i is given by

$$ APL_i = \frac{x_{mi} + x_{Mi}}{2} $$

where $x_{mi}$ and $x_{Mi}$ are the lower and upper values of the i-th interval, respectively. In Table 1, column PL shows the packet size (in bytes) and FO is the frequency of occurrence of the packets.
The standard APL values ($APL_s$) are shown in Table 1.
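The class-midpoint calculation can be sketched in Python. The text confirms the 0-19 and 40-79 byte classes; the remaining class limits below are an assumption, taken to follow the default bins of Wireshark's Packet Lengths statistics, and the frequency-weighted mean is an illustrative extra rather than a quantity reported in Table 1.

```python
# Class limits in bytes. Only 0-19 and 40-79 are confirmed by the text; the
# others are assumed to follow Wireshark's default Packet Lengths bins.
CLASSES = [(0, 19), (20, 39), (40, 79), (80, 159), (160, 319),
           (320, 639), (640, 1279), (1280, 2559), (2560, 5119)]


def apl(lower: int, upper: int) -> float:
    """Class midpoint APL_i = (x_mi + x_Mi) / 2."""
    return (lower + upper) / 2


def mean_packet_length(freqs: list[int]) -> float:
    """Frequency-weighted mean of the class midpoints, using FO as weights."""
    if len(freqs) != len(CLASSES):
        raise ValueError("need one frequency per packet-length class")
    total = sum(freqs)
    if total == 0:
        raise ValueError("no packets")
    return sum(f * apl(lo, hi) for f, (lo, hi) in zip(freqs, CLASSES)) / total


# PL is the 1-based class index; class 3 is the 40-79 byte class
for pl, (lo, hi) in enumerate(CLASSES, start=1):
    print(pl, f"{lo}-{hi}", apl(lo, hi))
```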

Mathematical Model
The mathematical model is based on the analysis of Table 1 and on the data input and data output of the system, and was implemented in MATLAB. The model used is the Pareto distribution, whose probability density function is given by Equation (10). The shape parameter a was varied over 1 ≤ a ≤ 3, and the results are presented in Figs. 2 and 3. In Fig. 2, the PDF curve of the BIUST network approaches zero at about x = 5 for a = 1, at about x = 4 for a = 2 and at about x = 3 for a = 3. In these figures, the value of a was adjusted to show the variation in the shape parameter and its effect on the shape of the PDF. For selected values of the parameter, the simulation was run to compare the empirical density function with the PDF.
The cumulative distribution function (CDF) curve of the BIUST network is presented in Fig. 3. In this figure, the scale of the D(x) axis was adjusted to show that the differences between the curves depend on the value of the parameter a. This is consistent with the results of Vilfredo Pareto (Alzaatreh and Famoye, 2012): the figures show a skewed distribution with heavy, slowly decaying tails, with much of the data lying in the tails.
Because the 80-20 rule corresponds to a particular value of the Pareto shape parameter (α, written a above), it was observed that about 20% of the users accounted for about 80% of the bandwidth consumed. The PDF graphs show that the probability is high at small values of PL and then decreases steadily as PL increases. The CDF curve, by contrast, is high at large values of PL and decreases steadily as PL decreases. Figure 4 shows the relationship between Packet Length (PL) and Frequency of Occurrence (FO). The result shows that FO is much higher at packet length class 3, which corresponds to 40-79 bytes.
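The link between the 80-20 rule and the shape parameter can be made explicit with a short calculation. Assuming the standard Pareto Lorenz curve L(F) = 1 − (1 − F)^{1−1/α} (valid for α > 1), the share of the total held by the top fraction P of the population is P^{1−1/α}; setting this to 0.8 for P = 0.2 gives α = log₄5 ≈ 1.16. The sketch below is illustrative and is not derived from the BIUST measurements.

```python
import math


def top_share(alpha: float, top_fraction: float) -> float:
    """Share of the total held by the largest `top_fraction` of a Pareto population.

    Follows from the Pareto Lorenz curve L(F) = 1 - (1 - F)**(1 - 1/alpha),
    which requires alpha > 1 (finite mean).
    """
    if alpha <= 1:
        raise ValueError("alpha must exceed 1 for a finite mean")
    return top_fraction ** (1 - 1 / alpha)


# Shape parameter for which the top 20% account for 80% of the total
alpha_80_20 = math.log(5) / math.log(4)
print(round(alpha_80_20, 3))                   # 1.161
print(round(top_share(alpha_80_20, 0.2), 6))   # 0.8

# Within 1 < a <= 3, a larger shape parameter means a less concentrated share
for a in (1.5, 2.0, 3.0):
    print(a, round(top_share(a, 0.2), 3))
```

Within the range 1 ≤ a ≤ 3 explored above, the top-20% share falls from 1 (the limit as a → 1) to roughly a third at a = 3, so the 80-20 pattern sits near the lower end of that range.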

Conclusion
This paper presents a model of the kind used in traffic engineering to predict network performance and to evaluate congestion control schemes. The results suggest that traffic models should have a manageable number of parameters and that the estimation of these parameters should be simple. The BIUST network was used as a case for the packet distribution model, the Pareto distribution approach was adopted, and it was observed that the shape of the PDF curve changes when the parameter changes. Because the research was conducted within the boundaries of the BIUST network, the results may be difficult to generalize to other campus networks owing to differences in network setups and implemented policies. Future studies may therefore apply the approach to other campus networks so that results can be compared across different network setups.