An Optimal Path Management Strategy in Mobile Ad Hoc Network Using Fuzzy and Rough Set Theory

Problem statement: A Mobile Ad Hoc Network (MANET) is a collection of wireless mobile nodes that dynamically form a network. Most existing ad hoc routing algorithms select the shortest path using various resources. However, the selected path may not take all network parameters into account, which results in link instability. The problems with existing methods are frequent route changes as the topology changes, congestion caused by traffic and battery limitations, since a MANET is an infrastructure-less network. Approach: To overcome these problems, an optimal path management approach called path vector calculation, based on fuzzy and rough set theory, is proposed. The aim of this study is to select a qualified path based on the power consumption of the nodes, the number of internodes and the traffic load in the network. Simple rules were generated using fuzzy and rough set techniques to calculate the path vector and to remove irrelevant attributes (resources) when evaluating the best route. The rules were evaluated with proactive and reactive protocols, namely DSDV, AODV and DSR, in the NS-2 simulation environment using metrics such as total energy consumed, throughput, packet delivery ratio and average end-to-end delay. Results: The results show that, in a MANET, decision rules derived with fuzzy and rough set techniques select qualified paths for routing. Conclusion: The network lifetime and the performance of reactive and proactive protocols in a MANET improved with fuzzy and rough set based decision rules.


INTRODUCTION
A MANET is a collection of mobile nodes without any fixed infrastructure. It can be set up quickly where the existing infrastructure does not meet application requirements for reasons such as security, cost or quality. A MANET consists of nodes that can move freely and communicate with other nodes either over a direct link or by relaying through intermediate nodes. The performance of the network suffers as the number of nodes grows, and a large network quickly becomes difficult to manage. Various routing protocols have been designed specifically for MANET, such as Ad Hoc On-Demand Distance Vector (AODV), Dynamic Source Routing (DSR), Destination-Sequenced Distance Vector (DSDV) and Wireless Routing Protocol (WRP).
One of the key challenges in MANET is routing. Researchers have investigated various methods for finding the shortest path from source to destination. Numerous routing paths exist between a source and a destination node (Perkins and Bhagwat, 1994; Perkins and Royer, 1999) for data transfer. At present, fuzzy and rough set theory play a useful role in managing wireless networks.
Fuzzy set theory is based on the degree of membership function (Zadeh, 1965). The membership function takes values in the interval [0, 1]. Rough set theory, proposed by Pawlak (1982), is an extension of classical set theory for dealing with vagueness in the real world. Its concepts and operations are defined on the basis of the indiscernibility relation. It has been successfully applied in selecting attributes to improve the effectiveness of deriving decision rules (Jensen and Shen, 2007). This approach also leads researchers to focus on the benefits of non-algorithmic models for overcoming estimation problems (Attarzadeh and Ow, 2010).
Integrating the advantages of fuzzy and rough set theory, this study proposes a hybrid system to select an effective routing path in MANET. In the first stage, the data set consisting of resources and paths are fuzzified.
In the second stage, information gain is calculated using the ID3 algorithm to evaluate the relative importance of the attributes. In the third stage, the decision table is reduced by removing redundant attributes (resources) without any information loss. In the fourth stage, IF (condition)-THEN (outcome) decision rules are extracted from the equivalence classes to select the best routing path. Finally, the set of rules is evaluated with proactive and reactive protocols, namely DSDV, AODV and DSR, in the NS-2 simulation environment. An example is also presented to show the applicability of the proposed method.

MATERIALS AND METHODS
The motivation for an analytical solution of path selection is based on various research efforts. A number of routing protocols such as AODV, DSR, DSDV and WRP have been proposed for Ad Hoc networks.
AODV is loop-free, self-starting and scales to a large number of mobile nodes. It is a reactive protocol in which routes are created only when they are needed. It uses traditional routing tables, with one entry per destination, and sequence numbers, which it uses to determine up-to-date routing information and to prevent routing loops. The modifications to AODV are most useful for moderately loaded, high-mobility networks (Rani and Dave, 2007).
The DSR protocol is based on source routing, with all routing information maintained (continually updated) at the mobile nodes. It uses source routing instead of relying on the routing table at each intermediate device.
The main contribution of the DSDV protocol is to solve the routing loop problem. Each entry in the routing table contains a sequence number; the sequence number is generally even if a link is present and odd otherwise. The number is generated by the destination, and the emitter needs to send out the next update with this number. Routing information is distributed between nodes by sending full dumps infrequently and smaller incremental updates more frequently.
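The role of the sequence number in a DSDV routing table can be illustrated with a minimal sketch. The update rule used here (prefer the advertisement with the newer sequence number and, on a tie, the lower metric) is taken from the standard description of DSDV rather than from this paper, and the names Route and should_replace are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

INFINITY = float("inf")  # metric used for a broken route


@dataclass
class Route:
    next_hop: str
    metric: float  # hop count, or INFINITY when the link is broken
    seq: int       # even: link present, odd: link broken


def should_replace(current: Optional[Route], advertised: Route) -> bool:
    """Decide whether an advertised route should replace the stored one.

    Assumed rule (standard DSDV): a newer sequence number always wins;
    with equal sequence numbers the smaller metric wins.
    """
    if current is None:
        return True
    if advertised.seq != current.seq:
        return advertised.seq > current.seq
    return advertised.metric < current.metric


stored = Route(next_hop="B", metric=3, seq=100)
newer = Route(next_hop="C", metric=5, seq=102)           # fresher, longer
shorter = Route(next_hop="D", metric=2, seq=100)         # same age, shorter
stale = Route(next_hop="E", metric=1, seq=98)            # older information
broken = Route(next_hop="B", metric=INFINITY, seq=101)   # odd: link lost

decisions = [should_replace(stored, r) for r in (newer, shorter, stale, broken)]
```

A fresher advertisement replaces the stored route even when it is longer, which is what keeps stale (and potentially looping) information out of the table.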
WRP uses an enhanced version of the distance-vector routing protocol, which uses the Bellman-Ford algorithm to calculate the paths. Because of the mobile nature of the nodes within the MANET, the protocol introduces mechanisms that reduce routing loops and ensure reliable message exchange.
In FCMR (Fuzzy Cost Based Multipath Routing) protocol, the traffic is distributed amongst the best selected paths from the existing multipath routing. The selection is based on consideration of six resource constraints such as bandwidth, computing efficiency, power consumption, traffic load, the number of hops and total vector cost (Raju and Ramchandram, 2008).
An alternative approach based on fuzzy and rough set methodology is described in this work for the selection of best routing path with minimum number of resources.
Fuzzy set theory: Fuzzy set theory was first proposed by Zadeh (1965). The main objective of this theory is to develop a methodology for the formulation and solution of problems that are too complex or ill-defined to be suitable for analysis by conventional Boolean techniques. A fuzzy set can be defined as a set of ordered pairs $A = \{(x, \mu_A(x)) \mid x \in U\}$. The function $\mu_A(x)$ is called the membership function of A; it maps each element of the universe U to a membership degree in the range [0, 1]. An element $x \in U$ is said to be in a fuzzy set if and only if $\mu_A(x) > 0$ and to be a full member if and only if $\mu_A(x) = 1$. Membership functions can be chosen by the user arbitrarily, based on the user's experience, or they can be designed using optimization procedures. The triangular membership function with parameters $a < b < c$ is defined as:
$$\mu_A(x) = \begin{cases} 0, & x \le a \\ \dfrac{x - a}{b - a}, & a < x \le b \\ \dfrac{c - x}{c - b}, & b < x < c \\ 0, & x \ge c \end{cases}$$
Rough set theory: Rough set theory is an extension of conventional set theory that supports approximations in decision making (Pawlak, 1982; Duntsch and Gediga, 1999; Skowron et al., 2002; Pal and Skowron, 2003). A rough set is itself the approximation of a vague concept (set) by a pair of precise concepts, called the lower and upper approximations, which are a classification of the domain of interest into disjoint categories. The lower approximation is a description of the domain objects that are known with certainty to belong to the subset of interest, whereas the upper approximation is a description of the objects that possibly belong to the subset. The theory provides useful information about the role of particular attributes and their subsets and prepares the ground for representing the knowledge hidden in the data by means of IF-THEN decision rules.
Information system: An information system can be viewed as a table of data consisting of objects (rows in the table) and attributes (columns). An information system may be extended by the inclusion of decision attributes; such a system is termed a decision system. Suppose we are given two finite, non-empty sets U and A, where U is the universe and A is a set of attributes. With each attribute $a \in A$ we associate a set (value set) called the domain of a. Any subset B of A determines a binary relation IND(B) on U, called an indiscernibility relation (Eq. 1):
$$IND(B) = \{(x, y) \in U \times U \mid \forall a \in B,\ a(x) = a(y)\} \qquad (1)$$
where IND(B) is an equivalence relation, called the B-indiscernibility relation.
Lower and upper approximation: Let us consider $B \subseteq A$ and $X \subseteq U$. We can approximate X using only the information contained in B by constructing the lower approximation (2) and the upper approximation (3) of X in the following way:
$$\underline{B}X = \{x \in U \mid [x]_B \subseteq X\} \qquad (2)$$
and:
$$\overline{B}X = \{x \in U \mid [x]_B \cap X \neq \emptyset\} \qquad (3)$$
where $[x]_B$ denotes the equivalence class of x under IND(B). Equivalence classes contained within X belong to the lower approximation, whereas equivalence classes within X and along its border form the upper approximation. Let P and Q be sets of attributes inducing equivalence relations over U; then the positive region is defined as (Eq. 4):
$$POS_P(Q) = \bigcup_{X \in U/Q} \underline{P}X \qquad (4)$$
where $POS_P(Q)$ comprises all objects of U that can be classified into classes of U/Q using the information contained within the attributes P.
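The indiscernibility classes, the two approximations and the positive region described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the decision table toy, its attribute names and its values are hypothetical and are not the data of Tables 1 to 6.

```python
from collections import defaultdict


def equivalence_classes(table, attrs):
    """Group object ids that share the same values on attrs (classes of IND(attrs))."""
    classes = defaultdict(set)
    for obj, row in table.items():
        classes[tuple(row[a] for a in attrs)].add(obj)
    return list(classes.values())


def lower_approximation(table, attrs, target):
    """Union of the equivalence classes that lie entirely inside target."""
    result = set()
    for cls in equivalence_classes(table, attrs):
        if cls <= target:
            result |= cls
    return result


def upper_approximation(table, attrs, target):
    """Union of the equivalence classes that intersect target."""
    result = set()
    for cls in equivalence_classes(table, attrs):
        if cls & target:
            result |= cls
    return result


def positive_region(table, cond_attrs, dec_attr):
    """Union of the lower approximations of every decision class."""
    pos = set()
    for dec_cls in equivalence_classes(table, [dec_attr]):
        pos |= lower_approximation(table, cond_attrs, dec_cls)
    return pos


# Hypothetical toy decision table: objects 1 and 4 are indiscernible on the
# condition attributes but have different decisions.
toy = {
    1: {"load": "high", "hops": "low", "cost": "poor"},
    2: {"load": "low", "hops": "low", "cost": "good"},
    3: {"load": "low", "hops": "high", "cost": "good"},
    4: {"load": "high", "hops": "low", "cost": "good"},
}
good = {obj for obj, row in toy.items() if row["cost"] == "good"}

lower = lower_approximation(toy, ["load", "hops"], good)
upper = upper_approximation(toy, ["load", "hops"], good)
pos = positive_region(toy, ["load", "hops"], "cost")
```

In this toy table objects 1 and 4 fall outside the positive region because the condition attributes cannot tell them apart, whereas in the illustrative example of this paper every path lies in the positive region.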
ID3 Entropy: Attribute selection in the ID3 (Wang and Lee, 2006) and C4.5 (Quinlan, 1992) algorithms is based on minimizing an information entropy measure applied to the examples at a node. Entropy has been widely applied in many fields. The entropy measure is used to select the attributes providing the highest information gain.
Quinlan's ID3 decision tree algorithm uses the entropy concept for attribute selection. A data set with some discrete-valued condition attributes and one discrete-valued decision attribute can be presented in the form of a knowledge representation system $J = (U, C \cup D)$, where $U = \{u_1, u_2, \ldots, u_s\}$ is the set of data samples, $C = \{c_1, c_2, \ldots, c_n\}$ is the set of condition attributes and $D = \{d\}$ is the one-element set containing the decision attribute or class label attribute. Suppose this class label attribute has m distinct values defining m distinct classes $d_i$ (for i = 1, 2, ..., m) and let $s_i$ be the number of samples of U in class $d_i$. The entropy for a subset is given by (Eq. 5):
$$Entropy(S) = -\sum_{i=1}^{m} P_i \log_2 P_i \qquad (5)$$
where $P_i$ is the probability that an object is in the i-th class and $\log_2$ is the logarithm to base 2. Gain(S, A), the information gain of example set S on attribute A, is defined as (Eq. 6):
$$Gain(S, A) = Entropy(S) - \sum_{v \in Values(A)} \frac{|S_v|}{|S|}\, Entropy(S_v) \qquad (6)$$
where the sum runs over each value v of all possible values of attribute A, $S_v$ is the subset of S for which attribute A has value v, $|S_v|$ denotes the number of elements in $S_v$ and $|S|$ denotes the number of elements in S.
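The entropy and information gain measures defined above can be sketched as follows. The rows used here are a hypothetical four-sample data set chosen so that the result is easy to check by hand; they are not the values of Table 2, and the function names are illustrative only.

```python
import math
from collections import Counter


def entropy(labels):
    """Entropy of a list of class labels, in bits."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(rows, attr, decision):
    """Entropy of the decision minus the weighted entropy after splitting on attr."""
    labels = [row[decision] for row in rows]
    subsets = {}
    for row in rows:
        subsets.setdefault(row[attr], []).append(row[decision])
    remainder = sum(len(s) / len(rows) * entropy(s) for s in subsets.values())
    return entropy(labels) - remainder


# Hypothetical samples: "load" determines the decision, "hops" tells nothing.
rows = [
    {"load": "high", "hops": "low", "cost": "poor"},
    {"load": "high", "hops": "high", "cost": "poor"},
    {"load": "low", "hops": "low", "cost": "good"},
    {"load": "low", "hops": "high", "cost": "good"},
]

gain_load = information_gain(rows, "load", "cost")
gain_hops = information_gain(rows, "hops", "cost")
```

With two samples in each class the entropy of the decision is 1 bit; splitting on "load" removes all of it, while splitting on "hops" removes none, so an attribute like "hops" would be the first candidate for exclusion.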

Illustrative Example:
A data set of resources allotted to five paths is given in Table 1 to select efficient path.
Fuzzifying the dataset: From Table 1, we consider bandwidth, computer efficiency, power consumption, traffic load and number of internodes as the five condition attributes and total vector cost as the decision attribute, representing minimum cost for the selection of the best path. To represent a continuous fuzzy set, we need to express it as a function that maps each real number to a membership degree. A very common parametric function is the triangular membership function, which can be derived through automatic adjustments. Each attribute has three fuzzy regions (low, medium and high), described as follows: Bandwidth: Low (0, 0.2, 0.4), Medium (0.3, 0.5, 0.7), High (0.6, 0.8, 1.0). Computer efficiency: Low ( Table 2.
Information gain: ID3 uses an information-theoretic approach aimed at minimizing the expected number of tests needed to classify an object. Using (5) and (6), the information gain for each attribute is calculated. We get Gain (Bandwidth) = 0.24, Gain (Computer efficiency) = 0.42, Gain (Power consumption) = 0.44, Gain (Traffic load) = 0.94 and Gain (Number of internodes) = 0.54. Since power consumption, traffic load and number of internodes have the highest information gain among the five attributes, bandwidth and computer efficiency may be excluded as less important. The reduced data set is shown in Table 3.
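The triangular fuzzy regions listed above for bandwidth can be sketched as follows. The three parameter triples are the ones given in the text; the rule of labelling a value with the region of highest membership is an assumption made for this illustration, and the function names are hypothetical.

```python
def triangular(x, a, b, c):
    """Membership degree of x in a triangular fuzzy set with feet a, c and peak b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Fuzzy regions for bandwidth as listed in the text.
BANDWIDTH_REGIONS = {
    "low": (0.0, 0.2, 0.4),
    "medium": (0.3, 0.5, 0.7),
    "high": (0.6, 0.8, 1.0),
}


def fuzzify(x, regions):
    """Return the region with the highest membership (assumed rule) and all degrees."""
    degrees = {name: triangular(x, *abc) for name, abc in regions.items()}
    label = max(degrees, key=degrees.get)
    return label, degrees


label_mid, degrees_mid = fuzzify(0.5, BANDWIDTH_REGIONS)
label_low, degrees_low = fuzzify(0.3, BANDWIDTH_REGIONS)
```

A normalized bandwidth of 0.5 sits at the peak of the medium region, while 0.3 lies on the falling edge of the low region and exactly at the foot of the medium region, so it is labelled low.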
The decision attribute (Total vector cost) has two values, Good and Poor, and each value defines a partition. From Table 3, it is clear that $X_G = \{2, 3, 5\}$ and $X_P = \{1, 4\}$; that is, paths 2, 3 and 5 belong to partition $X_G$ and paths 1 and 4 belong to partition $X_P$. Identifying the C-lower approximations of $X_G$ and $X_P$, we have $\underline{C}X_G = \{2, 3, 5\}$ and $\underline{C}X_P = \{1, 4\}$. Hence, building the positive region by combining the C-lower approximations of the two partitions gives $POS_C(D) = \{1, 2, 3, 4, 5\}$. From $POS_C(D)$, the C-equivalence classes in the positive region are constructed; they are shown in Table 4.
The calculated result is shown in Table 5. Reduct_i of an equivalence class must be able to distinguish Equiv_i from all other equivalence classes; it is the conjunction of the entries in the i-th row of the discernibility matrix. Using Boolean operations, we get:
Finally, the decision table can be built to extract the rules.
From Table 6, we can extract decision rules in IF-THEN form. Here the condition attribute values (Traffic load = high, No. of internodes = low) are used as the rule antecedent and the class label attribute (Total vector cost = Poor) as the rule consequent. Hence, we can extract the following decision rules:
Hence, paths 2, 3 and 5 are considered the best paths.

Simulation environment:
The simulations were carried out in the Network Simulator NS-2 over an area of 1000×1000 m with 5, 10, 30 and 50 mobile nodes. The simulation time is 200 sec and each simulation is performed under varying pause time, number of nodes and packet size. The pause time is the amount of time that a node pauses between two transitions. The pause times considered for this simulation are 10, 50, 100 and 150 sec, with 10 movement patterns for each value of pause time. A pause time of 10 sec denotes a rapidly changing network topology and a pause time of 150 sec denotes a relatively stable network. The numbers of traffic sources considered are 1, 3, 5 and 7. The speeds of the nodes are randomly assigned during the creation of the mobility pattern and vary between 0 and 20 m s⁻¹. The traffic is sent with packet sizes of 256, 512, 1024 and 2048 bytes and a packet interval of 10 ms. The bandwidth of the wireless links is 11 Mbps, similar to that of an 802.11b based network. Under these conditions we have studied path management using three ad hoc routing protocols, namely AODV (Perkins and Royer, 1999), DSDV (Perkins and Bhagwat, 1994) and DSR (Baiamonte and Chiasserini, 2004).
The energy model parameters used in the simulations are:
Initial energy (battery): 150 J
Transmission power: 0.9 W
Reception power: 0.8 W
Idle power: 0.2 W
Sense power: 0.0175 W
The metrics used for comparison are:
Total energy consumed: The total energy consumed in each simulation, divided by the total number of successfully received bytes.
Throughput: Throughput is the total number of kilobits (kb) of data successfully received by the receiver per unit time (second).

Packet Delivery Ratio (PDR):
The packet delivery ratio is the ratio of the total number of successfully received packets to the total number of sent packets.
Average end-to-end delay of data packets: This is the average delay between the sending of the data packet by the constant bit rate source and its receipt at the corresponding constant bit rate receiver.
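The four metrics defined above can be computed from per-packet records, as in the following minimal sketch. The Packet record, the sample values and the function names are hypothetical; in practice the send and receive events would be extracted from the simulator's trace output. One kilobit is assumed to be 1000 bits.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Packet:
    size_bytes: int
    sent_at: float                 # seconds
    received_at: Optional[float]   # None if the packet was dropped


def delivered(packets: List[Packet]) -> List[Packet]:
    return [p for p in packets if p.received_at is not None]


def throughput_kbps(packets: List[Packet], duration_s: float) -> float:
    """Kilobits successfully received per second (1 kb = 1000 bits assumed)."""
    bits = sum(p.size_bytes * 8 for p in delivered(packets))
    return bits / 1000 / duration_s


def packet_delivery_ratio(packets: List[Packet]) -> float:
    """Received packets divided by sent packets."""
    return len(delivered(packets)) / len(packets)


def average_end_to_end_delay(packets: List[Packet]) -> float:
    """Mean time between sending and receipt, over delivered packets only."""
    delays = [p.received_at - p.sent_at for p in delivered(packets)]
    return sum(delays) / len(delays)


def energy_per_byte(total_energy_j: float, packets: List[Packet]) -> float:
    """Total energy consumed divided by successfully received bytes."""
    return total_energy_j / sum(p.size_bytes for p in delivered(packets))


# Hypothetical records: four 512-byte packets, one of them dropped.
packets = [
    Packet(512, 0.0, 0.25),
    Packet(512, 1.0, 1.75),
    Packet(512, 2.0, None),
    Packet(512, 3.0, 3.5),
]

pdr = packet_delivery_ratio(packets)
delay = average_end_to_end_delay(packets)
kbps = throughput_kbps(packets, duration_s=4.0)
joules_per_byte = energy_per_byte(15.36, packets)
```

Dropped packets count against the delivery ratio but are excluded from the delay average, since they have no receipt time.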

RESULTS AND DISCUSSION
Total energy consumed: The evaluation of energy consumption is particularly important in a mobile ad hoc environment, as it is an infrastructure-less network. To evaluate the energy consumption of the routing protocols, we use the energy model that is built into the NS-2 network simulator. This energy model (Baiamonte and Chiasserini, 2004) is built around the IEEE 802.11 MAC protocol. In general, a network interface is always in one of four possible states: transmit, receive, idle and sleep. The power requirement is high in the transmit and receive modes and low in the idle and sleep modes. The parameters used for the energy model in the simulations are the initial energy and the transmission, reception, idle and sense power values given with the simulation environment.
In our simulation, energy is measured in two ways: first, the total energy consumption is calculated against the number of intermediate nodes and, second, against multiple connections/data flows. From Fig. 1a, it is apparent that the total energy consumption of a node increases as the traffic in the network increases. The energy cost increases with the number of nodes, and this is more pronounced for on-demand routing protocols than for table-driven protocols. This could be associated with the increase in the number of routing packets required to maintain routes to more destination nodes in the case of on-demand routing protocols. However, proactive routing protocols by default maintain routes to all possible destinations within the network, irrespective of whether there is any data to be sent to a given destination.
For the multi-source, single-destination scenario, the total energy consumption of a node increases as the traffic in the network increases, and DSR (Fig. 1b) is observed to consume the most energy. This is due to the unnecessary loss of valuable energy resulting from the transmission of packets along stale routes.
Throughput: In our simulation, throughput is calculated in three ways: first, from the packets received at different pause times for a single connection/data flow; second, from the packets received over multiple connections/data flows, averaged over all connections; and third, with an increasing number of intermediate nodes. Figure 2a shows the throughput of a single connection/data flow for different pause times; here the throughput (packets received) of AODV, DSDV and DSR remains high for all pause times. In the case of multiple traffic connections/flows (Fig. 2b), however, the throughput of AODV, DSDV and DSR reduces rapidly as the number of flows increases. This is because a larger number of traffic connections/flows introduces more congestion, packet drops and processing delay at the intermediate nodes.
In contrast, with fewer traffic connections/flows, throughput remains high. It is observed from Fig. 2c that throughput increases as the number of intermediate nodes between source and destination increases. A dense concentration of nodes provides solid connectivity between pairs of nodes, which in turn reduces the probability of packet drops in both proactive and reactive protocols.
Packet delivery ratio: The packet delivery ratio of the three protocols AODV, DSDV and DSR was analyzed with an increasing number of intermediate nodes and with different numbers of connections/flows. The packet delivery ratio (Fig. 3a) increases with the number of intermediate nodes in AODV, DSDV and DSR for a single connection/flow. A small number of nodes makes links unstable, i.e., packets are dropped because no route is available, which leads to the formation of holes/gaps in the network. In contrast, as the number of nodes increases, the probability of packet drops falls and the formation of holes/gaps in the network is avoided.
From Fig. 3b it is observed that the packet delivery ratio of AODV, DSDV and DSR decreases as the number of traffic connections/flows and the load increase. Under low traffic load, AODV, DSDV and DSR perform better; as the load increases, the PDR decreases. Similarly, an increase in the number of traffic connections/flows in AODV, DSDV and DSR leads to congestion at the intermediate nodes, which are then unable to deliver packets to the destination properly because of frequent packet drops at the forwarding nodes.
It is evident from Fig. 4a-b that, for AODV, DSDV and DSR, the end-to-end delay increases with (i) an increase in the number of intermediate nodes: a higher number of nodes increases the hop duration of a packet travelling from source to destination; and (ii) multi-source traffic/connections/flows, which cause congestion in the network and thus packet delay. Under congestion, more and more packets are queued in the router buffers located along the path to the packets' destination. In the worst case, a buffer overflows, causing the router to discard packets. The delay continues to increase until the congestion is cleared.

CONCLUSION
From the graphs and the analysis of the fuzzy and rough set based path vector calculation, three conclusions were drawn for stable path management that uses the available resources effectively, so as to maintain stable links and increase the network lifetime.
A network with a significant number of intermediate nodes is less prone to link failure because it is solidly interconnected. The packet delivery ratio also increases, since fewer packets are dropped and holes/gaps between nodes are less likely to form, and delay is reduced because less time is required for route establishment.
For a stable link, the routing path should be established through intermediate nodes that have consumed less energy rather than on the basis of the shortest path. A node with heavy energy consumption causes link failure, since the network is infrastructure-less, and this leads to packet drops, delay, decreased throughput and the formation of holes in the network.
A large number of traffic connections/flows causes congestion in the network, which results in delay; as the number of flows increases, there is also a gradual decrease in throughput and an increase in total energy consumption and packet drops.
From these conclusions it is apparent that, to maintain a good routing path, paths 2, 3 and 5 from the rule Table 4 are considered the best qualified paths, which will guarantee link stability and increase network performance.