Cluster Based Data Consistency for Cooperative Caching over Partitionable Mobile Adhoc Network

Problem statement: Data availability and consistency are foremost issues in Mobile Ad hoc Networks (MANETs) due to the absence of permanent infrastructure. Cooperative caching addresses the data availability issue by coordinating the mobile nodes and sharing cache copies among them. At the same time, the mobile nodes must ensure that the cache copies are not stale. Consistency maintenance resolves the staleness of the data between the source and the caching nodes. As more mobile nodes join, the network grows and the number of caching nodes increases. Mobility and disconnections cause additional overhead and latency and reduce the data delivery success ratio while the cache copies are updated from the source. Approach: This study proposes an Adaptive Push and Pull Algorithm for Clusters (APPC) and a Cluster Based Data Consistency (CBDC) approach to address consistency requirements and maintenance in mobile ad hoc networks. Results: CBDC satisfies the consistency requirements in partitioned clusters. The source node transmits the updated data through the cluster heads to the caching nodes. APPC ensures the validity of a cache copy through a threshold on its Time-to-Live (TTL). Thus it provides efficient access to valid data in mobile ad hoc networks. Conclusion: The simulation results show that the proposed approach increases the packet success ratio and reduces the delay and overhead when compared with the existing Flexible Cache Consistency (FCPP) maintenance and Cluster Based Cooperative Caching Technique (CBCCT) approaches, for an increasing number of nodes and increasing speed respectively.


INTRODUCTION
Mobile Ad Hoc Networks (MANETs) consist of self-governing Mobile Nodes (MNs) with dynamic infrastructure and multi-hop wireless links. Previous research has primarily focused on routing and MAC protocols in MANETs. Although routing and MAC protocols are important, efficient data access in MANETs is an equally important issue. Moreover, MANETs have limitations such as battery energy constraints, limited bandwidth, unpredictable signal propagation, mobility and unreliable wireless links. These cause frequent disconnections in the network, which create problems for data availability and accessibility. Cooperative caching is an efficient way to tackle these issues and improve system performance in terms of energy, query latency, data delivery and overhead. MNs cooperate with each other to share data, which reduces the remote server's workload and the use of communication channel bandwidth. Due to rapid progress in wireless networking, MANETs are used not only in military operations but also in commercial and industrial applications such as news, traffic information, cricket score updates and the stock market. In cooperative caching, the shared data being accessed is widely cached in the caching nodes. The shared cache copy is not static: the data is modified and updated at the source during its lifetime. Modified or updated data at the source must be replicated to the cache copies in the caching nodes. Since the data is cached in many caching nodes, consistency approaches are required to ensure that all cache copies are consistent with the source. Maintaining cache consistency is therefore a challenging issue in the mobile environment. A novel consistency approach is proposed here to handle consistency between the cache copies in the caching nodes and the source.

Motivation:
With the fast development of mobile communication, an MN retrieves the required data from a remotely located source node. MNs frequently change their location during data transmission due to network dynamism and mobility. In a huge network, MNs cannot retrieve the required data from the remote source at all times. Hence, MNs cache the data accessed from the remote source to share it with their neighbors. These cache copies improve data availability in the network. However, query latency and overhead increase drastically in a huge network due to the numerous caching nodes, and invalidation takes a long time to update the cache copies in the caching nodes from the source. The source must also ensure the consistency of the cache copies in the caching nodes. This motivates the exploration of how to maintain consistency between the source and the caching nodes over a huge MANET.

Problem identification and proposed solution:
Most previous research on cache consistency does not describe any specific approach to handle consistency between the data source and the caching nodes under mobility and disconnection in a huge MANET. Hence there is a need to design an effective consistency approach that maintains consistency and handles disconnections in a huge network.
In this study, we propose an Adaptive Push and Pull Algorithm for Clusters (APPC) and a Cluster Based Data Consistency (CBDC) approach to address consistency maintenance in a partitionable MANET. The proposed algorithms can alleviate the consistency issues caused by mobility and disconnection in a huge MANET.
Literature review: There is much research on caching and consistency maintenance algorithms for distributed environments such as the Web, P2P systems, databases and mobile wireless networks. However, these approaches cannot be directly applied in MANETs due to dynamic topology, limited bandwidth, mobility and energy constraints (Cao et al., 2005). The traditional consistency control approaches are push and pull schemes. Push-based schemes are suitable for stable networks in which nodes are guaranteed to be online and reachable from the source at all times. These schemes have low query latency, but they cannot solve the disconnection problem: caching nodes cannot receive invalidation messages while disconnected, which results in sharing of stale data upon reconnection. Pull-based schemes are more suitable for dynamic networks, but they cause high communication overhead through message flooding, and the caching nodes consume much battery energy. The conventional cache status maintenance approaches are Stateful (SF) and Stateless (SL). In SF (Cao, 2002), the data source is aware of all cache copies in the caching nodes, so it avoids redundant broadcast flooding in the network. However, it requires a large and complicated database. In SL (Imielinski and Barbara, 1994), the data source is not aware of the status of the cache copies, which makes it simple to handle and implement. However, it causes more overhead because it floods more redundant messages.
The most important consistency levels are Strong Consistency (SC) and Weak Consistency (WC). In SC, the cache copies in the caching nodes are up to date at all times. In WC, consistency of the cache copies is maintained between the source and the caching nodes, but no assurance is given on the deviation between them. Delta Consistency (DC) bounds the deviation between the source and the cache copy by a maximum acceptable value. CBDC is proposed to provide consistency requirements between SC and WC. Each cache copy is associated with a Time-to-Live (TTL) value, and the cluster head keeps the deviation between the source and the cache copies within the acceptable bound.
Many algorithms have been proposed for consistency maintenance. Cao et al. (2004) presented a simple weak consistency model in which each cache copy is associated with a Time-To-Refresh (TTR) value. In this model, the request is forwarded to the data source if the TTR has expired in the caching node, which causes a long query delay. Duvvuri et al. (2003) presented a lease approach to provide SC, in which the source data is not modified without prior notification as long as the lease is valid. Huang et al. (2010a) proposed a predictive consistency control initiation scheme to provide WC, in which the source node proactively propagates updates to the caching nodes. However, the source node does not know whether the caching nodes require the data updates, and the scheme also induces a round trip cost. Cao et al. (2007) proposed relay peer-based cache consistency to provide DC based on the TTR value. The data source selects relay peers from stable, high energy nodes and pushes updates to these relay peers periodically. Other caching nodes receive the updated data from the relay peers in a pull scheme. This scheme increases the load on the relay peers, and broadcasting pull requests consumes bandwidth and energy. Feeney (2001) presented the energy consumption of MANET routing protocols. Xie et al. (2007) proposed dynamic tree based consistency in which data is updated through a binary tree. However, this model incurs additional overhead when updating the tree due to node mobility. Jing et al. (1997) presented a Bit-Sequence (BS) approach that uses a hierarchical structure of binary bit sequences to represent invalidations for long disconnections. However, the data update rate is not high. Li et al. (2007) presented cache invalidation strategies that reduce latency, but their bandwidth cost is high. Huang et al. (2006) proposed a Predictive Caching Consistency algorithm based on online predictions of data updates and queries. However, these schemes can offer only SC or WC. Li et al. (2009) offered a probabilistic cache consistency model in which the validity of a cache copy is checked against the neighbors' cache copies instead of always against the data source. However, it causes unnecessary invalidation when a neighbor's copy is stale. Kuppusamy et al. (2012) proposed the Cluster Based Cooperative Caching Technique (CBCCT), based on mobility and connectivity, to improve data availability over MANETs. A cluster member caches data in its local cache and updates the global cache of its CH, which shares it with neighbor clusters. However, consistency is maintained by periodically sending the source data to the caching nodes, which leads to additional overhead. Artail et al. (2008) proposed a cooperation-based database caching system in which MNs cache the submitted queries as indexes. It provides better hit ratios and smaller delays, but at the cost of slightly higher bandwidth consumption. Huang et al. (2010b) proposed a flexible cache consistency algorithm that minimizes the consistency cost and ensures consistency based on the probability of data validity. It provides SC, WC and DC. However, this scheme cannot satisfy the consistency requirements when the source data is widely cached over unstable network connections.
However, none of the previous algorithms handles the consistency issues caused by mobility and disconnections in a huge MANET. This study aims to provide an algorithm that partitions the huge network into clusters, shares data among neighbor clusters through the Cluster Head (CH) and maintains consistency between the data source and the caching nodes.
Cluster based data consistency: In this model, MNs cache the data while accessing it from the source. Under Cluster Based Data Consistency, a caching MN can directly serve the data in response to its neighbors' queries. Whenever the cache copy is accessed by neighbor MNs within the cluster or in neighbor clusters, the cache copy access rate is computed. When the cache copy access rate is greater than the access rate threshold, the data is considered frequently required.
When data is updated at the source, the source sends an invalidation report to the CHs of the caching nodes. Each CH immediately replies to the data source with an acknowledgement and also informs its caching nodes that the data is being updated. Thus SC is maintained and a requesting neighbor need not wait for a long time. The CH maintains the caching nodes' information together with the TTL of each cache copy. When the TTL of a cache copy falls below a threshold, the CH initiates a RENEW message before the TTL expires. Thus WC is maintained by the pull approach. The CH, the caching nodes and the data source are assumed to have synchronized clocks.
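The CH-side renewal check described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the names `CacheEntry`, `needs_renew`, `ttl_threshold` and `access_rate_threshold` are hypothetical, and combining the TTL threshold with the access rate threshold in one test is an assumption drawn from the surrounding description.

```python
from dataclasses import dataclass


@dataclass
class CacheEntry:
    """Per-copy state a CH might keep (hypothetical layout)."""
    data_id: str
    ttl_remaining: float  # seconds left before the copy expires
    access_rate: float    # neighbor queries per second seen by the CH


def needs_renew(entry: CacheEntry, ttl_threshold: float,
                access_rate_threshold: float) -> bool:
    """Return True when the CH should send RENEW before the TTL expires."""
    # Only frequently required data is worth renewing in advance.
    frequently_required = entry.access_rate > access_rate_threshold
    # Renewal is triggered once the TTL drops below the threshold.
    ttl_low = entry.ttl_remaining < ttl_threshold
    return frequently_required and ttl_low
```

Keeping the decision in a single predicate makes it easy to evaluate each time a CH receives a beacon or a query for the copy.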
Let V_t represent the version number of data D_i at the source node and C_t^k represent the cache copy at node k at time t. The initial version number V_t is zero at data creation time and is incremented for each consecutive update at the source. The updated version of D_i is sent to all caching nodes through their CHs. Hence, each CH is aware of its members' cache copy information, including the TTL, and informs its neighbor CHs. DC is ensured when the cache copy of D_i in a caching node is never stale with respect to the source for more than the acceptable deviation time δ. That is, the maximum acceptable stale time is bounded by δ, and the difference between the source and cache copy versions is bounded by the acceptable deviation.
Overview of cluster based caching:
Clustering pattern: The cluster is configured based on the spatio-temporal stability of MNs (Kuppusamy et al., 2012). The MNs send Hello messages containing their ID and energy level to their neighbors. The MN with the greatest energy is selected as CH by its neighbors, which are assigned as cluster members. Data is then stored in the source MN. MNs send a beacon message to their CH. The beacon message consists of the Cluster ID (CID), the data (D) id with its TTL, and the Received Signal Strength (RSS). The CH then replicates its content to other CHs and also collects the contents of other CHs. If the energy level of the CH falls below that of a cluster member, that member is assigned as the new CH and the old CH hands over all its information to it.
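For the delta consistency condition stated in this section, one common way to write the two bounds is given here. This is a reconstruction under assumptions rather than the authors' exact equations: S_t (the source state at time t) and τ are hypothetical symbols, and C_t^k is read as the version held by node k when it is compared with V_t.

```latex
% Time-domain bound: the copy at node k equals some source state
% that is at most \delta old.
\forall t,\ \exists\, \tau \in [0, \delta] : \quad C_t^k = S_{t-\tau}

% Value-domain bound: the version gap stays within the acceptable deviation.
\left| V_t - C_t^k \right| \le \delta
```

With δ = 0 the first bound reduces to strong consistency, and as δ grows without limit it relaxes toward weak consistency.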

Mobility and disconnections:
The movement sequences of MNs can be observed from the RSS at regular intervals. These observations are used to estimate the locations of moving MNs and the strength of their connections. Thus mobility and connectivity are computed from the RSS. An MN may disconnect from the network when it runs out of energy. The CH does not receive a reply from the MN during that period. Hence, the CH receives the updates from the source and keeps them in its cache for a short time. Upon reconnection, the MN sends a cache check invalidation request to its CH, and the CH transmits the updated data. If the MN does not reconnect for a long time, the CH deletes the data. Thus the CH resolves the disconnection issue within clusters.
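The buffering behavior of the CH for disconnected members can be sketched as follows. This is an illustrative sketch under assumptions: the class `ClusterHeadBuffer`, its method names and the `hold_time` parameter (how long "a short time" lasts) are hypothetical and are not specified in the text.

```python
class ClusterHeadBuffer:
    """Holds updates for disconnected members for at most hold_time seconds."""

    def __init__(self, hold_time):
        self.hold_time = hold_time
        # node_id -> list of (stored_at, data_id, version)
        self._pending = {}

    def store_update(self, node_id, data_id, version, now):
        """Keep an update that could not be delivered to node_id."""
        self._pending.setdefault(node_id, []).append((now, data_id, version))

    def purge(self, now):
        """Delete updates kept longer than hold_time."""
        for node_id in list(self._pending):
            fresh = [u for u in self._pending[node_id]
                     if now - u[0] <= self.hold_time]
            if fresh:
                self._pending[node_id] = fresh
            else:
                del self._pending[node_id]

    def on_reconnect(self, node_id, now):
        """Answer a cache check request with the updates still held."""
        self.purge(now)
        updates = self._pending.pop(node_id, [])
        return [(data_id, version) for _, data_id, version in updates]
```

A member that reconnects within `hold_time` receives the buffered versions; one that stays away longer receives nothing and has to fetch the data again.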
Data query processing: Whenever an MN requires the data D, it first checks its Local Cache Table (LCT). If the data is not found there, it sends a request to its cluster head CH_2, as in Fig. 1. CH_2 checks its own LCT for the required data. If the data is in the LCT, it sends the reply to its home cluster caching node. If the data is not in the LCT, CH_2 checks the Global Cache Table (GCT) for information about neighbor clusters. If the data is listed there, it sends the query to the caching node of the corresponding neighbor cluster head CH_1. The neighbor CH_1 then sends the reply to the MN through CH_2. The MN caches the data in its LCT after receiving it from the neighbor cluster.
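The lookup order in the query procedure (MN's LCT, then the CH's LCT, then the GCT) can be sketched as follows. The function name `resolve_query`, the dictionary-based tables and the final fallback to the data source are assumptions made for illustration; the text does not state what happens when the GCT also misses.

```python
def resolve_query(data_id, node_lct, ch_lct, gct, neighbor_caches):
    """Return (origin, value) for a query issued by an MN.

    node_lct:        the MN's Local Cache Table, data_id -> value
    ch_lct:          the home CH's LCT, data_id -> value
    gct:             the home CH's Global Cache Table, data_id -> neighbor CH id
    neighbor_caches: neighbor CH id -> {data_id: value}
    """
    if data_id in node_lct:
        return "local", node_lct[data_id]
    if data_id in ch_lct:
        return "home_cluster", ch_lct[data_id]
    if data_id in gct:
        neighbor_ch = gct[data_id]
        value = neighbor_caches[neighbor_ch][data_id]
        # The MN caches data received from a neighbor cluster.
        node_lct[data_id] = value
        return "neighbor_cluster:" + neighbor_ch, value
    # Assumed fallback: the request goes to the data source.
    return "source", None
```

After a neighbor-cluster hit the MN holds its own copy, so a repeated query for the same item is answered locally.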
The residual energy E_R of the CH is computed in Eq. 3 from the initial energy (E_i) and the total energy consumption (E_t) as:
E_R = E_i - E_t (3)
The MN with the maximum energy is nominated as CH. If the E_R of the CH falls below that of any of its cluster members, that cluster member is nominated as the new CH.
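The residual energy rule and the re-election condition can be sketched as follows. The function names and the dictionary of per-node residual energies are hypothetical; picking the member with the highest residual energy as the replacement is an assumption consistent with "the MN with maximum energy is nominated as a CH".

```python
def residual_energy(initial, consumed):
    """Residual energy as initial energy minus total consumption."""
    return initial - consumed


def elect_cluster_head(current_ch, residual):
    """Return the CH after checking members' residual energy.

    residual maps node id -> residual energy and includes current_ch.
    """
    best = max(residual, key=residual.get)
    # Hand over only when some member has strictly more energy than the CH.
    if residual[best] > residual[current_ch]:
        return best
    return current_ch
```

For example, a CH with 40 units left is replaced when a member still holds 65 units, while a CH with 70 units keeps its role.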

Cache consistency requirements:
The CBDC approach must satisfy requirements on the consistency level, consistency control and data update delay.
Consistency control: The most widely used consistency control approach associates each cache copy with a TTL value. When the TTL values of the cache copies in the nodes expire, they are renewed from the source node (pull). Once the source node updates the data, it needs to update the cache copies in the caching nodes. Hence it pushes an invalidation report to the caching nodes. Because of network dynamism, the caching nodes must reply to the source node with an acknowledgement. APPC uses both schemes together.

Update delay:
Previous schemes mostly focus on how cache consistency should be maintained after the source node has updated the source data. In such schemes, the source can directly update the data without considering consistency maintenance with the caching nodes. However, in many cases the source node can wait for a certain time before updating the source data, as in the adaptive lease protocol. This update delay can be used to further decrease the consistency maintenance cost. In flexible cache maintenance, the source node waits for a predefined delay before updating the source data in a small ad hoc network with stable connections. However, that scheme does not specify how long the source node needs to wait before updating the source data in a large dynamic network, nor does it provide the update delay when MNs are moving in a large network. APPC provides a minimum update delay by using the CHs.

Adaptive Push and Pull algorithm for Clusters (APPC):
Overview: The proposed APPC keeps the consistency maintenance cost low in a cluster based MANET. Each cache copy is associated with a TTL, which is computed based on the acceptable time deviation δ between the source and the caching nodes. The caching nodes answer their neighbors' queries as long as the TTL has not expired. For a data update, the source sends an Invalidation Report (INV_REP) to the CHs in the network. The CHs reply to the source with an Invalidation Report Acknowledgement (INVREP_ACK). In the meantime, the CHs inform the caching nodes in their clusters about the INV_REP. Hence, a caching node does not serve the data to neighbors' requests during that period. The source updates the data after receiving INVREP_ACK from all CHs. Otherwise, the source waits up to the tolerable minimum update delay.
To renew the data, each CH maintains cache copy information together with the TTL and the query access rate. When the TTL falls below the threshold TTL (TTL_th), the CH renews the TTL of the corresponding cache copy in advance, based on the access rate. The CH knows the access rate of the data to be renewed from the neighbors' requests. Thus caching nodes need not wait until the TTL expires to renew it and can serve the data to neighbors continuously. Hence, the delay at TTL renewal is reduced. If a cache copy has not been accessed by neighbors for a long time, its TTL is not renewed by the CH. Thus overhead is reduced by avoiding unnecessary TTL renewals.
After receiving INVREP_ACK from all CHs, the source updates D and pushes it to the home CH. The home CH then transmits the updated data to the neighbor caching nodes via their respective CHs. If the source does not receive INVREP_ACK from some CH, it waits up to its tolerable minimum update delay. This approach reduces overhead, energy and bandwidth consumption. When an MN moves from cluster C_i to another cluster C_j, it informs the new cluster head CH_j and sends a leave message to CH_i. Thus the source pushes the update with a new TTL to the home and neighbor CHs instead of to all caching nodes in the network. Hence it reduces the latency.
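The push side of APPC (INV_REP, INVREP_ACK, then the update via the home CH) can be sketched as follows. This is a simplified sketch: the function names are hypothetical, and letting the source proceed once the wait reaches the tolerable minimum update delay t_m is an assumed reading of "waits up to its tolerable minimum update delay".

```python
def source_can_update(cluster_heads, acks, elapsed, t_m):
    """True once every CH has acknowledged INV_REP or the wait reached t_m.

    cluster_heads: set of CH ids that were sent INV_REP
    acks:          set of CH ids whose INVREP_ACK has arrived
    elapsed:       time since INV_REP was sent
    t_m:           tolerable minimum update delay
    """
    all_acked = cluster_heads <= acks
    return all_acked or elapsed >= t_m


def push_targets(home_ch, neighbor_chs):
    """Order in which updated data travels: home CH first, then neighbor CHs."""
    return [home_ch] + [ch for ch in neighbor_chs if ch != home_ch]
```

Because the source only talks to CHs, the number of acknowledgements it waits for grows with the number of clusters rather than with the number of caching nodes.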

Pull:
The individual caching nodes use the pull algorithm to ensure the validity of their cache copies. Some cache copies are accessed frequently by neighbors and the remainder less frequently. On this basis, the TTL of a cache copy is renewed from the source by its home CH. The CH maintains TTL_th for the cache copies. When the TTL of a cache copy falls below TTL_th, the CH sends a request, including the query access rate, to the source to renew the TTL. The source sets the TTL value based on the query access rate and forwards it to the CH. The caching node then renews its TTL from the CH. Let D_query_access_rate be the interval between successive queries of D and β (a value between 0 and 1) be the weighting factor for recent and past queries; the TTL value is then renewed as a function of these quantities. Because the TTL is renewed based on the query access rate, it is not regularly updated for less frequently accessed data, which saves overhead and bandwidth. The caching node serves the data to neighbors' queries without interruption. Thus, data access latency is reduced.
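The role of β as a weight between recent and past queries can be illustrated with an exponentially weighted average of the query interval. This is an assumption: the exact renewal formula is not given in the text, and the function `smoothed_query_interval` and its argument names are hypothetical. How the smoothed interval maps to a concrete TTL value is left open here.

```python
def smoothed_query_interval(previous, latest, beta):
    """Blend the latest query interval with the past estimate.

    beta close to 1 favors the most recent interval; beta close to 0
    favors the history. This weighting scheme is an assumed reading of
    the weighting factor described in the text.
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie between 0 and 1")
    return beta * latest + (1.0 - beta) * previous
```

For example, with a past estimate of 10 s, a latest interval of 2 s and β = 0.5, the smoothed interval is 6 s.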

Maintaining consistency:
Data consistency must be maintained between the source and the caching nodes.
Case 1: When the TTL of a cache copy falls below TTL_th, the CH sends a renew query to the source. In this case, the data whose TTL is to be renewed has been updated between the past and current renew requests. The data may also have been updated multiple times between the past and current renewals, with these intervals taking more than δ time. Hence, the source reduces the new TTL value by a multiplicative factor m and also sends the updated data to the caching node:
TTL = TTL × m, 0 < m < 1
Because a smaller TTL makes the caching node renew more frequently, this reduces staleness. Thus it increases data consistency and reduces latency.
Case 2: Sometimes the source data is not updated between the past and current TTL renewals. In this case, the source only renews the TTL, based on a linear model: the TTL for the least frequently updated data is increased by a linear factor, because these cache copies are not accessed frequently by neighbors. Thus it reduces overhead, bandwidth and energy consumption.
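Cases 1 and 2 together resemble a multiplicative-decrease, linear-increase adjustment of the TTL, which can be sketched as follows. The function name `adjust_ttl`, the default values and the additive increment `step` are hypothetical; the text gives only the multiplicative rule explicitly and describes the linear model in words.

```python
def adjust_ttl(ttl, updated_since_last_renew, m=0.5, step=1.0):
    """Return the renewed TTL.

    Case 1: data changed since the last renewal -> shrink TTL by factor m.
    Case 2: data unchanged -> grow TTL by an additive step (assumed form
            of the linear model).
    """
    if not 0.0 < m < 1.0:
        raise ValueError("m must satisfy 0 < m < 1")
    if updated_since_last_renew:
        return ttl * m
    return ttl + step
```

With m = 0.5, a TTL of 8 s drops to 4 s after an update; with step = 2 s it rises to 10 s when the data stayed unchanged.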
Case 3: The source updates the data before the TTL of the cache copy expires. In this approach, the source node sends INV_REP to the CHs before pushing the updated data. Hence, only the CHs reply to the source with INVREP_ACK, instead of all caching nodes in the entire network. This reduces the delay in receiving INVREP_ACK. To avoid stale hits, the updated data is sent to the caching nodes. Thus it increases the consistency between the source and the cache copies.
Case 4: When the source does not receive INVREP_ACK from all CHs, it waits for t_m. In the proposed approach, however, CHs mostly do not expire. A new CH is selected before the current CH expires, and all information is handed over to the new CH. This process takes only a minimal time. Hence the source waits for t_m only at CH re-election time. A new CH is re-elected rarely. Therefore the source does not wait forever to update the data, which reduces the delay.
The time specifications are summarized in Table 1. In data push propagation, the TTL may expire between the data update time t_u and the minimum update delay t_m, as in Fig. 2. Here, the source sends the updated data with a new TTL after receiving INVREP_ACK from all CHs. Hence no stale hit occurs, because the range (t_u, t_m) is the tolerable minimum update delay:
t_r < t_u + t_m
When the TTL expires between the update delay t_m and the acceptable deviation t_δ, as in Fig. 3, no stale hits occur, because the range (t_m, t_δ) is an acceptable duration between the source and the cache copy:
t_u + t_m < t_r < t_u + t_m + t_δ
Fig. 4 shows the case in which the TTL expires after the acceptable deviation. In this case, the CH requests the source to renew the TTL once it falls below TTL_th. Hence, it renews the TTL of its cache copy in advance, and no stale hit occurs:
t_r > t_u + t_m + t_δ
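The three timing ranges can be expressed as a small classifier. This is a sketch under assumptions: t_r is read as the time at which the TTL expires, the function name and the returned labels are hypothetical, and the handling of the exact boundary values is a choice made here.

```python
def classify_expiry(t_r, t_u, t_m, t_delta):
    """Place the TTL expiry time t_r relative to the update timeline.

    t_u:     data update time
    t_m:     tolerable minimum update delay
    t_delta: acceptable deviation
    """
    if t_r < t_u + t_m:
        # Covered by the source waiting for INVREP_ACK (as in Fig. 2).
        return "within_update_delay"
    if t_r <= t_u + t_m + t_delta:
        # Covered by the acceptable deviation (as in Fig. 3).
        return "within_acceptable_deviation"
    # The CH must renew in advance via TTL_th (as in Fig. 4).
    return "after_acceptable_deviation"
```

Only the last range requires the CH to act on its own; in the first two, the push path already prevents stale hits.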

MATERIALS AND METHODS
The simulation of the proposed approach was carried out using Network Simulator Version 2 (NS2) with a channel capacity of 2 Mbps. The Distributed Coordination Function (DCF) of IEEE 802.11 for wireless LANs is used as the MAC layer protocol. It has the functionality to notify the network layer about link disconnection.
In this simulation, MNs move in a 1000 × 1000 m area for a simulation time of 100 s, and each MN is assumed to move independently. The transmission range of the MNs is 250 m. The network size is varied as 20, 40, 60, 80, 100 and 120 nodes and the speed of the mobile nodes is varied as 2, 5, 7, 10, 12 and 15 m/s. Data queries and updates are assumed to follow Poisson processes. The simulated traffic is Constant Bit Rate (CBR).
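The Poisson assumption for query and update arrivals can be illustrated outside NS2 with a few lines of Python. This is only an illustration of the arrival model, not the simulation script used in the study; the function name, the rate and the seed are hypothetical values.

```python
import random


def poisson_event_times(rate, duration, seed=None):
    """Generate event times of a Poisson process on [0, duration).

    Inter-arrival times of a Poisson process with the given rate
    (events per second) are exponentially distributed.
    """
    rng = random.Random(seed)
    times = []
    t = rng.expovariate(rate)
    while t < duration:
        times.append(t)
        t += rng.expovariate(rate)
    return times


# Example: query arrivals at an assumed rate of 2 per second over 100 s.
query_times = poisson_event_times(rate=2.0, duration=100.0, seed=1)
```

Using separate rates for queries and updates would give the two independent event streams that the consistency scheme has to reconcile.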
The simulation settings and parameters are summarized in Table 2. The proposed approach is compared with the FCPP (Huang et al., 2010b) and CBCCT (Kuppusamy et al., 2012) schemes.

Average query latency:
The average query latency is the average time between a user sending a query to the source and receiving the reply from the source.

Success ratio:
The success ratio is the ratio of the total number of packets received successfully to the total number of queries sent to the source.

Control overhead:
The control overhead is defined as the total number of control packets normalized by the total number of received data packets.
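The three metrics defined above can be computed from simulation counters as sketched here. The function and parameter names are hypothetical, and the success ratio is taken as received packets divided by queries sent, which is an assumed reading of the definition.

```python
def average_query_latency(latencies):
    """Mean time between sending a query and receiving its reply."""
    return sum(latencies) / len(latencies)


def success_ratio(received_packets, queries_sent):
    """Fraction of queries answered by a successfully received packet."""
    return received_packets / queries_sent


def control_overhead(control_packets, received_data_packets):
    """Control packets normalized by received data packets."""
    return control_packets / received_data_packets
```

For instance, 45 replies to 50 queries give a success ratio of 0.9, and 30 control packets for 60 received data packets give an overhead of 0.5.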

Based on nodes:
The first simulation scenario was constructed by varying the number of nodes as 20, 40, 60, 80, 100 and 120 with a mobile speed of 5 m/s. As the number of nodes in the network increases, the number of caching nodes also increases. Due to the additional caching nodes in the clusters, the query latency and overhead increase slightly and the overall packet delivery success ratio is reduced.
Fig. 5 shows that the proposed approach, APPC, has lower query latency than the existing approaches, because the CHs share cache copies and maintain the consistency of the cache copies with the source. In addition, Fig. 6 shows that the proposed APPC achieves a higher success ratio than the existing FCPP and CBCCT. Fig. 7 shows that APPC outperforms the existing approaches in terms of control overhead. Since the query latency and overhead increase, the overall success ratio decreases as the number of caching nodes increases.
Fig. 8 shows that the proposed APPC protocol has lower query latency, since MNs always communicate with one of the CHs. Hence the CH fulfils the data requirements of the MNs in the neighbor and home clusters. Figures 9 and 10 show that the proposed APPC achieves a higher success ratio and lower control overhead than the existing FCPP and CBCCT schemes.

DISCUSSION
The simulation shows that the proposed Cluster Based Data Consistency approach combined with APPC reduces latency by 2.4% and 2%, reduces overhead by 3.5% and 3%, and increases the packet success ratio by 9.2% and 5.5% compared with FCPP and CBCCT respectively, as the number of nodes increases. APPC also reduces latency by 2.2% and 1.7%, reduces overhead by 2.3% and 1.5%, and increases the packet delivery success ratio by 9.3% and 3.6% compared with FCPP and CBCCT respectively, as the mobility speed increases. The results indicate that the proposed APPC approach performs better than the existing FCPP and CBCCT approaches.

CONCLUSION
This study presents Cluster Based Data Consistency (CBDC), a consistency maintenance scheme for cooperative caching over MANETs. CBDC improves data accessibility by reducing latency and overhead. It also reduces the energy consumption of MNs by partitioning the huge network into clusters. The CHs share their information with neighbors to improve performance. Thus cooperative caching improves data availability in MANETs. In addition, the Adaptive Push and Pull Algorithm for Clusters (APPC) is proposed to improve data consistency between the source and the caching MNs. The combination of push and pull for clusters improves data consistency between the source and the cache copies by associating them with TTL values. The proposed algorithms reduce overhead, latency and energy consumption by using the CHs in the clusters.