Preserving Data Consistency through Neighbor Replication on Grid Daemon

Abstract: In modern distributed systems, replication receives particular attention because it provides high data availability and fault tolerance and enhances system performance. It is an important mechanism because it enables organizations to provide users with access to current data where and when they need it. However, this way of organizing data weakens data consistency and coherency, since more than one replicated copy needs to be updated. Expensive synchronization mechanisms are needed to maintain the consistency and integrity of data among replicas when transactions make changes. In this paper, we present the Neighbor Replication on Grid (NRG) daemon, which manages replication and transactions in distributed systems. The NRG Transaction Model has been implemented in order to preserve data consistency and availability. The experimental results show that the NRG daemon guarantees consistency and obeys serializability through its synchronization approach.


INTRODUCTION
In modern distributed systems, replication receives particular attention because it provides high data availability and fault tolerance and enhances system performance [1,2,3]. It is an important mechanism because it enables organizations to provide users with access to current data where and when they need it. A system failure can be transparent to users and applications if they can obtain data from an identical replica. Replication can improve performance by scaling the number of replicas with demand and by offering nearby copies to services distributed over the network. An ideal distributed file system provides applications with strict consistency, i.e., a guarantee that all I/O operations yield identical results at all nodes at all times [4,5]. In a replication system, the value of each logical item is stored in one or more physical data items, referred to as its copies [5]. Each read or write operation on a logical data item must be mapped to corresponding operations on physical copies. However, organizing data this way weakens data consistency and coherency, since more than one replicated copy needs to be updated. Expensive synchronization mechanisms are needed to maintain the consistency and integrity of data among replicas when transactions make changes. This suggests that proper strategies are needed for managing replication and transactions in distributed systems.
There are many examples of replication schemes in distributed file and database systems. Some of them are based on synchronous replication [6,7,8], which deploys quorums to execute operations with a high degree of consistency and to ensure serializability. Synchronous replication can be categorized into several schemes, namely all-data-to-all-sites (full replication) and some-data-items-to-all-sites. However, full replication causes high update propagation and high storage requirements, and makes data consistency difficult to maintain [1,9,10].
A few studies have examined partial replication techniques that replicate some data items to all sites using a tree structure [11,12]. This technique causes high update propagation overhead, so the some-data-items-to-all-sites scheme is not realistic. Furthermore, many applications have update-intensive data, which should be replicated to very few sites. The European DataGrid Project [13] implemented this model to manage file-based replicas. It is based on the sites that have previously been registered for replication. This leads to an inconsistent number of replicas in the model. Also, data availability comes with very high overhead, as all registered replicas must be updated simultaneously.
In this paper, we present the Neighbor Replication on Grid (NRG) daemon, which manages replication and transactions in order to preserve data consistency and maintain data availability in distributed systems. The NRG daemon guarantees data consistency and obeys serializability through synchronous replication. The mechanisms for locating and managing replicas, as well as performance details, can be found in our previous work [2,8].

NRG Transaction Model:
In this section, we recall the NRG Transaction Model. The following notations are defined:
a) T is a transaction.
b) α and β are groups for the transaction T.
c) γ = α or β, where γ represents a different group for the transaction T (before and until it gets a quorum).
d) T_α is a set of transactions that comes before T_β, while T_β is a set of transactions that comes after T_α.
e) D is the union of all data objects managed by all transactions T of NRG, and x represents one data object (or data file) in D to be modified by an element of T_α and T_β.
f) Target set = {-1, 0, 1} is the result of transaction T (see Table 1).
g) An NRG transaction element represents the transaction feedback from a neighbor site (i.e., the transaction locked data x at the neighbor).
Accessing failure: the destination server could not perform the job. Data file x managed by the primary site is already locked, and the transaction has not executed.
Unknown status (for a neighbor site): the neighbor site cannot tell whether the NRG transaction has been executed yet. This can happen when the destination host is down, when the link between the primary and neighbor site is down, or both. In that case, the NRG request transaction or the message may be lost, so we do not know whether the transaction has been executed at the neighbor site. This is tracked by the unknown status counter.
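The target set and the unknown status counter described above can be sketched in Python. This is only an illustrative sketch: the names `TargetStatus` and `tally_feedback` are hypothetical, and the mapping of 0 to accessing failure and -1 to unknown status is an assumption, since the text only gives the set {-1, 0, 1} and uses 1 once a lock has been obtained.

```python
from enum import IntEnum
from typing import Iterable, Tuple


class TargetStatus(IntEnum):
    """Possible results of an NRG transaction at a site (Target set = {-1, 0, 1})."""
    UNKNOWN = -1  # assumed value: neighbor unreachable, outcome not known
    FAILURE = 0   # assumed value: data file already locked, nothing executed
    SUCCESS = 1   # lock obtained at the site


def tally_feedback(feedback: Iterable[int]) -> Tuple[int, int]:
    """Return (write_counter, unknown_counter) for a series of site results."""
    write_counter = 0
    unknown_counter = 0
    for value in feedback:
        status = TargetStatus(value)  # raises ValueError outside {-1, 0, 1}
        if status is TargetStatus.SUCCESS:
            write_counter += 1
        elif status is TargetStatus.UNKNOWN:
            unknown_counter += 1
    return write_counter, unknown_counter
```

For example, feedback of one success from the primary, one success from a neighbor and one unreachable neighbor would be passed as `tally_feedback([1, 1, -1])`.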
The NRG transaction semantics involve four phases: initiate lock; propagate lock and obtain a quorum; release lock, update and commit data; and handle failure (unknown status). Fig. 1 shows the framework of the semantics of the NRG Transaction Model.
NRG Daemon:
The NRG daemon has been developed to give a better intuition of how to manage replication and transactions through the NRG Transaction Model. A daemon is a computer program that runs in the background and is ready to perform without user input [14]. Usually, it provides services either for the system as a whole or for user applications. The NRG daemon is started (and stopped) when the system changes run levels. It ordinarily starts when the system boots and runs until system shutdown, unless it is forcibly terminated. In particular, it has three system components:
a) NRG Transaction Manager (NTM): Each primary or neighbor replica has its own NRG Transaction Manager (NTM). Every transaction goes through the NTM before it is processed. The NTM functions include:
• Accepting a set of transactions from clients.
• Propagating a lock synchronously to neighbor replicas.
• Checking the current write and unknown status counters to detect whether the transaction must perform or still needs to obtain a quorum.
• Sending the updated counters to replicas.
• If the transaction gets a quorum, releasing the locks of neighbors that are already in other quorum(s).
• Replicating data to neighbors for a particular data item x.
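The counter check performed by the NTM can be illustrated with a small decision function. This is a hedged sketch, not the daemon's actual logic: the function name `ntm_next_step`, the three return labels and the order of the checks are assumptions made for illustration, and the quorum size is passed in rather than derived.

```python
def ntm_next_step(write_counter: int, unknown_counter: int, quorum_size: int) -> str:
    """Decide what the NTM does next for one transaction.

    Returns "perform" when enough locks are held, "handle_unknown" when
    some neighbors did not answer, and "propagate" otherwise.
    """
    if write_counter < 0 or unknown_counter < 0 or quorum_size < 1:
        raise ValueError("counters must be >= 0 and quorum_size >= 1")
    if write_counter >= quorum_size:
        return "perform"          # quorum obtained: update and commit
    if unknown_counter > 0:
        return "handle_unknown"   # failure-handling phase (unknown status)
    return "propagate"            # keep propagating the lock
```

With a quorum size of 2, `ntm_next_step(1, 0, 2)` asks the primary to keep propagating its lock, while `ntm_next_step(2, 0, 2)` lets the transaction perform.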
The NRG daemon runs with superuser privilege because it must access privileged resources such as the configuration files. The daemon runs in the background and does not have a controlling terminal. In particular, it has been configured to function automatically without human intervention.

RESULTS AND DISCUSSION
In this experiment, we assume no failures during transaction execution. In the remainder of this section, the experiment walks through the phases of the NRG transaction semantics. Without loss of generality, this experiment shows how to preserve the consistency of one particular data file. As long as the same data is used, one-copy-serializability must be obeyed for all transaction executions. In addition, it shows that the data is always available and reliable. Table 2 shows the Primary-Neighbors Grid Coordination (PNGC) for replicas A, B and D, which is used by the NRG daemon. The NRG daemon program for the experiment was written in shell and Perl, integrated with the File Transfer Protocol (FTP) as the communication agent. The Bourne Again Shell was selected because it is rich in command-line editing facilities and job control capabilities. Job control provides greater flexibility in dealing with background processes. Meanwhile, an automated FTP is used in the shell programs as the sending agent. Red Hat Linux Kernel release 9 and Linux Slackware 2.4.2 are used as the platforms for the replicated servers. All user applications are available on these Linux platforms; they include the mcedit, vi and vim editors.
To simplify the presentation of these experiments, assume that the transactions access a particular data file a. Neighbor binary voting assignment [2] is initiated, where S(B_a) = {i | B_a(i) = 1, 1 ≤ i ≤ n} and B_a(i) is the vote assigned to site i, which has a particular data a. Replicas A, B and D belong to S(B_a). The smallest total number to be replicated, d = 3, has been chosen because it is easy to manage the transactions with a small pre-emptive lock in order to get the write quorum. In particular, the write quorum must be more than a majority quorum. Since the transaction is proportional to the quorum size [8], less synchronization time is required for transaction execution with a small pre-emptive lock. Transaction execution for any data on other servers is evaluated in the same manner. In particular, with the NRG Transaction Model, T^q …
(a) User "azie" requests to update data file dds from replica A
(b) User "noraziah" requests to update data file dds from replica B
Fig. 2: Users concurrently request data file dds
The NRG daemons for primary replicas A and B monitor the current status of all users that access the particular data a. If any user accesses that data, the daemon redirects the user's information, such as the pid, user name, tty, log time and access editor, to the log information. The NRG daemon manipulates its log information by using the awk utility. The information of the users that access data file dds at primary replicas A and B is shown in Fig. 3a and 3b respectively. In particular, each primary replica has the user_act log file. The transaction with pid 24909 gets the lock (refer to Fig. 3). The server status is initiated to 1 for its Target Set, as shown in Fig. 4a. Fig. 4a and Fig. 4b show T^q_{α1,a} performing the initialization and lock propagation phases.
(a) T^q_{α1,a} gets and propagates the lock to its neighbors.
(b) T^q_{α1,a} keeps propagating its lock to get a quorum.
Fig. 4: T^q_{α1,a} performs the initialization and lock propagation phases.
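The neighbor binary vote assignment S(B_a) defined in this section can be computed directly from a vote table. The sketch below is illustrative only: the function names `replica_set` and `majority_quorum`, the numeric site identifiers and the use of a simple majority as the quorum threshold are assumptions, not details taken from the NRG daemon.

```python
from typing import Dict, Set


def replica_set(votes: Dict[int, int]) -> Set[int]:
    """S(B_a) = {i | B_a(i) = 1}: the sites whose binary vote for data a is 1."""
    for site, vote in votes.items():
        if vote not in (0, 1):
            raise ValueError(f"vote for site {site} must be 0 or 1")
    return {site for site, vote in votes.items() if vote == 1}


def majority_quorum(d: int) -> int:
    """Smallest number of replicas that forms a majority of d replicas."""
    if d < 1:
        raise ValueError("d must be at least 1")
    return d // 2 + 1


# Hypothetical vote table for five sites; three of them hold data a (d = 3).
votes_a = {1: 1, 2: 1, 3: 0, 4: 1, 5: 0}
holders = replica_set(votes_a)
print(sorted(holders), majority_quorum(len(holders)))
```

With d = 3 as in the experiment, a majority is reached once two of the three replicas have granted their locks.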
Next, the NRG daemon kills the pid of T^q_{α2,a}. The kernel broadcasts a message to acknowledge this. The server status is initiated to 1 for its Target Set, as depicted in Fig. 5a. Fig. 5 shows T^q_{β1,a} performing from the initialization lock phase until it waits for the user to finish updating data file dds. …, which is pid 24897. After that, the kernel broadcasts the messages to user "suryani", as depicted in Fig. 6. Since T^q_{α1,a} at replica A and T^q_{β1,a} at replica B obtain the locks, the NRG daemon controls the access permission mode of data file dds. Hence, other transactions cannot read or update it at that time, as shown in Fig. 7. The error message is generated automatically by the kernel.
(… Table 2). The primary replica processing for T^q_{α1,a} propagates the locks to its neighbor replicas B and D, as depicted in Fig. 4a. It keeps propagating the lock, as shown in Fig. 4b, because T^q_{α1,a} has still not obtained a quorum. Meanwhile, the primary replica processing for T^q_{β1,a} propagates the locks to its neighbor replicas D and A, as depicted in Fig. 5a. The NTM of each neighbor replica calls the neighbor_replica_processing function to check the feasibility of its lock and sends feedback to the primary. The first transaction that obtains a quorum is denoted as T'^q_{γ1,a}; the other transaction is aborted, as shown in Fig. 8.
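The competition between the two primaries for a quorum can be replayed with a small simulation. This is a sketch under stated assumptions, not the daemon's implementation: the function `first_to_quorum`, the transaction labels, the interleaving of lock attempts and the quorum size of 2 are all hypothetical, chosen so that the transaction at replica B wins as in the experiment.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def first_to_quorum(
    attempts: Iterable[Tuple[str, str]],
    replicas: List[str],
    quorum_size: int,
) -> Tuple[Optional[str], Dict[str, Optional[str]]]:
    """Replay lock attempts in order and return (winner, lock table).

    Each attempt is (transaction, replica). A replica grants its lock only
    if it is free. The first transaction holding quorum_size locks wins;
    the other transactions lose their locks (they are aborted) and the
    winner then holds every lock in the replica set.
    """
    holder: Dict[str, Optional[str]] = {name: None for name in replicas}
    for txn, site in attempts:
        if holder[site] is None:  # lock is feasible: the replica is free
            holder[site] = txn
        held = sum(1 for owner in holder.values() if owner == txn)
        if held >= quorum_size:
            return txn, {name: txn for name in replicas}
    return None, holder  # nobody has reached a quorum yet


# Hypothetical interleaving of the two primaries from the experiment.
attempts = [
    ("T_alpha1", "A"),  # initiate lock at primary A
    ("T_beta1", "B"),   # initiate lock at primary B
    ("T_beta1", "D"),   # lock from B reaches neighbor D first
    ("T_alpha1", "B"),  # not reached: a winner already exists
]
winner, locks = first_to_quorum(attempts, ["A", "B", "D"], quorum_size=2)
print(winner, locks)
```

Because only one transaction can collect a quorum of locks over the same replica set, the updates to the data file are applied one transaction at a time, which is the property the experiment relies on.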
Consequently, T'^q_{γ1,a} gets all locks from S(B_a) at primary replica B, as depicted in Fig. 5a and Fig. 5b. Next, the NRG daemon changes the access permission mode of data file dds at primary replica B. Therefore, user "noraziah" can start modifying the contents of data file dds, as depicted in Fig. 10. … T'^q_{γ1,a} synchronously. Fig. 11a, Fig. 11b and Fig. 11c show T'^q_{γ1,a} … Finally, the NRG daemon changes the access permission mode to unlock data file dds. Hence, users can read or request to update it at any replica of S(B_a).
Table 3
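Locking and unlocking a data file through its access permission mode, as the daemon does for data file dds, can be sketched as follows. The helper names and the two mode values are assumptions for illustration; the text does not state which permission bits the NRG daemon actually sets.

```python
import os
import stat
import tempfile
from typing import Tuple

LOCKED_MODE = 0o000    # assumption: no access for anyone while locked
UNLOCKED_MODE = 0o644  # assumption: owner read/write, others read-only


def lock_data_file(path: str) -> None:
    """Remove all permission bits so ordinary users cannot read or update."""
    os.chmod(path, LOCKED_MODE)


def unlock_data_file(path: str) -> None:
    """Restore the permission bits once the update has been committed."""
    os.chmod(path, UNLOCKED_MODE)


def permission_bits(path: str) -> int:
    """Return only the permission bits of the file mode."""
    return stat.S_IMODE(os.stat(path).st_mode)


def demo() -> Tuple[int, int]:
    """Lock and unlock a temporary stand-in for data file dds."""
    fd, path = tempfile.mkstemp(prefix="dds_")
    os.close(fd)
    try:
        lock_data_file(path)
        locked = permission_bits(path)
        unlock_data_file(path)
        unlocked = permission_bits(path)
    finally:
        os.remove(path)
    return locked, unlocked
```

On Linux, the superuser bypasses these permission bits, so a daemon running with superuser privilege can still replicate the file while ordinary users are refused by the kernel.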

CONCLUSION
A fundamental challenge with replication is to maintain data consistency among replicas in distributed systems. Organizing data through replication weakens data consistency and coherency, since more than one replicated copy needs to be updated. Expensive synchronization mechanisms are needed to maintain the consistency and integrity of data among replicas when transactions make updates. Furthermore, timeliness in synchronization has become a show stopper for maximizing system usage while still contributing to consistent and reliable computing. The NRG Transaction Model resolves this challenge by alleviating locks with a small quorum size before capturing updates and committing transactions synchronously to the sites that require the same updated data item. In particular, we have developed the NRG daemon to manage replication and transactions in distributed systems. We focus on an NRG daemon that guarantees consistency and obeys serializability through synchronous replication. The experimental results show that the NRG daemon handles concurrent distributed transactions and guarantees data consistency in distributed systems, because the transaction execution is equivalent to one-copy-serializability.