DATABASE REPLICATION IN HETEROGENOUS PLATFORM

The application of diverse database technologies in enterprises today is increasingly a common practice. To provide high availability and survavibality of real-time information, a database replication technology that has capability to replicate databases under heterogenous platforms is required. The purpose of this research is to find the technology with such capability. In this research, the data source is stored in MSSQL database server running on Windows. The data will be replicated to MySQL running on Linux as the destination. The method applied in this research is prototyping in which the processes of development and testing can be done interactively and repeatedly. The key result of this research is that the replication technology applied, which is called Oracle GoldenGate, can successfully manage to do its task in replicating data in real-time and heterogeneous platforms.


INTRODUCTION
Organizations need to provide updated data to users that might be geographically remote while handling vast amount of requested data distributed in multiple sites. The availability of a certain data at a huge network is one of the issues that still unsolved (Ahmad et al., 2010).
People often use the terms data and information interchangeably. However, it is better to view data as raw material resources that will be processed into finished information products. Then we can define information (O'Brien and Marakas, 2011) as data that have been converted into a meaningful and useful context for specific end users.
Today in most industries, a majority of overwhelming business transactions is executed digitally as businesses expand the use of their online applications. To succeed in this real time (RTI, 2012) environment, companies must eliminate interruptions in data availability, sluggish application performance and stale data. There are four aspects to this challenge: • Availability. Business-critical applications and underlying data must be accessible at or near 24/7/365 without service interruption or performance degradation • Reduced latency. Data must remain fresh. As it ages, data becomes less relevant and less valuable • Heterogeneity and IT flexibility. Integration and replication solutions must have the flexibility to be easily modified and distributed across diverse IT systems • Transaction integrity. Data completeness and accuracy must be ensured as it moves through systems There are essentially four kinds of replication styles -active replication, semi-active (leader-follower) replication, passive replication and coordinator-cohort replication. In active replication (state-machine approach), each server replica processes every client invocation and returns the response to the client. With active replication the availability of system is more when comparing to any other replication technique (Saravanan and Ramaraj, 2009).
This study gives an overview of an active replication technology in which users can replicate from any RDBMS to another RDBMS in heterogeneous environments using third party software, that is, Oracle Golden Gate.

Previous Researcher on Heterogeneous Database Replication
Large amounts of work have already been done on distributed database systems (Silberschatz et al., 2011;Murthy et al., 2012). Research on cross-platform environment is similar to that of heterogeneous database systems, in which different sites (nodes that are equipped with a computational and storage resource) may use different ways of storing and representing data, file formats, access protocols, query languages. The sites in this cross-platform environment communicate through an insecure channel. This study provides a solution in which relational database can be replicated from server to client based on replication or fragmentation (vertical) without any loss of information. This replication is independent of any platform and secured by strong cryptographic algorithm. Here, each site may have different Relational Database Management Systems (RDBMSs), such as Oracle, MySql, DB2, MS Access.

Distributed Database
Data distribution is a very important issue in distributed database, the database fragments need to be assigned to nodes in the computer network (Khder et al., 2013).
Distributed database can be clasified on homogenous and heteregonous database. In literature (Silberschatz et al., 2011) homogeneous distributed database system, all sites have identical database management system software, are aware of one another and agree to cooperate inprocessing users' requests. In contrast, in a heterogeneous distributed database,different sitesmay use different schemas and differentdatabase-management system software.
Consider a relation r that is to be stored in the database. There are two approaches to storing this relation in the distributed database: • Replication. The system maintains several identical replicas (copies) of the relation and stores each replica at a different site

Benefits of Database Replication
Some of the main benefits associated with database replication (Connolly and Begg, 2010) are: • Availability, refers to how replication increases the availability of data for users and applications through the provision of alternative data access options. If one site becomes unavailable, then users can continue to query or even update the remaining locations • Reliability, refers to the fact that with multiple copies of the data available over the system, this provides excellent warm standby recovery facilities in the event of failure at one or possibly more sites • Performance, refers to the improvement for query transactions when a system suffers from a significant overloading of centralized resources. Replication provides fast, local access to shared data because it balances activity over multiple sites • Load reduction, refers to how replication can be used to distribute data over multiple remote locations. This configuration can significantly reduce network traffic. Also, users can access data from the replicated site that has the lowest access cost, which is typically the site that is geographically closest to them • Disconnected computing, refers to how replication can be supported by snapshots • Supports many users, refers to how organizations increasingly need to deploy many applications that require the ability to use and manipulate data • Supports advanced applications, refers to how organizations increasingly need to make the corporate data available not only for traditional Online Transaction Processing (OLTP) systems but also for advanced data analysis applications such as data warehousing, Online Analytical Processing (OLAP) and data mining

System Architecture
In this study, we apply the heterogenous database replication using Oracle Golden Gate solution. Golden Gate (Jeffries, 2012) can achieve heterogeneous and homogeneous-real-time transactional Change Data Capture and integration by decoupling itself from the database architecture.
Therefore it provides a performance boost as well as flexibility through its modular components. A number of system architecture solutions are offered for data replication and synchronization: • One-to-one (source to target) It's the simplest and the most common configurations, here we are performing real-time or batch change data replication between two sites. This could be one example between a promary and standby site for disster recovery or an OLTP to datawarehouse for business intelligence • One-to-many (one source to many targets) This architectures provide two solutions, one data replication feed for reporting and one for backup and disasster recovery • Many to one This configurations is common in all industries, for example from retailing outlet taking customer orders to centralized database in head office • Cascading this offer s data replication at n sites, originating from a single source. As the data flows from the originating source database, parts or all of it are replicate at each site in cascading fashions untik the final target populated • Bi-directional (active) This is an active-active configurations. In this model sites A sends changed data to site B vice versa where conflict handling is an important considerations • Bi-directional (active passive) This is also known as live standby where site A sends changed data to site B only

THE PROPOSED MODEL
There are two primary objectives and needs addressed by PT.XYZ: • Continuous availability and disaster tolerance. This is to meet end-users' expectations for 24/7 system availability and to surpass and enhance the service level • Real-time data integration for live reporting. This is to serve the purposes of real-time, operational reporting, freeing up the resources of the critical source systems The source data is on MS SQL Server database running on MS Windows platform. The target is an MySQL database running on Linux platform.
In order to meet the continuous availability and disaster tolerance requirements, we propose Oracle GoldenGate solution as depicted in the Fig. 2 below. Oracle GoldenGate consists of decoupled modules that are combined to create the best possible solution for business requirements as follows (Fig. 2):

On the Source System (s)
GoldenGate's Capture (Extract) process reads data transactions as they occur, by reading the native transaction log, typically the redo log. Oracle GoldenGate only moves changed and committed transactional data, which is only around 30% of all transactions-therefore it operates with extremely high performance and very low impact on the data infrastructure.
Filtering can be performed at the source or targetat tables, columns and/or rows.
Transformations can be applied at the capture or delivery stages.

Trail Files
To move transactional data efficiently and accurately across systems, Oracle GoldenGate converts the captured data into an Oracle GoldenGate data format in "trail" files. With both source and target trail files, Oracle GoldenGate's unique architecture eliminates any single point of failure and ensures data integrity is maintained-even in the event of a system error or outage.

Routing
Data is sent via TCP/IP to the target systems. Data compression and encryption are supported. Thousands of transactions can be moved per second and without distance limitations.

On the Target System(s)
A Server Collector process (not shown) reassembles the transactional data into a target trail.

The
Delivery (Replicat) process applies transactional data to the designated target systems using native SQL calls.

Bi-Directional (if Needed)
In bi-directional configurations/solutions, this process runs reciprocally, to concurrently synchronize data between the source and target systems.
Manager processes perform administrative functions at each node.
To meet real-time data integration for live reporting, a secondary site can be deployed to serve the purposes of real-time, operational reporting, instead of runing on primary site. In this case, the secondary site would be MySQL running on Linux platform.

RESULTS AND DISCUSSION
To perform this experiment we built two Virtual Machines on X series server, where one VM server represents the source server and the other one represents the target server as shown in the Table 1 below.
After the environment ready, we do some test scenario in several conditions to check wheter the data could replicate without any lost. The test scenarios is describe in Table 2 below. As result of the scenario we can see at Table 3 below. As we see the test scenari and result from Table 2 and Table 3, the GoldenGate can replicate data event in condition when the network is off. This can be seen as we turn off one of the replicate component. The data will be pool at trial file at server 1 and continue deliver to server 2 as the network is on without data lost. Stop the replication components on server in a state STOP server 2 in a time-span of 10 min Data transaction remains running when one server Update all value on fields in Server 1 replication components in a state STOP Run the replication components on server 2 Activate replication components on server 2 and query on the modified tables Server maintenance

AJAS
Stop the replication components on server 1 and then turn off server Data transaction remains running during one replication Update, add, delete data on server 1 components server is in a maintenance state One server is active Activate server 1, but replication components are not running Run the replication components Run on replication components in server1, perform a query on the server 1 Table 3. Test results Expected results Test results Replication components on server 1 must be in a running state Succeed There were changes in the data on the server 1 and no change on the server 2 Succeed After the components on server 2 are active, the data transactions that occur on the server 1 are replicated on the server 2 Succeed On the server 2 all components must be in a running state Extract data pump abended There were changes in the data on the server 2 and no change in server 1 Succeed Replication components that are in an Succeed with 1minute delay ABENDED state can be up automatically while server 1 in an active state After the components on server 1 are active, the data transactions that occur on the server 2 are replicated on server 1 Succeed

CONCLUSION
Replication of tables across different database management systems running on diffferent platforms can be efficienntly done. Data is effectively transferred and stored without any loss of information. With the third party tool, such as Oracle GoldenGate, the complexity of the processes is hidden from user.
The Oracle GoldenGate technology can be applied to heterogeneous operating system environments and DBMS platforms. It has been successfully proven by performing a case study on uni-directional replication topology applying MSSQL Server DBMS running on Windows platform and MySQL DBMS running on Linux platform.