An Efficient Middleware for Storing and Querying XML Data in Relational Database Management System

,


INTRODUCTION
Today's data exchange between organizations has become challenging because of the difference in data format and semantics of the meta-data which used to describe the data. Now day' XML emerged as a major standard for representing data on the World Wide Web while the dominant storage mechanism for structured data is the relational databases, which has been an efficient tool for storing, searching, retrieving data from different collection of data. The ability to map XML data in relational databases is difficult mission and challenging in the world of all IT organization so there is a need to develop an interfaces and tools for mapping and storing XML data in relational databases.
Taking up emerging requirements, database vendor such as IBM, Oracle and Microsoft are enabling their product for XML. There is a need arise to manage XML data and other data stored in relational data seamlessly at a time efficiently. The native-XML databases usually have limited support for relational data. XML -Enabled databases like IBM, Oracle and Microsoft have mature and proven techniques for relational data processing but XMLextensions have not been mature enough yet. In these vendor specific RDBMS; Database Administrators (DBAs) have to express how to map XML data into their systems and the XML storage are tailored to one particular system and are hard-coded to some default mapping on behalf of the users, so they cannot be used for any other relational backend. For solution to these problems a middleware is required for storing and querying xml data in any RDBMS.

XML:
The extensible Markup Language (XML) is quickly becoming the de facto standard for data exchange over the Internet and now it plays a central role in data management, transformation, and exchange. Since it s introduction to industry in the late 1990s, XML (Amirian and Alesheikh, 2008) has achieved widespread support and adoption among all the leading software tools, server, and database vendor s. As importantly, XML has become the lingua franca for data by lowering the cost of processing, searching, exchanging, and re-using information. XML provides a standardized, selfdescribing means for expressing information in a way that is readable by humans and easily verified, transformed, and published. This allows both information workers and automated applications to better find and uses the information they need. In addition, data can be transmitted to remote services anywhere on the Internet using XML-based Web services to take advantage of the new ubiquity of connected software applications. The openness of XML (Augeri et al., 2007). allows it to be exchanged between virtually any hardware, software, or operating system. Simply put, XML opens the door for information interchange without restriction. For its features in good description and transmission, the hot topic is to seek the best way for storing XML documents in order to get high query processing efficiency. At present, storing XML document in relational database is a promising way for that relational database is mature.
Relational Databases: Today, the dominant storage mechanism for structured enterprise data is the relational database, which has proven itself an efficient tool for storing, searching for, and retrieving information from massive collections of data. Relational databases specialize in relating individual data records grouped by type in tables. Developers can join records together as needed using SQL (Structured Query Language) and present one or more records to end-users as meaningful information. The relational database model revolutionized enterprise data storage with its simplicity, efficiency, and cost effectiveness. Relational databases have been prevalent in large corporations since the 1980s, and they will likely remain the dominant storage mechanism for enterprise data in the foreseeable future. Despite these strengths, relational databases lack the flexibility to seamlessly integrate with other systems, since this was not historically a requirement of the database model (Wilson, 2001). In addition, although relational databases share many similarities, there are enough differences between the major commercial implementations to make developing applications to integrate multiple products difficult. Among the challenges are differences in data types, varying levels of conformance to the SQL standard, proprietary extensions to SQL, and so on.

Structure Independent Mapping Approach:
The structure independent mapping approach is explained with a sample XML document shown in Fig. 1. In this study, we employ the data model of XPath (Haw and Lee, 2008) to represent XML documents. In the XPath data model, XML documents are modeled as an ordered and directed labeled tree. There are seven types of nodes. In this study, we consider only the following four types of nodes for the sake of simplicity: root, element, text and attribute. The root node is a virtual node pointing to the root element of an XML document. The database attributes DocID, PathID, Start, End, and value represent document identifier, start position of a region, end position of a region, and string-value, respectively. Each node is associated with start and end positions. A region (or the pair of start and end positions) implies a containment between elements with regards to the ancestor-descendant and parentchild relationships. For example, a node, ni, is reachable from another node nj, if the region of ni is included in the region of nj. In our study, we modified some attributes of tables in XRel approach: we added a ParentID column to Element, Attribute, and Text tables to find parent nodes easily. We put NodeID and last descendant node id (EndDescID) attributes instead of start and end columns in Element table. The XRel four tables (Path, Element, Attribute and Text) can be seen in Table1-4 for the XML document given in Fig. 1as following:     (Ali, 2006). The XParent schema stores the node information of the XML data graph of an XML document into four tables: LabelPath (ID, Len, Path) DataPath (Pid, Cid) Element (PathID,Did,Ordinal) Data (PathID, Did, Ordinal, Value) Table LabelPath stores the information of labelpaths of an XML data graph. The attributes ID and Len denote the unique ID and the length of each label-path, respectively. The attribute Path denotes the name of the corresponding label-path which is a sequence of node names in the label-path. Table  DataPath stores the information of parent-child relationships of an XML data graph. The attributes Pid and Cid denote the node number of the corresponding parent node and child node of an edge, respectively. Table Data stores the information of the nodes of the XML data graph if the corresponding elements or attributes in an XML document have a value. The attributes PathID and Did denote the ID of label-path (i.e., the foreign key of the ID in Table  LabelPath) and the node number of the node, respectively. The attribute Ordinal denotes the ordinal number of the node among its sibling-nodes with the same name. The attribute Value denotes the value of the node, where the value of anode is the value of the corresponding element or attribute in the XML document.  The middleware architecture and its implementation: The main idea for the implementation of the prototype is taken from (Şevkli et al., 2004) with modification of usage of available mapping strategy XParent instead of other method. We chose MYSQL as the DBMS for storing and retrieving XML documents using structure independent mapping approach because it is free of cost, open source and easily available. This implementation adds collection support to the MYSQL database. A collection is a set of XML document stored in fixed schema of tables. In this case it a set of fixed schema tables according to the proposed by XParent (Fakhraldien et al., 2010)In reality, MySQL database model does not change, but from the user point of view, inside a database there is not only tables but also collections. In addition, the users can not modify or access to the tables of collection directly. Users know the existing collection names and types only. They can create, drop or browse collections. The users can insert, browse or delete XML documents into collections. The independent classes in PHP can be created and embedded to PhpMyAdmin program which is a web based interface between MySQL and the users. The PhpMyAdmin program provides all database operations with user friendly web interface. In these Middleware we used two different independent mapping approach methods. There are variations of these methods as well. We chose the XParent and XRel method, because XParent is a four table database schema (LabelPath, DataPath, Element and Data).DataPath keeps parentchild relationalships while XRel does not explicitly stores edges, for data paths. Instead, XRel records containment relationships using the notion of region. Therefore, it needs joints in order to check edge connections .The design objective of this middleware is to provide efficient software that can use commercially available RDBMS to manage XML documents. After the implementation it will become repository for both XML documents and relational data. The Fig. 2 and 3 bellow outlines the architecture of the middleware which adds XML support to the MYSQL database system. The three main classes (Collection, Document, and XPath) can be used to adapt the same interface to other database systems.

RESULTS
The proposed middleware can be use as an efficient solution with respect to query processing specially recursive XML quires and updating. In comparison to other middleware's which used in storing and querying XML documents in relational database this middleware can act as efficient mediator between XML and r relational database.
This experimental discusses the result of storage and retrieval time of XML documents and a set of XPath queries using XParent and XRel methods. Comprehensive experiments were conducted to study the performance of XParent, in comparison with other approaches. While among those RDBMS-based approaches, XParent perform significantly better than other model-mapping-based approaches such as Edge and XRel. One observation is that RDBMS-based approaches can outperform special-purpose XML repositories such as Lore and Tamino. All experiments were conducted on p4 350 MHz PC with 1 GB RAM, 40GB hard disk, windows XP using the middleware.

DISCUSSION
From the presented results, the proposed middleware can be use as an efficient way for storing and queering XML data in relational database. This middleware has the following unique features.
XML data is stored in relational tables according to the XParent mapping-schema, an efficient, modelmapping approach without assistance of DTD.

•
The visual query interface of XParent provides both expressive powers for professionals and user friendliness for native users.
Overall the performance of XRel and XParent methods is comparable in most cases with exception of long XPath queries where XRel is definitely faster.
We can say that the XParent method can certainly be considered for query processing in most cases. We think that the execution times for queries processing are adequate and comparable to commercial applications and databases. The middleware has flexibility to add any future proposed more efficient schema-oblivious mapping straggly as new collection. The mediator can also be used as benchmark tool for the researchers to compare various model mapping XML schemas by adding them as collection. Our implementation adds collection support to XML into the MYSQL database. A collection is a set of similar XML documents stored in fixed schema of tables, sometimes referred to as XML repository. In realty, MYSQL database model does not change, but from the user point of view, a database can contain tables and collections.

CONCLUSION
Based on experiment and result, it shows that the proposed middleware can be used as an efficient, affordable and quick solution until XML data processing matures. The key to Middleware approach is storing XMIL documents in the relational databases, providing a user interface for XML manipulation and adding an XPath query processor for XNML querying. The Middleware impleinented in this study can be used with any other database management system as it doesn't require any modification to DBMS itself. It provides collections or XML repositories to store XML documents in a database. Organizer should look for developing middleware's to store and quire XML documents into relational databases. That middleware should store XML files directly into a relational database by integrity and efficiently way.