Robust and Blind Watermarking of Relational Database Systems

Problem statement: Digital multimedia watermarking technology was suggested in the last decade to embed copyright information in digital objects such as images, audio and video. However, the increasing use of relational database systems in many real-life applications has created an ever-increasing need for watermarking database systems. As a result, watermarking relational database systems is now emerging as a research area that deals with the legal issue of copyright protection of database systems. Approach: In this study, we proposed an efficient database watermarking algorithm based on inserting binary image watermarks in non-numeric, multi-word attributes of selected database tuples. Results: The algorithm is robust as it resists attempts to remove or degrade the embedded watermark and it is blind as it does not require the original database in order to extract the embedded watermark. Conclusion: Experimental results demonstrated the blindness and the robustness of the algorithm against common database attacks.


INTRODUCTION
Digital images, video and audio are examples of digital assets which have become easily accessible to ordinary people around the world. However, the owners of such digital assets have long been concerned with the copyright of their digital products, since copying and distributing digital assets across the Internet has never been as easy as it is nowadays. Therefore, researchers have been looking for ways to protect the ownership of digital assets for a long time. Digital watermarking technology was suggested as an effective solution for protecting the copyright of digital assets [3,9,12]. This technology provides ownership verification of a digital product by inserting imperceptible information into the digital product. Such 'right witness' information is called the watermark and it is inserted in such a way that the product remains useful while the watermark is robust against attempts to remove it.
Most watermarking research has concentrated on watermarking multimedia data objects such as still images and video [8,13,18] and audio [3,5,15]. However, watermarking of database systems has started to receive attention because of the increasing use of database systems in many real-life applications. Examples where database watermarking might be of crucial importance include protecting the rights of outsourced relational databases and allowing the creation of secured and copyright-protected web-based services that enable users to search and access databases remotely [6,10,19].
Because relational data differ in character from images and audio, no existing image or audio watermarking method is suitable for watermarking relational databases. Relational database watermarking is, in fact, challenged by many factors such as the scarcity of redundant data, the unordered nature of relational data and frequent updating. Moreover, database watermarking has unique, and sometimes complex, requirements that differ from those of watermarking digital audio-visual products. Due to such unique requirements and challenges, the literature on watermarking relational databases is very limited and has focused mainly on embedding short strings of binary bits in randomly selected locations in numerical databases. Most proposed algorithms lack robustness against bit-level attacks such as bit-setting, bit-resetting and bit-flipping. Other database watermarking algorithms embed watermark information in the statistical properties of tuples rather than in the data itself. These algorithms are computation-intensive and still lack solid mathematical formulations.
In this study, a binary image is used to watermark a given relational database system. The watermark image is embedded in non-numeric, multi-word attributes of a selected number of tuples of the database. The algorithm is robust as it resists attempts to remove or degrade the embedded watermark and it is blind as it does not require the original database in order to extract the embedded watermark.

MATERIALS AND METHODS
Database watermarking research: Watermarking of relational database systems is a relatively new field, so the research literature is very limited and reported results are insufficient [14,17,27]. Accordingly, we anticipate that advancements in this area will continue, but at a slow pace due to the challenges and unique requirements imposed by the nature of relational databases. In what follows we briefly describe these unique requirements and challenges. We also outline the classes of database watermarking algorithms that have been proposed in the literature.

Unique Requirements of Database Watermarking
Watermarking database systems has unique requirements that differ from those required for watermarking digital image and audio systems [2]. The watermarked database must maintain the following properties:
Usability: The amount of change in the database caused by the watermarking process should not degrade the database and make it useless. The amount of allowable change differs from one database to another, depending on the nature of the stored records.

Robustness:
Watermarks embedded in the database should be robust against attacks to erase them. That is, the database watermarking algorithm must be developed in such a way to make it difficult for an adversary to remove or alter the watermark beyond detection without destroying usability of the database.
Blindness: Watermark extraction should require knowledge of neither the original un-watermarked database nor the watermark itself. This property is critical as it allows the watermark to be detected in a copy of the database relation, irrespective of later updates to the original relation.
Structure: A database is made of inter-related tuples. The tuples that are joined before the watermarking process should not be altered during watermarking. Moreover, scale and classification must be considered during the watermarking process since they have an impact on the semantics of the database.
Security: The choice of the watermarked tuples, attributes and bit positions should be secret and known only through knowledge of a secret key. The owner of the database should be the only one who knows the secret key.
Incremental watermarking: After a database has been watermarked, the watermarking algorithm should compute the watermark values for added or modified tuples only. Watermarked tuples that have not been altered should not be watermarked again.

Challenges of database watermarking:
Watermarking relational databases is challenged by the following factors [28]:
Little redundant data: A relational database is made up of tuples, each representing an independent object. Therefore, watermarks have very few places to hide.
Out-of-order relational data: Tuples of a relational database have no fixed location. This makes it very difficult to build a fixed positional correspondence in relational databases.
Frequent updating: Insertion, deletion and updating operations on a relational database are very frequent. Even without malicious intent, users often casually drop some tuples or attributes. On the other hand, a pirate can add or substitute tuples and attributes.
Existing database watermarking methods: A few relational database watermarking algorithms have been proposed. Published algorithms can be classified as bit-level watermarking algorithms, statistical-property watermarking algorithms and image-based watermarking algorithms. The three classes of algorithms operate on numeric attributes of relational databases. A brief description of each class is given below.
Bit-level watermarking algorithms: In these algorithms, certain attributes of a selected subset of tuples are chosen to hide watermark bit information. Attribute selection is based on the value of a hash function. For each selected attribute, some bit positions are marked amongst a predetermined number of least significant bits of the attribute [1,2,4,7,11,16,28].
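To make the bit-level class concrete, the following Python fragment gives a minimal sketch of hash-driven least-significant-bit marking. It is an illustration of the general idea only and does not reproduce any of the cited algorithms: the use of HMAC-SHA256, the integer primary key `pk`, the selection ratio `gamma` and the number of candidate low-order bits `xi` are all assumptions introduced here, and the attribute is assumed to hold a non-negative integer.

```python
import hashlib
import hmac


def keyed_hash(key: bytes, pk: int) -> int:
    """Hash the primary key under the owner's secret key (HMAC-SHA256 is an assumption)."""
    digest = hmac.new(key, str(pk).encode(), hashlib.sha256).digest()
    return int.from_bytes(digest, "big")


def mark_value(key: bytes, pk: int, value: int, gamma: int = 2, xi: int = 2) -> int:
    """Return the attribute value with one low-order bit forced to a key-dependent value.

    gamma: roughly one tuple in gamma is selected (hypothetical parameter).
    xi:    number of least significant bits that may be marked (hypothetical parameter).
    """
    h = keyed_hash(key, pk)
    if h % gamma != 0:
        return value                      # tuple not selected, value left untouched
    bit_index = (h // gamma) % xi         # which of the xi low-order bits to mark
    bit = (h // (gamma * xi)) % 2         # value the marked bit must take
    if bit:
        return value | (1 << bit_index)   # set the chosen bit
    return value & ~(1 << bit_index)      # clear the chosen bit
```

Because only the lowest `xi` bits can change, the numeric error introduced per value is bounded by 2^xi - 1, which is also why such marks are exposed to the bit-setting, bit-resetting and bit-flipping attacks mentioned in the introduction.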

Statistical-property watermarking algorithms:
In these algorithms, watermark bits are not encoded in the data itself, but rather in the data distribution properties of a subset of tuples. The complete set of tuples making up the database is partitioned into a maximal number of unique, non-intersecting subsets of tuples. For each selected subset of tuples, a watermark bit is embedded by making minor changes to some of the data values in the tuples, in such a way that the subset's average and variance take one of two possible values depending on whether the watermark bit is 0 or 1 [20,21,22,23,24,25].
Image-based watermarking algorithms: In these algorithms, an image is used to watermark the database. The image is transformed into bits which represent the watermark bits. The bits are embedded in carefully chosen locations in the database and, if recovered correctly, can be used to reconstruct the embedded image [26,28]. This class of watermarking methods can be considered a sub-class of the bit-level watermarking class.
Proposed watermarking algorithm: In our proposed algorithm, a binary image is used to watermark relational databases. The bits of the image are segmented into short binary strings that are encoded in non-numeric, multi-word attributes of selected tuples of the database. The embedding process of each short string is based on creating a double-space at a location determined by the decimal equivalent of the short string. Extraction of a short string is done by counting the number of single spaces between two separated double-space locations. The image watermark is then constructed by converting the decimals into binary strings. A major advantage of space-based watermarking is the large bit-capacity available for hiding the watermark. This facilitates embedding large watermarks or multiple small watermarks. This is in contrast to bit-based algorithms, where watermark bits have a limited number of potential locations in which they can be hidden without being subjected to removal or destruction. Our proposed algorithm has two procedures: a watermark embedding procedure and a watermark extraction procedure. The two procedures are described below.
Watermark embedding procedure: The watermark embedding procedure consists of the following operational steps:
Step 1: Arrange the watermark image into m strings, each n bits long
Step 2: Divide the database logically into sub-sets of tuples. A sub-set has m tuples
Step 3: Embed the m short strings of the watermark image into each m-tuple sub-set
Step 4: Embed the n-bit binary string in the corresponding tuple of a sub-set as follows:
• Find the decimal equivalent of the string. Let the decimal equivalent be d
• Embed the decimal number d in a pre-selected non-numeric, multi-word attribute by creating a double-space after d words of the attribute
Step 5: Repeat step 4 for each tuple in the subset
Step 6: Repeat steps 4 and 5 for each subset of the database under watermarking
An illustration of embedding the binary watermark into a sub-set of tuples is shown in Fig. 1. The watermark is a 4×3 binary image. Each of the four 3-bit binary strings is transformed into its decimal equivalent as shown in Fig. 1a and embedded in the 4-tuple sub-set, as shown in Fig. 1b, where subscripts represent the space number and DS corresponds to a double space. The count of numbered single spaces appearing before the Double-Space (DS) indicates the decimal equivalent of the embedded short binary string.
A snapshot of the relational database after embedding the watermark throughout the database is shown in Fig. 2. The tuples in Fig. 2 constitute the database and the A's are the watermarked non-numeric, multi-word attributes for each tuple.
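The embedding steps can be sketched in Python as follows. This is a minimal sketch under stated assumptions rather than the implementation used in the experiments: the function names, the representation of tuples as dictionaries and the grouping of consecutive tuples into sub-sets are illustrative choices. The description gives two readings of the double-space position (after d words, and preceded by d single spaces); the sketch assumes the second reading, so that d = 0 remains representable and the attribute needs at least d + 2 words.

```python
def bits_to_strings(bits: str, n: int) -> list:
    """Step 1: split the flattened watermark image into strings of n bits."""
    if len(bits) % n != 0:
        raise ValueError("number of bits must be a multiple of n")
    return [bits[i:i + n] for i in range(0, len(bits), n)]


def embed_value(text: str, d: int) -> str:
    """Step 4: create a double space so that exactly d single spaces precede it."""
    words = text.split()              # also normalises any existing runs of spaces
    if d + 2 > len(words):
        raise ValueError("attribute has too few words to carry this value")
    head = " ".join(words[:d + 1])    # d + 1 words joined by d single spaces
    tail = " ".join(words[d + 1:])
    return head + "  " + tail


def embed_watermark(tuples: list, attr: str, bits: str, n: int) -> list:
    """Steps 2, 3, 5 and 6: embed the same m strings into every m-tuple sub-set."""
    strings = bits_to_strings(bits, n)
    m = len(strings)
    marked = []
    for i, row in enumerate(tuples):
        d = int(strings[i % m], 2)    # position inside the sub-set selects the string
        new_row = dict(row)           # leave the caller's tuple unchanged
        new_row[attr] = embed_value(row[attr], d)
        marked.append(new_row)
    return marked
```

For example, `embed_value("a b c d", 1)` returns `"a b  c d"`: one single space precedes the double space, encoding d = 1.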
Watermark extraction procedure: The watermark extraction procedure reverses the embedding steps and requires neither the original database nor the watermark image:
Step 1: Divide the database logically into sub-sets of tuples. A sub-set has m tuples
Step 2: For each tuple of a sub-set, locate the double-space in the pre-selected non-numeric, multi-word attribute and count the single spaces that precede it. The count is the decimal number d
Step 3: Convert d into its n-bit binary equivalent
Step 4: Repeat steps 2 and 3 for each tuple in the subset to obtain the m short strings
Step 5: Arrange the m strings of n bits into the watermark image
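The space-counting rule stated in the description of the proposed algorithm (the count of single spaces before the double space gives the decimal value) can be sketched in Python as follows. The function names and the dictionary representation of tuples are assumptions made for illustration, and the sketch assumes a single double space per attribute and reads one sub-set of m consecutive tuples.

```python
def extract_value(text: str):
    """Count the single spaces before the first double space; None if no mark is found."""
    pos = text.find("  ")
    if pos == -1:
        return None
    return text[:pos].count(" ")


def extract_watermark(tuples: list, attr: str, m: int, n: int) -> list:
    """Recover the m short strings of n bits from the first m-tuple sub-set."""
    strings = []
    for row in tuples[:m]:
        d = extract_value(row[attr])
        if d is None or d >= 2 ** n:
            raise ValueError("tuple does not carry a valid n-bit mark")
        strings.append(format(d, "0{}b".format(n)))   # decimal back to n-bit binary
    return strings
```

Only the attribute name and the dimensions m and n are needed as inputs, which is consistent with the blindness property: neither the original relation nor the watermark image appears in the computation.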

RESULTS
The proposed algorithm has been evaluated and tested on an experimental database that we have constructed. The database consists of 1000 tuples and runs under the Oracle platform. We concentrated our performance evaluation on the robustness of the proposed algorithm, because database watermarking algorithms must make it difficult for an adversary to remove or alter the watermark beyond detection without destroying the value of the database. In particular, the database watermarking algorithm should make the watermarked database robust against the following types of attacks: subset deletion attack, subset addition attack, subset alteration attack and finally subset selection attack. The results are shown in Fig. 3-6.

Subset deletion attack:
In this type of attack, the attacker deletes a subset of the tuples of the watermarked database in the hope that the watermark will be removed. The graph shown in Fig. 3 indicates that the watermark is removed if and only if the entire database is deleted. That is, even removing more than 95% of the database will not remove the watermark. This is due to the fact that the proposed algorithm embeds the same watermark everywhere in the database, making this type of attack ineffective.
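The effect of repeating the watermark in every sub-set can be illustrated with a small Python sketch of a deletion attack. Several points are assumptions introduced here and are not taken from the experiments: each tuple is given an integer primary key `pk`, its position inside a sub-set is taken to be `pk % m` so that surviving tuples can still be mapped to a watermark string, and a majority vote over the survivors is used to pick each string. The attribute text and watermark strings are made-up sample data.

```python
from collections import Counter


def embed_value(text, d):
    """Place a double space so that d single spaces precede it."""
    words = text.split()
    return " ".join(words[:d + 1]) + "  " + " ".join(words[d + 1:])


def extract_value(text):
    """Count single spaces before the first double space (None if absent)."""
    pos = text.find("  ")
    return None if pos == -1 else text[:pos].count(" ")


def recover(rows, attr, m, n):
    """Majority-vote each of the m strings over whatever tuples survive."""
    votes = [Counter() for _ in range(m)]
    for row in rows:
        d = extract_value(row[attr])
        if d is not None and d < 2 ** n:
            votes[row["pk"] % m][d] += 1      # assumed mapping of tuple to string
    result = []
    for counter in votes:
        if counter:
            result.append(format(counter.most_common(1)[0][0], "0{}b".format(n)))
        else:
            result.append(None)               # no surviving tuple carries this string
    return result


STRINGS = ["101", "000", "111", "010"]        # a 4x3 watermark, one string per row
TEXT = "12 King Abdullah Street near the old city gate Amman"
db = [{"pk": pk, "address": embed_value(TEXT, int(STRINGS[pk % 4], 2))}
      for pk in range(1000)]

# Deletion attack: keep only every 21st tuple, i.e. 48 of the 1000 tuples.
survivors = [row for row in db if row["pk"] % 21 == 0]
recovered = recover(survivors, "address", m=4, n=3)
```

Under these assumptions a string is lost only when every tuple carrying it has been deleted, which is the behaviour the deletion experiment describes.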

Subset addition attack:
In this type of attack, the attacker adds a set of tuples to the watermarked database. This is one of the most difficult attacks to defeat, but it has little impact on the watermark embedded through our algorithm. The graph shown in Fig. 4 indicates that the watermark will never be removed even if the added tuples are as many as the original tuples. That is, the added tuples simply do not carry the watermark information, while the original tuples retain it.

Subset alteration attack:
In this type of attack, the attacker alters the tuples of the database through operations such as linear transformation, hoping to erase the watermark from the database. The graph shown in Fig. 5 indicates that the watermark will remain even if 90% of the tuples of the database are altered. This is due to the fact that the proposed algorithm embeds the same watermark everywhere in the database, making this type of attack ineffective.

Subset selection attack:
In this type of attack, the attacker randomly selects and uses a subset of the original database that might still provide value for its intended purpose, hoping that the selected subset will not contain the watermark. However, since the proposed algorithm embeds the watermark in the whole database, this attack poses little or no threat. The graph shown in Fig. 6 indicates that the watermark will remain even if the attacker selects a subset as small as 10% of the original database. That is, no matter how small the selected subset is, the watermark remains in it and thus provides the required copyright protection.

CONCLUSION
In this study, we proposed a watermarking algorithm based on hiding watermark bits in the spaces of non-numeric, multi-word attributes of subsets of tuples. A major advantage of this approach is the large bit-capacity available to hide large watermarks. This is in contrast to other proposed algorithms, where watermark bits have a limited number of potential bit-locations in which they can be hidden effectively without being subjected to removal or destruction. The robustness of the proposed algorithm was verified against a number of database attacks such as subset deletion, subset addition, subset alteration and subset selection attacks. Ongoing and future research includes the development of other effective database watermarking algorithms.