A Metadata Based Storage Model for Securing Data in Cloud Environment

IT enterprises are migrating to the cloud environment at a fast pace. Though cloud computing is quickly evolving as the next-generation architecture for enterprises, serious issues will surface in this environment as more and more applications and data move into the cloud. The security of information that is processed by applications and ultimately stored in data centers is a major concern of this newly evolving environment. Data must be secured not only while it is transferred over the wire but also while it is stored, and the architecture needed to secure stored data is of greater importance than that for data in transit, because data resides in the storage area for far longer than it spends on the wire. To ensure the security of data stored in data centers, we propose a new methodology which may not completely prevent a hacker from accessing the data, but renders the data valueless if it is extracted by a hacker, while at the same time ensuring the quality of the data provided to its owner or an authorized user. We propose a metadata-based data segregation and storage methodology, along with solutions for accessing the segregated data. This methodology ensures that data is valueless during its static residence and gains value only during acquisition or update.


I. INTRODUCTION
Even as an increasing number of firms look at embracing cloud computing, the security of data remains a primary concern. The security a cloud requires depends on, and varies with, the deployment model used, the way services are delivered and the character the cloud exhibits. Some of the fundamental security challenges are data storage security, data transmission security, application security and security related to third-party resources [1]. As this new-generation infrastructure gains momentum, more and more applications and data are moved to this relatively untested environment. Though the underlying infrastructure paves the way for elasticity and easy deployment of services by vendors, this mounting opportunity carries a trailing risk, raising major concerns over the system's security. Cloud computing moves application software and databases to large data centers, where the management of the data and services may not be trustworthy. This unique attribute poses many new security challenges [2]. These security concerns should be curtailed at the root, instead of deploying much effort at later stages when the system has scaled beyond imagination and solutions lie outside implementable limits. To realize this tremendous potential, business must address the privacy questions raised by this new computing model [3]. This paper proposes a methodology for securing data stored at data centers and other locations of the cloud. The data under consideration includes data residing in databases as well as in file systems. The lifetime of data at the storage location is obviously longer than the time it spends in transmission. Though data transmission security is important, the security of data at the storage location is of utmost importance. Hence we propose a methodology to secure data during the time it resides at the storage location.
This inherently triggers the need to design new ways to store and retrieve data. The rest of the paper unfolds this methodology and is organized as follows: Section II discusses related work in this area, Section III describes the overall functionality of the methodology, Section IV explains how to design data storage and access methods that maintain the integrity of data, Section V provides a sample implementation algorithm, and Section VI discusses the elements that should be considered for storing data in a database, along with generic concerns that must be taken into account to adhere to the proposed model.

II. RELATED WORKS
Data fragmentation is not a new concept; leaving security aside, techniques like these are already in use for optimizing data access in distributed systems. Most of them, however, do not take security as the motivation for fragmentation. One such work concerns the fragmentation and allocation of data in distributed database systems [4], proposing a model to fragment data horizontally or vertically with respect to tuples so that data can be accessed or updated in an optimized manner. Another work relates to an enhancement of the ADRW algorithm to achieve dynamic fragmentation and object allocation in distributed databases [5]; it deals mainly with the cost involved in accessing data fragments from remote sites. These algorithms provide optimal ways to re-arrange and access data that are fragmented and stored at different locations. The main concern in these works is to fragment data for easy retrieval, not to provide security for the data under consideration. Fragmentation of data based on the value of the data is not targeted in any of these works. Fragmentation based on metadata is used in some works, but those considerations are aimed at optimizing data access rather than at the security of the data itself.

III. METADATA BASED DATA STORAGE MODEL
This model is based on the fact that any piece of information is valuable only as long as its fragments are related to each other. When related information is not available in a mapped manner, it is of no use. For example, a credit card number without its corresponding information, such as the card holder's name, validity date and Card Verification Value (CVV), is valueless, and vice versa. A similar example is the mapping of username and password: a username alone is not valuable, and neither is the password alone. The information becomes valuable only when these fragments are mapped together. The mapped information is required only by authenticated users and the owners of the respective information. A well-known instance of intrusion into user information is the Sony PlayStation Network breach in recent times [6].
In such a scenario, there is no necessity for data to be stored in a mapped manner; the mapping is needed only at the point of usage. Juels et al. [7] described a formal "proof of retrievability" (POR) model for ensuring remote data integrity. Their scheme combines spot-checking and error-correcting codes to ensure both possession and retrievability of files on archive service systems. The time during which information is in use is considerably shorter than the time the data spends at the storage location. Thus two types of security concerns arise: one during data usage, i.e. during transmission, and the other during the static phase of the data, i.e. while it resides at storage centers. With respect to data security during transmission in the cloud, we have proposed a layered framework to deliver security as a service in the cloud environment [8]. This framework consists of a security service which provides multi-tier security based on the needs of the transaction. The framework provides dynamic security to users based on their security requirements, thus enabling a localized level of security, reducing the cost of security for applications requiring less of it and providing robust security to applications really in need of it.
The model described in this paper deals only with data security at the storage centers. This in turn has two concerns: one is the actual physical unit where the data is stored, and the other is intrusion into the information. Our model focuses on the latter. It does not prevent hackers from getting hold of the data; rather, it makes the data valueless even if it is accessed by an intruder.
To adhere to this model, care has to be taken right from the design phase of the information storage. Data has to be segregated into a Public Data Segment (PDS) and a Sensitive Data Segment (SDS). The SDS has to be further fragmented into smaller units until no fragment has any value individually. The fragmentation need not span multiple levels; instead, effort has to be put into identifying the key element that makes the data sensitive, and that element should be fragmented separately. Figure 1 illustrates this fragmentation. The value of the information is destroyed in this process, but as fragmentation is done, the mapping data required to re-assemble the information should be generated in parallel. This can be done for a database that is being designed from scratch, but it is not effective for enterprises who want to move their existing data to the cloud. For migration of data from an existing environment to the cloud, the migration should be done appropriately, and this model makes that feasible. To achieve this, we need a Data Migration Environment (DME) which does this job. The input to the DME should be the existing schema of the database; additionally, information about the sensitive part of the schema should be given to the DME as metadata. The DME can then fragment the data into pieces based on the level of security needed, and alongside it will prepare a mapping table to re-assemble the data. The functionality of this environment and considerations for data integrity are discussed in the next section.
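The fragmentation and mapping-table generation described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the record fields, the dictionary-based stores and the use of random surrogate keys are all assumptions made for the example.

```python
import uuid

def fragment_record(record, sensitive_fields):
    """Split a record into a public fragment and per-field sensitive
    fragments, linked only through a separately stored mapping entry."""
    public = {k: v for k, v in record.items() if k not in sensitive_fields}
    fragments = {}   # fragment store: one entry per sensitive field
    mapping = {}     # mapping data needed to re-assemble the record
    for field in sensitive_fields:
        key = str(uuid.uuid4())   # surrogate key with no intrinsic meaning
        fragments[key] = {field: record[field]}
        mapping[field] = key
    return public, fragments, mapping

# Example: a credit-card record fragmented so that no single store
# holds the related values together.
record = {"holder": "A. User", "card_no": "4111111111111111",
          "valid_thru": "12/27", "cvv": "123"}
public, fragments, mapping = fragment_record(
    record, sensitive_fields=["card_no", "cvv"])

assert "card_no" not in public   # sensitive value removed from public data
assert len(fragments) == 2       # one isolated fragment per sensitive field
```

In a deployment, `public`, `fragments` and `mapping` would land in databases at different locations, so that no single store yields usable information.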

IV. THE METHODOLOGY
Let us consider our previous example of credit card information and roll out our methodology using this example.
Consider a database in a bank consisting of user information along with credit card information. The schema for storing such information will take the form of tables, some containing the personal information of the user and some containing information regarding credit cards, mapped using their ids.
This information can be stored in a single database (say bankDb). An intruder who gets access to this particular database can exploit the information, because all related information is stored at the same location.
Our model enforces that related data be stored at different locations and mapped at runtime, either during an update or a query. Consider that this entire schema is migrated to our proposed model through the DME. The user has to supply the schema information of these tables to the DME, along with its metadata. Let us consider only three categories of metadata for this example. Data with low value is classified as 'Normal'. Data with high value is classified as 'Critical', and data which has value only when mapped with other data is classified as 'Sensitive'. Data which maps 'Sensitive' or 'Critical' data to 'Normal' data is also considered 'Sensitive'. The metadata for our example are shown in Table I.
The DME now has to fragment this data. The DME should be configurable with respect to the level of security required. In our example, if we want the DME to provide medium-level security, it should fragment only data in the 'Critical' category; if high-level security is required, it should fragment data in both the 'Critical' and 'Sensitive' categories. The DME is not aware of the actual data residing within these tables, so the primary key column name should be provided along with the metadata of the tables; this is easily available from the schema information of the database tables. The different levels of security needed and their corresponding metadata should be configured with the DME.
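The security-level configuration of the DME can be expressed as a simple metadata map. The column names and the two level names below are illustrative assumptions, not values from the paper's tables.

```python
# Metadata supplied to the DME: each schema element tagged with a category.
metadata = {
    "customer.name":    "Normal",
    "customer.address": "Normal",
    "card.number":      "Critical",
    "card.cvv":         "Critical",
    "customer.card_id": "Sensitive",  # maps 'Critical' data to 'Normal' data
}

# Security levels configured with the DME: which categories get fragmented.
levels = {
    "medium": {"Critical"},
    "high":   {"Critical", "Sensitive"},
}

def to_fragment(metadata, level):
    """Return the schema elements the DME must fragment at a given level."""
    tags = levels[level]
    return sorted(col for col, tag in metadata.items() if tag in tags)

assert to_fragment(metadata, "medium") == ["card.cvv", "card.number"]
assert "customer.card_id" in to_fragment(metadata, "high")
```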
Let us consider that we need medium security for our database. The DME then fragments only the data that is 'Critical'; in our example, we have one 'Critical' data set. Now when we look into the data of the above three tables, all of them fall under the 'Sensitive' category of metadata. Table II lists the metadata of the database in this situation. After fragmentation is completed, the DME segregates the schema, separating out the data modified by the DME, the 'Originally Sensitive' data and the 'Normal' data, as shown in Table III. The DME then moves the 'Normal' data to one database, the 'Originally Sensitive' data to another database, the AD of the 'Sensitive DME' data to a database at a different location, and the MD of the 'Sensitive DME' data to the database holding the 'Normal' data. With respect to the AD, if the DME creates its own table, this table will be the most sensitive data and will be stored at a different location. A different location here means either a different server at the same geographical location or a server at a different geographical location. Additionally, one more mapping is required for mapping the original schema to the new schema.
Now each database contains data which has no value in itself. The entire mapping is done only at runtime; the value is built up temporarily during access and update and destroyed afterwards. An intruder who gets access to the data during the static phase of its life cycle cannot use the data to exploit the information in any way. The integrity between the original schema and the new schema can be taken care of by deploying a database runtime migration environment, which deploys all the logic required for the runtime generation of the schema and its corresponding drop after its life cycle.
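The runtime build-up of value described above can be sketched as follows. Dictionaries stand in for the databases at different locations; the store names and record layout are assumptions for illustration only.

```python
def reassemble(public_db, fragment_db, mapping_db, record_id):
    """Rebuild a full record at runtime by resolving surrogate keys.
    The joined view exists only for the duration of the access."""
    record = dict(public_db[record_id])          # start from 'Normal' data
    for field, key in mapping_db[record_id].items():
        record[field] = fragment_db[key][field]  # pull each fragment by key
    return record

# Illustrative stores held at (conceptually) different locations.
public_db   = {1: {"holder": "A. User"}}
fragment_db = {"k-42": {"card_no": "4111111111111111"}}
mapping_db  = {1: {"card_no": "k-42"}}

full = reassemble(public_db, fragment_db, mapping_db, 1)
assert full == {"holder": "A. User", "card_no": "4111111111111111"}
del full   # the assembled value is destroyed once its usage scope ends
```

No individual store contains a usable record: the public data lacks the card number, and the fragment store holds the number under a meaningless key.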

V. IMPLEMENTATION AND COST
A typical procedure for accessing the fragmented data is as follows. During querying, the runtime environment uses a hash table containing information about the fragmented tables to restructure the input query, inserting a join with the fragment tables, and then executes it to form tables with the original relationships of the data; the dynamically created tables are destroyed once the access is over. The fragmentation of data incurs a cost overhead, which can be calculated as follows. Let C1 be the cost of fragmentation of one 'Critical' data set. The total cost of security with fragmentation and encryption, when compared with security using only encryption and fragmentation for data dispersal (T1), is T = C1 + C2 + C3 + C4 + C7 - T1, where T > T1.
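The query-restructuring step can be sketched as follows. The table names, the hash-table layout and the single-join rewrite are assumptions made for this minimal example; a real runtime environment would handle arbitrary SQL.

```python
# Hash table kept by the runtime environment: which base table was
# fragmented, and which fragment table plus join key restores it.
fragment_map = {
    "card": {"fragment_table": "card_frag", "join_key": "frag_key"},
}

def rewrite_query(table, columns):
    """Rewrite a simple single-table SELECT so that fragmented tables
    are transparently re-joined with their fragment tables."""
    info = fragment_map.get(table)
    if info is None:  # table was never fragmented: pass through unchanged
        return f"SELECT {', '.join(columns)} FROM {table}"
    return (f"SELECT {', '.join(columns)} FROM {table} "
            f"JOIN {info['fragment_table']} "
            f"ON {table}.{info['join_key']} = "
            f"{info['fragment_table']}.{info['join_key']}")

assert rewrite_query("customer", ["name"]) == "SELECT name FROM customer"
assert "JOIN card_frag" in rewrite_query("card", ["number"])
```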
The cost of this method is higher than that of traditional methods, but it provides better security. Since this model will be deployed in a cloud, which is conceptually an environment with high processing power, the cost incurred is properly justified by the security it provides. Data dispersal and data fragmentation are techniques that can be attempted with ease in the cloud environment [9].

VI. LIMITATIONS AND CONSIDERATIONS
The limitations of this model are the initial effort taken to configure the DME and the migration of existing data to the new model. Changes to existing conventional database engines are unavoidable, because there is an inherent need to plug the DME and the database runtime migration environment into these engines. There is a cost incurred due to the fragmentation of data. This cost includes the cost of fragmenting the data at storage time and also the cost of forming the data at runtime from the fragments. But this cost is not newly introduced to the system, because data fragmentation is already a practical methodology followed in distributed systems; here the fragmentation is provided to make the data secure. In addition to fragmentation, a proper encryption technique can be used to provide additional security. This encryption can be applied only to data that is fragmented as 'Sensitive' by the DME, which reduces the cost of encrypting the entire database.
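Selective encryption of only the 'Sensitive' fragments can be sketched as follows. The XOR cipher here is a deliberately toy stand-in for a real symmetric cipher such as AES, and the fragment names and tags are assumptions for the example.

```python
def xor_cipher(data: bytes, key: bytes) -> bytes:
    """Toy symmetric cipher (XOR) standing in for a real one such as AES."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

def encrypt_sensitive(fragments, tags, key):
    """Encrypt only fragments tagged 'Sensitive', leaving 'Normal' data
    in the clear and thereby avoiding whole-database encryption cost."""
    out = {}
    for name, value in fragments.items():
        if tags[name] == "Sensitive":
            out[name] = xor_cipher(value.encode(), key)
        else:
            out[name] = value
    return out

key = b"demo-key"
fragments = {"cvv": "123", "city": "Chennai"}
tags = {"cvv": "Sensitive", "city": "Normal"}

enc = encrypt_sensitive(fragments, tags, key)
assert enc["city"] == "Chennai"                       # 'Normal' data untouched
assert xor_cipher(enc["cvv"], key).decode() == "123"  # round-trips correctly
```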

VII. CONCLUSION
In this paper we investigated issues in the security of data storage in the cloud environment. To ensure that data is secure during the stored phase of its life cycle, we proposed a metadata-based model in which the data residing at the data center are robbed of their value, the value being temporarily built up at runtime and destroyed once its usage scope is completed. This makes the data valueless even if an intruder gets access to it. Though this model will take some quantifiable effort to implement, it provides a necessary solution for an environment like the cloud, which is showing strong potential to become the next-generation enterprise environment. Implementing such a model during the earlier phases of the evolution of the system will be relatively easy compared with implementing it after a lot of data has taken refuge in the cloud. This model, in combination with our multi-tier security model for securing data over transmission, will provide proper barriers against malicious users.