Journal of Computer Science

Utility Independent Privacy Preserving Data Mining on Vertically Partitioned Data

E. Poovammal and M. Ponnavaikko

DOI: 10.3844/jcssp.2009.666.673

Journal of Computer Science

Volume 5, Issue 9

Pages 666-673

Abstract

Problem statement: Driven by mutual benefits, or by regulations that require certain data to be published, there is a demand for the exchange and publication of data among various parties. Data publishing is ubiquitous in many domains, such as medicine, business and education. Detailed person-specific data, whether held on a centralized server or in a distributed environment, often contains sensitive information about individuals in its original form, and publishing such data as-is violates individual privacy. The main problem is therefore to develop a method for publishing data in a more hostile environment so that the published data remains practically useful while individual privacy is preserved. Suppose n parties, each holding a private database, want to jointly conduct a data mining operation on the union of their databases. How can these parties accomplish this without disclosing their databases to one another or to any third party?
Approach: To address this issue, we developed a simple technique that transforms categorical sensitive data using a mapping table and numeric sensitive data using a graded grouping technique. Typical data mining tasks, such as classification, clustering and association rule mining, were performed on both the original and the transformed tables. The rules, results and patterns obtained from the two tables were compared to evaluate the utility of the transformed data.
Results: The evaluation showed that the proposed approach achieved 100% utility, relative to the original table, for each type of mining task evaluated. On the Adult data set, with education as the class variable, the classification accuracy was 40.08%, and the same accuracy was obtained after transformation. Similarly, at a confidence threshold of 0.9, the original and the transformed tables generated the same number of association rules, namely 10.
Conclusion: The association rules involving categorical sensitive attributes were checked manually for privacy breaches. We found that the actual sensitive values cannot be guessed from the rules, even though there was no information loss. The results can be interpreted only with the cooperation of the data owner or data publisher.
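The abstract describes the transformation only at a high level. The following Python is a minimal sketch under two stated assumptions that the abstract does not confirm: the mapping table is taken to be a one-to-one replacement of each categorical value by an opaque code, and graded grouping is taken to be fixed-width binning of numeric values. It illustrates why mining a one-to-one relabelled attribute yields the same association rules as the original table, up to renaming. The function names, the codes `C0`, `C1`, ..., the grades `G0`, `G1`, ... and the toy records are all hypothetical. The rule miner is a naive enumeration of single-item rules, not Apriori.

```python
import random


def build_mapping_table(values, seed=None):
    """Hypothetical mapping table: each distinct categorical value gets an opaque code."""
    distinct = sorted(set(values))
    codes = [f"C{i}" for i in range(len(distinct))]
    random.Random(seed).shuffle(codes)  # hide the sort order of the original values
    return dict(zip(distinct, codes))   # one-to-one; kept private by the data owner


def graded_grouping(value, lower=0, width=10):
    """Assumed graded grouping: grade of the fixed-width interval containing value."""
    return f"G{int((value - lower) // width)}"


def transform(records, edu_map):
    """Publishable copy: education via the mapping table, age via graded grouping."""
    return [
        {"education": edu_map[r["education"]],
         "age": graded_grouping(r["age"]),
         "income": r["income"]}
        for r in records
    ]


def rule_confidence(records, antecedent, consequent):
    """confidence(A -> B) = |records matching A and B| / |records matching A|."""
    matches_a = [r for r in records if all(r[k] == v for k, v in antecedent.items())]
    if not matches_a:
        return 0.0
    matches_ab = [r for r in matches_a if all(r[k] == v for k, v in consequent.items())]
    return len(matches_ab) / len(matches_a)


def mine_rules(records, attrs, min_conf):
    """All single-item rules {a=x} -> {b=y}, a != b, with confidence >= min_conf."""
    rules = set()
    for a in attrs:
        for b in attrs:
            if a == b:
                continue
            for x in {r[a] for r in records}:
                for y in {r[b] for r in records}:
                    if rule_confidence(records, {a: x}, {b: y}) >= min_conf:
                        rules.add(((a, x), (b, y)))
    return rules


def encode_rule(rule, edu_map):
    """Rewrite a rule mined on the original table into the published vocabulary."""
    def enc(attr, val):
        return edu_map[val] if attr == "education" else val
    (a, x), (b, y) = rule
    return ((a, enc(a, x)), (b, enc(b, y)))


# Toy records invented for illustration; they are not rows from the Adult data set.
original = [
    {"education": "Bachelors", "age": 39, "income": ">50K"},
    {"education": "HS-grad", "age": 50, "income": "<=50K"},
    {"education": "Bachelors", "age": 38, "income": ">50K"},
    {"education": "Masters", "age": 53, "income": ">50K"},
    {"education": "HS-grad", "age": 28, "income": "<=50K"},
]
edu_map = build_mapping_table([r["education"] for r in original], seed=7)
published = transform(original, edu_map)

# Mine only the categorical attributes; graded grouping merges numeric values.
attrs = ["education", "income"]
rules_orig = mine_rules(original, attrs, min_conf=0.9)
rules_pub = mine_rules(published, attrs, min_conf=0.9)
print(len(rules_orig), len(rules_pub))                                # 4 4
print({encode_rule(r, edu_map) for r in rules_orig} == rules_pub)   # True
```

Because the assumed mapping is one-to-one, every support and confidence count is unchanged, so the published table yields exactly the renamed rule set. Only the holder of `edu_map` can translate a published rule back to real education values. Graded grouping, by contrast, merges distinct values, which is why the sketch compares rules on the categorical attributes only.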

Copyright

© 2009 E. Poovammal and M. Ponnavaikko. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.