Randomization of Statistical Queries of Type Median: A Simulation Approach
Jose Daniel Velazco, Mohammed Awad and Ernst L. Leiss
DOI : 10.3844/jcssp.2018.67.80
Journal of Computer Science
Volume 14, Issue 1
Researcher and third party access to data pertaining to individuals is becoming the norm. The conclusions drawn from such data can be extremely beneficial. However, data owners must maintain the secrecy of the sensitive data fields and make sure it is protected against inference attacks. There are several techniques and restrictions that can be made on queries to prevent adversaries from inferring and identifying sensitive data related to specific individuals. One of the proposed techniques to prevent the disclosure of private data is randomization. In this study, we demonstrate and analyze the implementation of randomization in statistical queries of the selector function median and the results of an extensive simulation. The randomization technique yields a possibly erroneous yet usually reasonably accurate response to every query. In addition, the inference procedure is explained and potential modifications to counter the randomization technique are analyzed and tested against it. We show that, despite these modifications, randomization protects the data by adding uncertainties into the inference procedure, thus, maintaining differential privacy. The results of an extensive simulation testing the various parameters of the randomization technique on randomly generated databases are shown and explained.
© 2018 Jose Daniel Velazco, Mohammed Awad and Ernst L. Leiss. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.