Abstract
A new inference control, called random sample queries, is proposed for safeguarding confidential data in on-line statistical databases. The random sample queries control deals directly with the basic principle of compromise by making it impossible for a questioner to control precisely the formation of query sets. Queries for relative frequencies and averages are computed using random samples drawn from the query sets. The sampling strategy permits the release of accurate and timely statistics and can be implemented at very low cost. Analysis shows the relative error in the statistics decreases as the query set size increases; in contrast, the effort required to compromise increases with the query set size due to large absolute errors. Experiments performed on a simulated database support the analysis.
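The abstract does not spell out the sampling procedure, so the following is only a minimal sketch of the idea. It assumes each record in a query set is kept independently with a fixed probability `p`. That model, the function names, and the value `p = 0.9` are all assumptions made for illustration, not details taken from the paper.

```python
import math
import random


def sampled_count(query_set, p, rng):
    """Estimate the query set size from a sample (assumed Bernoulli(p) per record)."""
    kept = sum(1 for _ in query_set if rng.random() < p)
    return kept / p  # scale up so the estimate is unbiased under this model


def sampled_average(values, p, rng):
    """Average over the sampled records only; None if the sample is empty."""
    sample = [v for v in values if rng.random() < p]
    return sum(sample) / len(sample) if sample else None


def count_error_std(n, p):
    """Standard deviation of sampled_count for a query set of size n (binomial model)."""
    return math.sqrt(n * (1 - p) / p)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed, for a repeatable illustration only
    p = 0.9                 # hypothetical sampling probability
    for n in (10, 100, 10_000):
        est = sampled_count(range(n), p, rng)
        abs_err = count_error_std(n, p)
        # columns: true size, estimate, absolute error (std), relative error (std / n)
        print(n, round(est, 1), round(abs_err, 2), round(abs_err / n, 4))
```

Under this assumed model the absolute error of a count grows like the square root of the query set size, while the relative error shrinks at the same rate. That matches the qualitative behavior the abstract describes. A real control would also need the sample for a given query to be repeatable, so that repeating the query cannot average the noise away. The seeded generator here does not model that requirement.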