DOI: 10.1145/2381966.2381979
research-article

A machine learning solution to assess privacy policy completeness: (short paper)

Published: 15 October 2012

ABSTRACT

A privacy policy is a legal document that websites use to communicate how they will manage the personal data they collect. By accepting it, users agree to release their data under the conditions the policy states. Privacy policies should provide enough information for users to make informed decisions, and privacy regulations support this by specifying what kind of information must be provided. Because privacy policies can be long and difficult to understand, users tend not to read them. As a result, users generally agree to a policy without knowing what it states or whether it covers the aspects important to them. In this paper we present a solution that assists users by providing a structured way to browse the policy content and by automatically assessing the completeness of a policy, i.e., the degree to which it covers the privacy categories important to the user. The privacy categories are extracted from privacy regulations, while text categorization and machine learning techniques are used to verify which categories a policy covers. The results show that the approach is feasible: an automatic classifier can be created that assigns the right category to paragraphs of a policy with an accuracy approaching that of a human judge.
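
The idea of classifying policy paragraphs into privacy categories and then scoring coverage can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the category names (`collection`, `sharing`, `retention`), the toy training sentences, the choice of a multinomial Naive Bayes classifier, and the weighted-coverage formula in `completeness` are hypothetical and are not taken from the paper, which does not specify its classifier or scoring in the abstract.

```python
import math
import re
from collections import Counter

# Hypothetical categories and toy training paragraphs (not from the paper).
TRAINING = [
    ("collection", "we collect your name email address and phone number"),
    ("collection", "information we collect includes your ip address and browser type"),
    ("sharing", "we share your data with third party advertisers and partners"),
    ("sharing", "we may disclose information to third party service providers"),
    ("retention", "we retain your data for two years after account deletion"),
    ("retention", "records are stored and retained until you close your account"),
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train(examples):
    """Collect per-category word counts, document counts, and the vocabulary."""
    word_counts = {}
    doc_counts = Counter()
    vocab = set()
    for label, text in examples:
        tokens = tokenize(text)
        word_counts.setdefault(label, Counter()).update(tokens)
        doc_counts[label] += 1
        vocab.update(tokens)
    return word_counts, doc_counts, vocab


def classify(paragraph, model):
    """Return the most probable category under Naive Bayes with add-one smoothing."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(doc_counts):
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)
        for token in tokenize(paragraph):
            if token in vocab:  # words never seen in training are ignored
                count = word_counts[label][token]
                score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def completeness(paragraphs, model, weights):
    """Weighted share of user-relevant categories covered by at least one paragraph.

    `weights` maps each category to its importance for the user (assumed scheme).
    """
    covered = {classify(p, model) for p in paragraphs}
    total = sum(weights.values())
    return sum(w for category, w in weights.items() if category in covered) / total
```

With a model built by `train(TRAINING)`, a policy is passed to `completeness` as a list of paragraphs together with the user's category weights. A category that no paragraph is assigned to contributes nothing to the score, which is one simple way to express "degree of coverage"; a realistic system would need a labelled corpus of real policies and a way to reject paragraphs that belong to no category.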


Published in

WPES '12: Proceedings of the 2012 ACM workshop on Privacy in the electronic society
October 2012, 150 pages
ISBN: 9781450316637
DOI: 10.1145/2381966
General Chair: Ting Yu
Program Chair: Nikita Borisov

Copyright © 2012 ACM

Publisher: Association for Computing Machinery, New York, NY, United States

Overall acceptance rate: 106 of 355 submissions (30%)
