ABSTRACT
A privacy policy is a legal document that websites use to communicate how the personal data they collect will be managed. By accepting it, users agree to release their data under the conditions stated by the policy. Privacy policies should provide enough information to enable users to make informed decisions, and privacy regulations support this by specifying what kind of information has to be provided. Because privacy policies can be long and difficult to understand, users tend not to read them. As a result, users generally agree to a policy without knowing what it states or whether the aspects important to them are covered at all. In this paper we present a solution that assists the user by providing a structured way to browse the policy content and by automatically assessing the completeness of a policy, i.e. the degree of coverage of the privacy categories important to the user. The privacy categories are extracted from privacy regulations, while text categorization and machine learning techniques are used to verify which categories are covered by a policy. The results show the feasibility of our approach: an automatic classifier can be created that assigns the right category to paragraphs of a policy with an accuracy approximating that obtainable by a human judge.
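The completeness notion described in the abstract (the share of user-important categories that at least one policy paragraph covers) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the category names, the cue words, and the keyword-matching `toy_classifier`, which merely stands in for a trained text classifier and is not the method used in the paper.

```python
import re
from typing import Callable, Iterable, List, Optional, Set, Tuple

# Hypothetical categories and cue words, for illustration only.
TOY_KEYWORDS = {
    "collection": {"collect", "gather"},
    "sharing": {"share", "third", "disclose"},
    "retention": {"retain", "store", "delete"},
    "security": {"encrypt", "secure", "protect"},
}


def toy_classifier(paragraph: str) -> Optional[str]:
    """Stand-in for a trained classifier: pick the category with most cue-word hits."""
    words = set(re.findall(r"[a-z]+", paragraph.lower()))
    best, best_hits = None, 0
    for category, cues in TOY_KEYWORDS.items():
        hits = len(words & cues)
        if hits > best_hits:
            best, best_hits = category, hits
    return best  # None when no cue word matches


def completeness(
    paragraphs: Iterable[str],
    important: Set[str],
    classify: Callable[[str], Optional[str]],
) -> Tuple[float, List[str]]:
    """Return (fraction of important categories covered, sorted missing categories)."""
    if not important:
        raise ValueError("at least one important category is required")
    covered = {classify(p) for p in paragraphs} & important
    return len(covered) / len(important), sorted(important - covered)
```

Because `completeness` only needs a callable that maps a paragraph to a category, any classifier trained on labelled policy paragraphs could replace `toy_classifier` without changing the scoring logic.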
Index Terms
A machine learning solution to assess privacy policy completeness: (short paper)