ABSTRACT
The General Data Protection Regulation (GDPR) and other recent privacy laws require organizations to post their privacy policies and place specific expectations on organizations' privacy practices. Privacy policies are documents written in natural language, and one expectation placed on them is that they remain up to date. To investigate legal compliance with this recency requirement at a large scale, we create a novel pipeline that combines crawling, regex-based extraction, candidate date classification, and date object creation to extract updated and effective dates from English-language privacy policies. We then analyze patterns in policy dates across four web crawls and find that only about 40% of privacy policies online contain a date, which makes it difficult to assess their regulatory compliance. We also find that privacy policy updates are temporally concentrated around the passage of laws regulating digital privacy (such as the GDPR), and that more popular domains are both more likely to include policy dates and more likely to update their policies regularly.
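The extraction stages named above (regex-based extraction, candidate date classification, date object creation) can be illustrated with a small sketch. It is not the authors' implementation: the regex patterns, the `CUES` keyword lists, the 40-character context window, and the names `PATTERNS`, `to_date`, and `extract_policy_dates` are all hypothetical. In particular, the keyword heuristic is only a stand-in for the paper's candidate date classification step, whose details are not given here. Crawling is omitted; the input is assumed to be plain English policy text.

```python
import re
from datetime import date
from typing import List, Optional, Tuple

# Month lookup keyed by the first three letters (assumes English-language text).
MONTHS = {m: i for i, m in enumerate(
    ["jan", "feb", "mar", "apr", "may", "jun",
     "jul", "aug", "sep", "oct", "nov", "dec"], start=1)}
MONTH_RE = r"(?:jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)[a-z]*\.?"

# Hypothetical candidate-date patterns (not the paper's actual regexes).
PATTERNS = [
    # "May 25, 2018" or "Sept. 1 2020"
    (re.compile(rf"\b({MONTH_RE})\s+(\d{{1,2}}),?\s+(\d{{4}})\b", re.I), "mdy"),
    # "25 May 2018"
    (re.compile(rf"\b(\d{{1,2}})\s+({MONTH_RE})\s+(\d{{4}})\b", re.I), "dmy"),
    # "2018-05-25"
    (re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b"), "iso"),
]

# Hypothetical keyword cues standing in for a trained candidate-date classifier.
CUES = {
    "updated": ("last updated", "updated", "last revised", "revised", "last modified"),
    "effective": ("effective date", "effective as of", "effective"),
}


def to_date(groups: Tuple[str, ...], kind: str) -> Optional[date]:
    """Date object creation: build a date from regex groups, or None if invalid."""
    try:
        if kind == "mdy":
            month, day, year = MONTHS[groups[0][:3].lower()], int(groups[1]), int(groups[2])
        elif kind == "dmy":
            day, month, year = int(groups[0]), MONTHS[groups[1][:3].lower()], int(groups[2])
        else:
            year, month, day = (int(g) for g in groups)
        return date(year, month, day)  # raises ValueError for e.g. February 30
    except (KeyError, ValueError):
        return None


def extract_policy_dates(text: str, window: int = 40) -> List[Tuple[str, date]]:
    """Return (label, date) pairs for candidates preceded by an update/effective cue."""
    results = []
    for pattern, kind in PATTERNS:
        for m in pattern.finditer(text):
            d = to_date(m.groups(), kind)
            if d is None:
                continue
            # Look only at the text just before the candidate date.
            context = text[max(0, m.start() - window):m.start()].lower()
            # Label the candidate by the cue that occurs closest to it.
            best_label: Optional[str] = None
            best_pos = -1
            for label, cues in CUES.items():
                for cue in cues:
                    pos = context.rfind(cue)
                    if pos > best_pos:
                        best_label, best_pos = label, pos
            # Candidates with no nearby cue (e.g. dates mentioned in the body) are dropped.
            if best_label is not None:
                results.append((best_label, d))
    return sorted(results, key=lambda r: r[1])
```

For example, `extract_policy_dates("Effective: May 25, 2018. Last updated: January 1, 2020.")` labels the first date as effective and the second as updated, while a sentence such as "Posted on March 3, 2019." yields no policy date because no cue precedes it. Using the nearest preceding cue keeps a single "Effective ... Last updated ..." header from assigning both dates the same label; a learned classifier, as the pipeline above describes, would replace this heuristic in practice.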
Privacy Now or Never: Large-Scale Extraction and Analysis of Dates in Privacy Policy Text