DOI: 10.1145/2024288.2024306
research-article

Privacy-aware spam detection in social bookmarking systems

Published: 07 September 2011

ABSTRACT

With the increasing popularity of Web 2.0 services in recent years, data privacy has become a major concern for users. The more personal data users reveal, the more difficult it becomes to control its disclosure on the web. For Web 2.0 service providers, however, the data provided by users is a valuable source for offering effective, personalised data mining services. One major application is the detection of spam in social bookmarking systems: to prevent a decline in content quality, providers need to identify spammers and exclude them from the system. They thereby face a conflict of interests: on the one hand, they need to identify spammers based on the information they collect about users; on the other hand, they need to respect privacy concerns and process as little personal data as possible. It would therefore be of tremendous help for system developers and users to know which personal data are needed for spam detection and which can be ignored. In this paper we address these questions by presenting a data-privacy-aware feature engineering approach. It consists of designing features for spam classification that are evaluated according to both performance and privacy conditions. Experiments using data from the social bookmarking system BibSonomy show that the two conditions need not exclude each other.
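
As a rough illustration of evaluating features against both performance and privacy conditions, the following minimal sketch scores a few feature groups by ROC AUC and by a privacy cost, then keeps the least privacy-invasive groups that still separate spammers from legitimate users. Everything in it is an assumption made for illustration: the feature group names, the toy scores and labels, the ordinal privacy-cost scale, the `min_auc` threshold, and the choice of AUC as the performance measure. None of it is taken from the paper's actual feature set or results.

```python
from typing import Dict, List, Tuple


def auc(scores: List[float], labels: List[int]) -> float:
    """Area under the ROC curve: the probability that a randomly chosen
    spammer (label 1) scores higher than a randomly chosen legitimate
    user (label 0); ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one spammer and one non-spammer")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy labels: 1 = spammer, 0 = legitimate user.
LABELS = [1, 1, 1, 0, 0, 0]

# Hypothetical feature groups: name -> (privacy cost, per-user spam score).
# The privacy cost is an invented ordinal scale (higher = more personal data).
FEATURE_GROUPS: Dict[str, Tuple[int, List[float]]] = {
    "profile_data": (3, [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]),
    "tag_statistics": (1, [0.9, 0.6, 0.7, 0.4, 0.2, 0.1]),
    "post_count": (1, [0.5, 0.4, 0.6, 0.5, 0.7, 0.2]),
}


def rank_feature_groups(
    groups: Dict[str, Tuple[int, List[float]]],
    labels: List[int],
    min_auc: float = 0.8,
) -> List[Tuple[str, int, float]]:
    """Keep groups reaching min_auc, ordered by privacy cost, then by AUC."""
    rows = [(name, cost, auc(scores, labels))
            for name, (cost, scores) in groups.items()]
    kept = [row for row in rows if row[2] >= min_auc]
    return sorted(kept, key=lambda row: (row[1], -row[2]))


RANKED = rank_feature_groups(FEATURE_GROUPS, LABELS)
for name, cost, value in RANKED:
    print(f"{name}: privacy cost {cost}, AUC {value:.2f}")
```

Sorting by privacy cost first reflects one possible reading of the goal stated in the abstract: among feature groups that classify well enough, prefer those that need the least personal data.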

Published in

i-KNOW '11: Proceedings of the 11th International Conference on Knowledge Management and Knowledge Technologies
September 2011
306 pages
ISBN: 9781450307321
DOI: 10.1145/2024288

Copyright © 2011 ACM

Publisher

Association for Computing Machinery
New York, NY, United States

Acceptance Rates

Overall Acceptance Rate: 77 of 238 submissions, 32%
