ABSTRACT
With the increasing popularity of Web 2.0 services in recent years, data privacy has become a major concern for users. The more personal data users reveal, the more difficult it becomes to control its disclosure on the web. For Web 2.0 service providers, however, the data provided by users is a valuable source for offering effective, personalised data mining services. One major application is the detection of spam in social bookmarking systems: to prevent a decline in content quality, providers need to identify spammers and exclude them from the system. This creates a conflict of interests. On the one hand, providers need to identify spammers based on the information they collect about users. On the other hand, they need to respect privacy concerns and process as little personal data as possible. It would therefore be of tremendous help to system developers and users to know which personal data are needed for spam detection and which can be ignored. In this paper we address these questions by presenting a data-privacy-aware feature engineering approach. It consists of designing features for spam classification and evaluating them according to both performance and privacy conditions. Experiments using data from the social bookmarking system BibSonomy show that the two conditions need not exclude each other.
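The abstract does not spell out how a feature is judged on both axes at once. The sketch below shows one possible selection step. It assumes that each candidate feature carries a hypothetical ordinal privacy cost and that classification performance is measured by ROC AUC. A feature is kept only if it stays under a privacy cap and reaches a minimum AUC. The `Feature` type, the feature names, the privacy costs, the thresholds and the toy data are all invented for illustration. They are not the paper's feature set, its evaluation procedure, or BibSonomy data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """A candidate spam feature (hypothetical structure, for illustration only)."""
    name: str
    privacy_cost: int            # assumed ordinal scale: 0 = public, 1 = usage data, 2 = personal data
    scores: tuple[float, ...]    # one value per user; higher is assumed to indicate spam


def auc(scores, labels):
    """ROC AUC computed as the Mann-Whitney statistic: the probability that a
    randomly chosen spammer (label 1) scores higher than a randomly chosen
    legitimate user (label 0), counting ties as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one spammer and one legitimate user")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def privacy_aware_selection(features, labels, max_privacy_cost, min_auc):
    """Keep features that satisfy both the privacy and the performance condition,
    returned as (name, auc) pairs sorted by AUC, best first."""
    kept = []
    for f in features:
        if f.privacy_cost > max_privacy_cost:
            continue  # privacy condition: skip the feature before touching its data
        score = auc(f.scores, labels)
        if score >= min_auc:  # performance condition
            kept.append((f.name, score))
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy labels: 1 = spammer, 0 = legitimate user (invented, not BibSonomy data).
    labels = [1, 1, 0, 0, 0]
    # Hypothetical features with invented privacy costs and per-user values.
    features = [
        Feature("tags_per_post", 0, (9, 7, 2, 3, 1)),
        Feature("posts_per_day", 1, (4, 2, 3, 1, 5)),
        Feature("email_domain_flag", 2, (1, 1, 0, 1, 0)),
    ]
    print(privacy_aware_selection(features, labels, max_privacy_cost=1, min_auc=0.7))
```

With these toy values only `tags_per_post` survives. `posts_per_day` is allowed by the privacy cap but ranks users no better than chance (AUC 0.5). `email_domain_flag` is never scored because it exceeds the cap. Applying the privacy filter before computing any score reflects the goal of processing as little personal data as possible. A real evaluation would more likely compare trained classifiers on held-out data than rank single raw features.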
Index Terms
- Privacy-aware spam detection in social bookmarking systems