DOI: 10.1145/1451983.1451993
research-article

Analysing features of Japanese splogs and characteristics of keywords

Published: 22 April 2008

ABSTRACT

This paper analyzes Japanese splogs (spam blogs) based on characteristics of the keywords they contain. By analyzing these characteristics, we estimate how spammers behave when they create splogs from other sources. Since splogs often introduce noise into word occurrence statistics in the blogosphere, we assume that splogs can be collected efficiently by hand by sampling blog homepages that contain a keyword of a certain type on the date when that keyword occurs most frequently. We manually examine various features of the collected blog homepages: whether their text content is excerpted from other sources, and whether they display affiliate advertisements or outgoing links to affiliated sites. Among various informative results, it is important to note that more than half of the collected splogs were created by a very small number of spammers.
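
The sampling step described in the abstract could look roughly like the Python sketch below. This is only an illustration under stated assumptions, not the authors' implementation: the post record layout (`homepage`, `date`, `text`), the function names `peak_date` and `sample_homepages`, and the use of plain substring matching to detect a keyword are all hypothetical. The sketch finds the date on which a keyword appears in the most posts, then draws a random sample of the blog homepages that used the keyword on that date. The manual inspection of the sampled homepages is not modelled.

```python
import random
from collections import Counter
from datetime import date
from typing import Optional


def peak_date(posts: list, keyword: str) -> Optional[date]:
    """Return the date on which `keyword` appears in the most posts, or None.

    Each post is assumed to be a dict with "homepage", "date" and "text" keys
    (a hypothetical layout chosen for this sketch).
    """
    counts = Counter(p["date"] for p in posts if keyword in p["text"])
    if not counts:
        return None
    # Break ties by the earliest date so the result is deterministic.
    return min(counts, key=lambda d: (-counts[d], d))


def sample_homepages(posts: list, keyword: str, k: int, seed: int = 0) -> list:
    """Sample up to k distinct homepages that used `keyword` on its peak date."""
    day = peak_date(posts, keyword)
    if day is None:
        return []
    # Deduplicate homepages and sort them so sampling is reproducible.
    homepages = sorted(
        {p["homepage"] for p in posts if p["date"] == day and keyword in p["text"]}
    )
    rng = random.Random(seed)
    return rng.sample(homepages, min(k, len(homepages)))
```

Substring matching is used here only because it is the simplest choice for text without whitespace word boundaries; a real pipeline would likely rely on a morphological analyzer or a search index instead.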

Published in

AIRWeb '08: Proceedings of the 4th international workshop on Adversarial information retrieval on the web
April 2008
81 pages
ISBN: 9781605581590
DOI: 10.1145/1451983

Copyright © 2008 ACM

Publisher

Association for Computing Machinery, New York, NY, United States
