DOI: 10.1145/1341531.1341560
research-article

Opinion spam and analysis

Published: 11 February 2008

ABSTRACT

Evaluative texts on the Web have become a valuable source of opinions on products, services, events, individuals, etc. Recently, many researchers have studied such opinion sources as product reviews, forum posts, and blogs. However, existing research has focused on classification and summarization of opinions using natural language processing and data mining techniques. An important issue that has been neglected so far is opinion spam, or the trustworthiness of online opinions. In this paper, we study this issue in the context of product reviews, which are opinion-rich and widely used by consumers and product manufacturers. In the past two years, several startup companies that aggregate opinions from product reviews have also appeared. It is thus high time to study spam in reviews. To the best of our knowledge, there is still no published study on this topic, although Web spam and email spam have been investigated extensively. We will see that opinion spam is quite different from Web spam and email spam, and thus requires different detection techniques. Based on the analysis of 5.8 million reviews and 2.14 million reviewers from amazon.com, we show that opinion spam in reviews is widespread. This paper analyzes such spam activities and presents some novel techniques to detect them.


Published in

WSDM '08: Proceedings of the 2008 International Conference on Web Search and Data Mining
February 2008
270 pages
ISBN: 9781595939272
DOI: 10.1145/1341531

Copyright © 2008 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Overall acceptance rate: 498 of 2,863 submissions (17%)
