research-article

Opinion spam and analysis

Authors:
Nitin Jindal, University of Illinois at Chicago, Chicago, IL
Bing Liu, University of Illinois at Chicago, Chicago, IL

WSDM '08: Proceedings of the 2008 International Conference on Web Search and Data Mining, February 2008, Pages 219–230. https://doi.org/10.1145/1341531.1341560

Published: 11 February 2008

ABSTRACT

Evaluative texts on the Web have become a valuable source of opinions on products, services, events, individuals, etc. Recently, many researchers have studied such opinion sources as product reviews, forum posts, and blogs. However, existing research has focused on classification and summarization of opinions using natural language processing and data mining techniques. An important issue that has been neglected so far is opinion spam, or the trustworthiness of online opinions. In this paper, we study this issue in the context of product reviews, which are opinion-rich and are widely used by consumers and product manufacturers. In the past two years, several startup companies that aggregate opinions from product reviews have also appeared. It is thus high time to study spam in reviews. To the best of our knowledge, there is still no published study on this topic, although Web spam and email spam have been investigated extensively. We will see that opinion spam is quite different from Web spam and email spam, and thus requires different detection techniques. Based on the analysis of 5.8 million reviews and 2.14 million reviewers from amazon.com, we show that opinion spam in reviews is widespread. This paper analyzes such spam activities and presents some novel techniques to detect them.

Index Terms

Opinion spam and analysis
1. Information systems
  1. Information retrieval
    1. Retrieval tasks and goals
      1. Document filtering
      2. Information extraction
  2. Information systems applications
    1. Data mining

Published in
WSDM '08: Proceedings of the 2008 International Conference on Web Search and Data Mining
February 2008, 270 pages
ISBN: 9781595939272
DOI: 10.1145/1341531
General Chair: Marc Najork (Microsoft, USA)
Program Chairs: Andrei Broder (Yahoo!, USA), Soumen Chakrabarti (IIT Bombay, India)
Copyright © 2008 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
fake reviews
opinion spam
review analysis
review spam

Acceptance Rates
Overall Acceptance Rate: 498 of 2,863 submissions, 17%