research-article

Sentiment analysis on evolving social streams: how self-report imbalances can help

Authors:
Pedro Calais Guerra, UFMG, Belo Horizonte, MG, Brazil
Wagner Meira, UFMG, Belo Horizonte, MG, Brazil
Claire Cardie, Cornell University, Ithaca, NY, USA

WSDM '14: Proceedings of the 7th ACM international conference on Web search and data mining, February 2014, pages 443–452. https://doi.org/10.1145/2556195.2556261

Published: 24 February 2014

ABSTRACT

Real-time sentiment analysis is a challenging machine learning task, due to the scarcity of labeled data and to sudden changes in sentiment caused by real-world events that need to be interpreted instantly. In this paper we propose solutions for acquiring labels and coping with concept drift in this setting, using findings from social psychology on how humans prefer to disclose certain types of emotions. In particular, we rely on findings that humans are more motivated to report positive feelings than negative ones, and that they prefer to report extreme feelings over average ones.

We map these two self-report imbalances onto two machine learning sub-tasks. The preference for disclosing positive feelings can be exploited to generate labeled data on polarizing topics, where a positive event for one group usually induces negative feelings in the opposing group; the resulting imbalance in user activity reveals the currently dominant sentiment.
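As a rough illustration of the labeling idea, the sketch below turns an activity imbalance between two opposing groups into training labels. It assumes that each message's author has already been assigned to one of two groups (for example, fans of two rival teams), that each group's baseline posting rate is known, and that a surge counts only when one group's activity relative to its baseline exceeds the other's by a factor `min_ratio`. `Window`, `dominant_group`, `label_message` and the 1.5 threshold are hypothetical names and values chosen for this illustration, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Window:
    """Message counts for one time window of the stream (hypothetical structure)."""
    count_a: int  # messages posted by users in group A (e.g. fans of team A)
    count_b: int  # messages posted by users in group B (e.g. fans of team B)


def dominant_group(window: Window, baseline_a: float, baseline_b: float,
                   min_ratio: float = 1.5) -> Optional[str]:
    """Return "A" or "B" for the group whose activity rises most above its
    baseline, or None when the imbalance is too small to trust.

    Assumption: the group posting much more than usual is the one that just
    experienced a positive event, since people prefer to disclose positive feelings.
    """
    if baseline_a <= 0 or baseline_b <= 0:
        raise ValueError("baselines must be positive")
    # Normalize by baseline so a larger fan base is not mistaken for a positive event.
    lift_a = window.count_a / baseline_a
    lift_b = window.count_b / baseline_b
    if lift_a >= min_ratio * lift_b:
        return "A"
    if lift_b >= min_ratio * lift_a:
        return "B"
    return None


def label_message(author_group: str, dominant: Optional[str]) -> Optional[str]:
    """Label a message positive if its author is in the dominant group, else negative."""
    if dominant is None:
        return None  # ambiguous window: emit no training label
    return "positive" if author_group == dominant else "negative"
```

For a window with 900 messages from group A (baseline 300) and 150 from group B (baseline 200), group A posts at 3.0 times its baseline against 0.75 for group B, so `dominant_group` returns `"A"`; messages by A's members are then labeled positive and those by B's members negative. Returning `None` for ambiguous windows trades label volume for label precision.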

Based on the finding that extreme experiences are reported more often than average ones, we propose a feature representation strategy that focuses on terms appearing at spikes in the social stream. Compared with a static text representation (TF-IDF), our feature representation is better at detecting new informative features that capture sudden changes in the sentiment stream caused by real-world events.
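One way to make a spike-focused representation concrete is sketched below: time windows whose message volume exceeds the mean by more than `k` standard deviations are treated as spikes, and each term is weighted by the share of its occurrences that fall inside spike windows. The mean-plus-k-standard-deviations detector, the share-based weight, and the function names `spike_windows` and `spike_term_weights` are assumptions made for illustration; the paper's exact formulation may differ.

```python
from collections import Counter
from statistics import mean, pstdev
from typing import Dict, List


def spike_windows(volumes: List[int], k: float = 1.0) -> List[int]:
    """Indices of windows whose message volume exceeds mean + k * std.

    Hypothetical spike detector (requires at least one window); any burst
    detector could be swapped in here.
    """
    mu, sigma = mean(volumes), pstdev(volumes)
    return [i for i, v in enumerate(volumes) if v > mu + k * sigma]


def spike_term_weights(windows: List[List[List[str]]], k: float = 1.0) -> Dict[str, float]:
    """Weight each term by the share of its occurrences that fall in spike windows.

    `windows[i]` is the list of tokenized messages posted during window i.
    """
    volumes = [len(msgs) for msgs in windows]
    spikes = set(spike_windows(volumes, k))
    total: Counter = Counter()     # occurrences of each term across all windows
    in_spike: Counter = Counter()  # occurrences of each term inside spike windows
    for i, msgs in enumerate(windows):
        for tokens in msgs:
            total.update(tokens)
            if i in spikes:
                in_spike.update(tokens)
    # Counter returns 0 for missing keys, so terms never seen in a spike get 0.0.
    return {term: in_spike[term] / n for term, n in total.items()}
```

With three windows holding 1, 3 and 1 messages, only the middle window exceeds the mean plus one standard deviation (about 2.61), so a term such as `goal` that occurs only there receives weight 1.0, while `match`, which occurs once in each window, receives 1/3. Unlike IDF, which is computed once over a whole corpus and ignores when a term appears, these weights can be recomputed over a sliding set of recent windows, so terms that suddenly surface during spikes gain weight right away.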

We show that our social-psychology-inspired framework achieves accuracies of up to 84% when analyzing live reactions on Twitter to two popular sports, soccer and football, despite requiring no human effort to generate supervisory labels.

Index Terms

1. Information systems
  1. Information systems applications
    1. Data mining

Published in
WSDM '14: Proceedings of the 7th ACM international conference on Web search and data mining
February 2014
712 pages
ISBN: 9781450323512
DOI: 10.1145/2556195
General Chairs: Ben Carterette (University of Delaware, USA), Fernando Diaz (Microsoft Research, USA)
Program Chairs: Carlos Castillo (Qatar Computing Research Institute, Qatar), Donald Metzler (Google, USA)
Copyright © 2014 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
sentiment analysis
social media analytics
stream data mining
Acceptance Rates
WSDM '14 paper acceptance rate: 64 of 355 submissions (18%). Overall acceptance rate: 498 of 2,863 submissions (17%).
Article Metrics
- 22 total citations
- 569 total downloads
- Downloads (last 12 months): 13
- Downloads (last 6 weeks): 1