DOI: 10.1145/2556195.2556261
research-article

Sentiment analysis on evolving social streams: how self-report imbalances can help

Published: 24 February 2014

ABSTRACT

Real-time sentiment analysis is a challenging machine learning task because labeled data is scarce and real-world events cause sudden changes in sentiment that must be interpreted instantly. In this paper we propose solutions for acquiring labels and coping with concept drift in this setting, using findings from social psychology on how humans prefer to disclose some types of emotions. In particular, we use the findings that humans are more motivated to report positive feelings than negative ones, and that they prefer to report extreme feelings over average ones.

We map each of these two self-report imbalances to a machine learning sub-task. The preference for disclosing positive feelings can be exploited to generate labeled data on polarizing topics, where a positive event for one group usually induces negative feelings in the opposing group, generating an imbalance in user activity that reveals the currently dominant sentiment.
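The labeling idea can be illustrated with a minimal sketch. The abstract does not specify the actual procedure, so everything here is an assumption made for illustration: the `Post` record, the two group names, the fixed time windows, and the `min_share` threshold are all hypothetical. The sketch treats the more active group in a window as the one experiencing the positive event and labels posts accordingly.

```python
from collections import Counter
from typing import NamedTuple


class Post(NamedTuple):
    window: int  # index of the time window the post falls in (assumed)
    group: str   # side of the polarized topic the author supports (assumed known)
    text: str


def label_by_activity_imbalance(posts, groups=("A", "B"), min_share=0.6):
    """Label posts using the activity imbalance between two opposing groups.

    Hypothetical rule: in a window where one group produces at least
    `min_share` of the posts, its posts are labeled "positive" and the
    opposing group's posts "negative". Balanced windows stay unlabeled.
    """
    posts = list(posts)
    counts = Counter((p.window, p.group) for p in posts if p.group in groups)
    labeled = []
    for p in posts:
        if p.group not in groups:
            continue  # author not tied to either side
        a, b = (counts[(p.window, g)] for g in groups)
        if max(a, b) / (a + b) < min_share:
            continue  # no clearly dominant group in this window
        dominant = groups[0] if a >= b else groups[1]
        labeled.append((p, "positive" if p.group == dominant else "negative"))
    return labeled


posts = [
    Post(0, "A", "what a goal"),
    Post(0, "A", "brilliant"),
    Post(0, "A", "yes!"),
    Post(0, "B", "terrible defending"),
    Post(1, "A", "hmm"),
    Post(1, "B", "ok"),
]
labels = label_by_activity_imbalance(posts)
```

In this toy input, group A writes three of the four posts in window 0, so that window is labeled, while the evenly split window 1 yields no labels. A real system would also need a way to assign authors to groups, which this sketch takes as given.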

Based on the knowledge that extreme experiences are reported more often than average experiences, we propose a feature representation strategy that focuses on terms which appear at spikes in the social stream. Compared with a static text representation (TF-IDF), we found that our feature representation is better at detecting new informative features that capture the sudden changes in the sentiment stream caused by real-world events.
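A spike-oriented representation of this kind might be sketched as follows. This is not the paper's formulation, which the abstract does not give; it is an assumed stand-in in which a term's weight in a window is its count minus its average count over earlier windows, so only terms that are bursting receive a positive weight. The function name `spike_features` and the windowed token lists are hypothetical.

```python
from collections import Counter


def spike_features(windows):
    """Weight terms by how far they exceed their own history.

    `windows` is a chronological list of token lists, one per time window.
    For each window, a term's score is its count in that window minus its
    mean count over all earlier windows (an assumed burst measure).
    Only terms with a positive score are kept.
    """
    history = Counter()  # cumulative counts over the windows seen so far
    features = []
    for seen, tokens in enumerate(windows):
        current = Counter(tokens)
        scores = {}
        for term, count in current.items():
            baseline = history[term] / seen if seen else 0.0
            burst = count - baseline
            if burst > 0:
                scores[term] = burst
        features.append(scores)
        history.update(current)
    return features


windows = [
    ["goal", "match"],
    ["match", "ref"],
    ["goal", "goal", "goal", "match"],
]
features = spike_features(windows)
```

Unlike a static TF-IDF vocabulary, the weights here change from window to window: "match" appears steadily and drops out after the first window, while "goal" is picked up when it spikes in the third. The sketch ignores differences in overall volume between windows, which a practical version would likely need to normalize for.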

We show that our social psychology-inspired framework achieves accuracies of up to 84% when analyzing live reactions on Twitter to two popular sports, soccer and football, despite requiring no human effort to generate supervisory labels.

      Published in

      WSDM '14: Proceedings of the 7th ACM international conference on Web search and data mining
      February 2014
      712 pages
      ISBN: 9781450323512
      DOI: 10.1145/2556195

      Copyright © 2014 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      WSDM '14 Paper Acceptance Rate: 64 of 355 submissions, 18%. Overall Acceptance Rate: 498 of 2,863 submissions, 17%.
