DOI: 10.1145/1772690.1772781
research-article

How useful are your comments?: analyzing and predicting youtube comments and comment ratings

Published: 26 April 2010

ABSTRACT

An analysis of the social video sharing platform YouTube reveals a large amount of community feedback, both through comments on published videos and through meta ratings for these comments. In this paper, we present an in-depth study of commenting and comment rating behavior on a sample of more than 6 million comments on 67,000 YouTube videos, for which we analyzed dependencies between comments, views, comment ratings, and topic categories. In addition, we studied the influence of sentiment expressed in comments on the ratings for these comments using the SentiWordNet thesaurus, a lexical WordNet-based resource containing sentiment annotations. Finally, to predict community acceptance for comments not yet rated, we built different classifiers for the estimation of ratings for these comments. The results of our large-scale evaluations are promising and indicate that community feedback on already rated comments can help to filter new unrated comments or suggest particularly useful but still unrated comments.


Published in

WWW '10: Proceedings of the 19th international conference on World wide web
April 2010, 1407 pages
ISBN: 9781605587998
DOI: 10.1145/1772690

Copyright © 2010 International World Wide Web Conference Committee (IW3C2)

Publisher: Association for Computing Machinery, New York, NY, United States

Acceptance Rates

Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%
