ABSTRACT
An analysis of the social video sharing platform YouTube reveals a large amount of community feedback, both through comments on published videos and through meta-ratings of these comments. In this paper, we present an in-depth study of commenting and comment-rating behavior on a sample of more than 6 million comments on 67,000 YouTube videos, for which we analyzed dependencies between comments, views, comment ratings, and topic categories. In addition, we studied the influence of the sentiment expressed in comments on the ratings these comments receive, using the SentiWordNet thesaurus, a WordNet-based lexical resource containing sentiment annotations. Finally, to predict community acceptance of comments that have not yet been rated, we built several classifiers that estimate ratings for these comments. The results of our large-scale evaluations are promising and indicate that community feedback on already-rated comments can help to filter new, unrated comments or to suggest particularly useful comments that have not yet been rated.
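The abstract names two computational steps: scoring the sentiment of a comment with a SentiWordNet-style lexicon, and predicting community acceptance of an unrated comment from its text. The sketch below illustrates both under explicit assumptions. The lexicon entries and their scores are hypothetical (the real SentiWordNet scores WordNet synsets, not surface words), the binary "accepted" label is an assumed binarization of comment ratings, and the classifier is a plain bag-of-words perceptron chosen only for brevity, not necessarily one of the classifiers built in the study. The names `tokenize`, `sentiment_score`, `train_perceptron`, and `predict` are illustrative.

```python
from collections import Counter, defaultdict
from statistics import mean

# HYPOTHETICAL miniature lexicon in the spirit of SentiWordNet: each word maps to
# made-up (positivity, negativity) scores in [0, 1]. The real SentiWordNet scores
# WordNet synsets rather than surface words; this per-word table is a simplification.
LEXICON = {
    "great": (0.75, 0.0),
    "love": (0.625, 0.0),
    "funny": (0.5, 0.0),
    "boring": (0.0, 0.625),
    "stupid": (0.0, 0.75),
    "hate": (0.0, 0.75),
}


def tokenize(text: str) -> list[str]:
    """Lowercase whitespace tokenization with surrounding punctuation stripped."""
    tokens = (t.strip(".,!?").lower() for t in text.split())
    return [t for t in tokens if t]


def sentiment_score(text: str) -> float:
    """Mean (positivity - negativity) over lexicon words in the comment; 0.0 if none match."""
    scores = [LEXICON[t][0] - LEXICON[t][1] for t in tokenize(text) if t in LEXICON]
    return mean(scores) if scores else 0.0


def train_perceptron(examples, epochs: int = 10):
    """Train a bag-of-words perceptron on (text, label) pairs.

    label is +1 for an "accepted" comment and -1 otherwise (an ASSUMED
    binarization of community ratings, e.g. positive vs. negative rating).
    """
    weights = defaultdict(float)
    bias = 0.0
    for _ in range(epochs):
        for text, label in examples:
            features = Counter(tokenize(text))
            activation = bias + sum(weights[w] * c for w, c in features.items())
            if label * activation <= 0:  # misclassified or on the boundary: update
                for w, c in features.items():
                    weights[w] += label * c
                bias += label
    return dict(weights), bias


def predict(weights, bias, text: str) -> int:
    """Return +1 (predicted accepted) or -1 (predicted not accepted)."""
    features = Counter(tokenize(text))
    activation = bias + sum(weights.get(w, 0.0) * c for w, c in features.items())
    return 1 if activation > 0 else -1


if __name__ == "__main__":
    # Toy, made-up comments for illustration only (not data from the study).
    train = [
        ("great video love it", 1),
        ("so funny great", 1),
        ("boring and stupid", -1),
        ("i hate this stupid song", -1),
    ]
    weights, bias = train_perceptron(train)
    print(sentiment_score("great video love it"))  # 0.6875
    print(predict(weights, bias, "love this great song"))  # 1
```

A perceptron only converges on linearly separable data; on a real comment corpus, a margin-based learner such as a linear SVM, evaluated on held-out comments, would be the more natural choice for this kind of text classification.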
Index Terms
- How useful are your comments?: analyzing and predicting YouTube comments and comment ratings
Recommendations
Analyzing and Mining Comments and Comment Ratings on the Social Web
An analysis of the social video sharing platform YouTube and the news aggregator Yahoo! News reveals the presence of vast amounts of community feedback through comments for published videos and news stories, as well as through metaratings for these ...
Sifting useful comments from Flickr Commons and YouTube
Cultural institutions are increasingly contributing content to social media platforms to raise awareness and promote use of their collections. Furthermore, they are often the recipients of user comments containing information that may be incorporated in ...
YouTube Comments on Gene-Edited Babies: What Factors Affect Diverse Opinions in Comments?
This study explored the factors that influence video popularity and diverse opinions in the comments of YouTube videos about gene-edited babies. The 107 most viewed videos and the corresponding 56,912 direct comments about gene-edited babies were collected from ...