research-article

How Useful Are Your Comments? Analyzing and Predicting YouTube Comments and Comment Ratings

Authors:
Stefan Siersdorfer, L3S Research Center, Hannover, Germany
Sergiu Chelaru, L3S Research Center, Hannover, Germany
Wolfgang Nejdl, L3S Research Center, Hannover, Germany
Jose San Pedro, Telefonica Research, Barcelona, Spain

WWW '10: Proceedings of the 19th international conference on World wide web, April 2010, Pages 891–900
https://doi.org/10.1145/1772690.1772781

Published: 26 April 2010

ABSTRACT

An analysis of the social video sharing platform YouTube reveals a large amount of community feedback, both through comments on published videos and through meta-ratings of these comments. In this paper, we present an in-depth study of commenting and comment rating behavior on a sample of more than 6 million comments on 67,000 YouTube videos, for which we analyzed dependencies between comments, views, comment ratings, and topic categories. In addition, we studied the influence of sentiment expressed in comments on the ratings for these comments using the SentiWordNet thesaurus, a lexical WordNet-based resource containing sentiment annotations. Finally, to predict community acceptance for comments not yet rated, we built different classifiers for the estimation of ratings for these comments. The results of our large-scale evaluations are promising and indicate that community feedback on already rated comments can help to filter new unrated comments or suggest particularly useful but still unrated comments.
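
To make the sentiment step more concrete, the following is a minimal sketch of how lexicon-based sentiment features might be computed for a comment. It is not the authors' implementation: the abstract does not specify how scores were aggregated, so averaging over matched words is an assumption, and the lexicon `TOY_LEXICON` and the function `sentiment_features` are hypothetical names with invented scores rather than actual SentiWordNet entries. The sketch only mirrors the general idea that a SentiWordNet-style resource assigns positivity, negativity, and objectivity scores that sum to 1.

```python
import re

# Hypothetical toy lexicon: word -> (positivity, negativity).
# These values are invented for illustration and are NOT taken from SentiWordNet.
TOY_LEXICON = {
    "great": (0.75, 0.0),
    "love": (0.625, 0.0),
    "awful": (0.0, 0.75),
    "boring": (0.0, 0.5),
    "video": (0.0, 0.0),
}


def sentiment_features(comment, lexicon=TOY_LEXICON):
    """Return mean (positivity, negativity, objectivity) over lexicon words.

    Averaging over matched words is an assumption made for this sketch.
    """
    # Lowercase the comment and split it into simple word tokens.
    tokens = re.findall(r"[a-z']+", comment.lower())
    # Keep only the scores of tokens that appear in the lexicon.
    scores = [lexicon[t] for t in tokens if t in lexicon]
    if not scores:
        # No known words: treat the comment as fully objective.
        return (0.0, 0.0, 1.0)
    pos = sum(p for p, _ in scores) / len(scores)
    neg = sum(n for _, n in scores) / len(scores)
    # Objectivity is the remainder, so the three scores sum to 1.
    return (pos, neg, 1.0 - pos - neg)


if __name__ == "__main__":
    for text in ["Great video, love it!", "Awful and boring"]:
        print(text, sentiment_features(text))
```

Features of this kind could then be compared against comment ratings or passed to a classifier alongside term-based features; which features the study actually used is not stated in the abstract.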

Index Terms

1. Information systems
  1. Information systems applications

Published in
WWW '10: Proceedings of the 19th international conference on World wide web
April 2010
1407 pages
ISBN: 9781605587998
DOI: 10.1145/1772690
General Chairs:
Michael Rappa, North Carolina State University, USA
Paul Jones, University of North Carolina at Chapel Hill, USA
Program Chairs:
Juliana Freire, University of Utah, USA
Soumen Chakrabarti, Indian Institute of Technology, India
Copyright © 2010 International World Wide Web Conference Committee (IW3C2)
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
YouTube
comment ratings
community feedback
Qualifiers
- research-article
Acceptance Rates
Overall acceptance rate: 1,899 of 8,196 submissions, 23%
Article Metrics
- Total citations: 184
- Total downloads: 5,400
- Downloads (last 12 months): 402
- Downloads (last 6 weeks): 65