ABSTRACT
As Wikipedia has become the largest repository of human knowledge, measuring the quality of its articles has received considerable attention over the last decade. Most research efforts have focused on classifying the quality of Wikipedia articles using different feature sets. However, no "golden feature set" has been proposed so far. In this paper, we present a novel approach that classifies the quality of Wikipedia articles by analysing their content rather than by relying on a feature set. Our approach uses recent techniques in natural language processing and deep learning, and achieves results comparable with the state of the art.
Index Terms
- Quality Assessment of Wikipedia Articles without Feature Engineering