DOI: 10.1145/2910896.2910917
short-paper

Quality Assessment of Wikipedia Articles without Feature Engineering

Published: 19 June 2016

ABSTRACT

As Wikipedia has become the largest repository of human knowledge, measuring the quality of its articles has received considerable attention over the last decade. Most research efforts have focused on classifying article quality, each using a different feature set. However, no "golden feature set" has been proposed so far. In this paper, we present a novel approach that classifies Wikipedia articles by analysing their content rather than by relying on an engineered feature set. Our approach uses recent techniques in natural language processing and deep learning, and achieves results comparable to the state of the art.
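The abstract does not spell out the model, so the following is only a minimal sketch of the general idea of classifying from content instead of engineered features. It swaps in a much simpler technique than the one the paper describes: normalised term-count vectors and a nearest-centroid rule stand in for learned document representations and a deep classifier. The function names, the toy texts, and the "high"/"low" labels are all hypothetical and chosen purely for illustration.

```python
import math
from collections import Counter, defaultdict


def text_vector(text):
    """Turn raw article text into a unit-length term-count vector (sparse dict)."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {term: c / norm for term, c in counts.items()}


def train_centroids(labelled_texts):
    """Average the text vectors of each class into one centroid per class."""
    sums = defaultdict(lambda: defaultdict(float))
    sizes = defaultdict(int)
    for text, label in labelled_texts:
        for term, weight in text_vector(text).items():
            sums[label][term] += weight
        sizes[label] += 1
    return {
        label: {term: total / sizes[label] for term, total in terms.items()}
        for label, terms in sums.items()
    }


def predict(text, centroids):
    """Return the class whose centroid has the largest dot product with the text."""
    vec = text_vector(text)

    def similarity(label):
        centroid = centroids[label]
        return sum(w * centroid.get(term, 0.0) for term, w in vec.items())

    return max(centroids, key=similarity)


# Hypothetical toy data: the labels are placeholders, not Wikipedia's real classes.
TOY_TRAINING = [
    ("the battle began in 1805 and historians cite several primary sources", "high"),
    ("the treaty was signed after long negotiations according to cited sources", "high"),
    ("this is a village", "low"),
    ("this is a river", "low"),
]

centroids = train_centroids(TOY_TRAINING)
print(predict("this is a lake", centroids))
```

Note that no edit counts, link counts, or other hand-picked indicators appear anywhere: the only input is the article text itself. A realistic system would replace `text_vector` with a learned representation and the centroid rule with a trained classifier.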


          Published in

          JCDL '16: Proceedings of the 16th ACM/IEEE-CS on Joint Conference on Digital Libraries
          June 2016
          316 pages
          ISBN: 9781450342292
          DOI: 10.1145/2910896

          Copyright © 2016 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Acceptance Rates

          JCDL '16 paper acceptance rate: 15 of 52 submissions, 29%
          Overall acceptance rate: 415 of 1,482 submissions, 28%
