short-paper

Quality Assessment of Wikipedia Articles without Feature Engineering

Authors:
Quang Vinh Dang, Université de Lorraine & Inria, Vandoeuvre-lès-Nancy, France
Claudia-Lavinia Ignat, Université de Lorraine & Inria, Vandoeuvre-lès-Nancy, France

JCDL '16: Proceedings of the 16th ACM/IEEE-CS on Joint Conference on Digital Libraries, June 2016, Pages 27–30
https://doi.org/10.1145/2910896.2910917

Published: 19 June 2016

ABSTRACT

As Wikipedia has become the largest repository of human knowledge, measuring the quality of its articles has received considerable attention over the last decade. Most research efforts have focused on classifying the quality of Wikipedia articles, each using a different feature set. However, no "golden feature set" has been proposed so far. In this paper, we present a novel approach that classifies Wikipedia articles by analysing their content rather than by relying on a feature set. Our approach uses recent techniques in natural language processing and deep learning, and achieves results comparable to the state of the art.
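
The abstract does not spell out the model, so the following is only a rough sketch of the general idea of classifying articles from their raw content instead of from hand-picked quality features. It is not the authors' method: a sparse term-frequency vector and a nearest-centroid classifier stand in for the natural language processing and deep learning components. The names embed, train_centroids, predict and TOY_DOCS, the toy texts, and the "high"/"low" labels are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical toy corpus: (article text, quality label). Not real data.
TOY_DOCS = [
    ("the treaty was signed in 1648 and cited historians describe the negotiations", "high"),
    ("cited sources describe the treaty and the historians who negotiated it", "high"),
    ("this town is a town", "low"),
    ("a town somewhere stub", "low"),
]


def embed(text):
    """Turn raw text into a unit-length sparse term-frequency vector."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {tok: c / norm for tok, c in counts.items()} if norm else {}


def train_centroids(labelled_docs):
    """Average the document vectors of each class into one centroid."""
    sums = defaultdict(Counter)
    sizes = Counter()
    for text, label in labelled_docs:
        sums[label].update(embed(text))  # Counter.update adds the weights
        sizes[label] += 1
    return {
        label: {tok: v / sizes[label] for tok, v in vec.items()}
        for label, vec in sums.items()
    }


def predict(text, centroids):
    """Return the label whose centroid has the largest dot product with the text."""
    vec = embed(text)

    def score(label):
        centroid = centroids[label]
        return sum(w * centroid.get(tok, 0.0) for tok, w in vec.items())

    return max(centroids, key=score)


centroids = train_centroids(TOY_DOCS)
print(predict("a short town stub", centroids))
```

The only input to the classifier is the article text itself; no counts of references, edits, or contributors are computed by hand. A learned document representation and a neural classifier would replace embed and predict in a setup closer to what the abstract describes.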

Index Terms

1. Applied computing
  1. Document management and text processing
    1. Document capture
      1. Document analysis
2. Information systems
  1. Information retrieval
    1. Document representation
      1. Content analysis and feature selection
    2. Retrieval models and ranking
      1. Learning to rank

Published in
JCDL '16, June 2016, 316 pages
ISBN: 9781450342292
DOI: 10.1145/2910896
General Chairs: Nabil R. Adam (Rutgers University), Boots Cassel (Villanova University), Yelena Yesha (University of Maryland, Baltimore County)
Program Chairs: Richard Furuta (Texas A&M University), Michele C. Weigle (Old Dominion University)
Copyright © 2016 ACM
Publisher: Association for Computing Machinery, New York, NY, United States
Author Tags
deep learning
document representation
feature engineering
quality assessment
wikipedia
Acceptance Rates
JCDL '16 Paper Acceptance Rate: 15 of 52 submissions, 29%
Overall Acceptance Rate: 415 of 1,482 submissions, 28%