ABSTRACT
Wikipedia is an online encyclopedia that has undergone tremendous growth. However, this same growth has made it difficult to characterize its content and coverage. In this paper we develop measures to map Wikipedia using its socially annotated, hierarchical category structure. We introduce a mapping technique that takes advantage of socially annotated hierarchical categories while dealing with the inconsistencies and noise inherent in the distributed way that they are generated. The technique is demonstrated through two applications: mapping the distribution of topics in Wikipedia and how that distribution has changed over time, and mapping the degree of conflict found in each topic area. We also discuss the utility of the approach for other applications and datasets involving collaboratively annotated category hierarchies.
Index Terms
- What's in Wikipedia?: mapping topics and conflict using socially annotated category structure