ABSTRACT
Wikipedia is an open-content resource built by a community of users and widely used in educational contexts by both students and teachers. Wikipedia articles are connected by hyperlinks, so Wikipedia can be represented as a network in which the nodes are articles and the edges are hyperlinks. In this paper we analyze a complete copy of the Spanish Wikipedia. We apply social network analysis techniques, and more precisely community detection techniques, to identify clusters of articles with similar content. Because the number of clusters is relatively small, we analyze them manually to detect science articles. In addition, we identify the most representative scientific fields and their main features. We conclude that science articles make up about 11.66% of Spanish Wikipedia articles and that the most important clusters of scientific articles do not always coincide with classical scientific disciplines. This kind of analysis contributes to a better understanding of Wikipedia as an educational tool.
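The network representation and clustering step described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not name the community detection algorithm used, so a simple deterministic label propagation stands in for it, and the article titles and links in `LINKS` are hypothetical toy data rather than anything taken from the Spanish Wikipedia dump.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: (source article, linked article) hyperlink pairs.
LINKS = [
    ("Atom", "Chemistry"), ("Atom", "Physics"), ("Chemistry", "Physics"),
    ("Physics", "Atom"),
    ("Football", "Sport"), ("Football", "Tennis"), ("Tennis", "Sport"),
    ("Physics", "Sport"),  # a single cross-topic link
]


def build_graph(links):
    """Return an undirected adjacency map: article -> set of linked articles."""
    graph = defaultdict(set)
    for source, target in links:
        if source != target:  # ignore self-links
            graph[source].add(target)
            graph[target].add(source)
    return dict(graph)


def label_propagation(graph, max_rounds=100):
    """Assign a community label to every node.

    Assumption: a deterministic label propagation variant, used here only
    as a stand-in for whichever algorithm the paper actually applies.
    Each node adopts the most frequent label among its neighbours; on a
    tie it keeps its current label if that is among the best, otherwise
    it takes the alphabetically smallest of the tied labels.
    """
    labels = {node: node for node in graph}
    for _ in range(max_rounds):
        changed = False
        for node in sorted(graph):  # fixed visiting order for reproducibility
            counts = Counter(labels[neighbour] for neighbour in graph[node])
            if not counts:
                continue  # isolated node keeps its own label
            top = max(counts.values())
            best = sorted(label for label, n in counts.items() if n == top)
            new_label = labels[node] if labels[node] in best else best[0]
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break
    return labels


def clusters(labels):
    """Group nodes by label and return sorted lists of cluster members."""
    groups = defaultdict(set)
    for node, label in labels.items():
        groups[label].add(node)
    return sorted(sorted(members) for members in groups.values())


if __name__ == "__main__":
    for members in clusters(label_propagation(build_graph(LINKS))):
        print(members)
```

On this toy input the science-like and sport-like articles fall into separate clusters despite the single cross-topic link. With a small number of clusters, each one can then be inspected by hand, which is the manual step the abstract describes.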
Index Terms
- The implications of Wikipedia for contemporary science education: using social network analysis techniques for automatic organisation of knowledge