DOI: 10.1145/2766462.2767879
tutorial

Exploiting Wikipedia for Information Retrieval Tasks

Published: 09 August 2015

ABSTRACT

Wikipedia - the online encyclopedia - has long been used as a source of information for researchers, as well as being a subject of research itself. Wikipedia has been shown to be effective in recommender systems, sentiment analysis, validation and multiple domains in information retrieval. One of the reasons for Wikipedia's popularity among researchers and practitioners is the multiple types of information it contains, which enable practitioners to select the right "tool" for their respective tasks. In addition to its great potential, this multitude of information sources also poses a challenge: which sources of information are best suited for a specific problem, and how can different types of data be combined? This tutorial aims to provide a holistic view of Wikipedia's different features - text, links, categories, page views, editing history, etc. - and explore the different ways they can be utilized in a machine learning framework. By presenting and contrasting the latest works that utilize Wikipedia in multiple domains, this tutorial aims to increase awareness among researchers and practitioners in these fields of the benefits of utilizing Wikipedia in their respective domains, in particular the use of multiple sources of information simultaneously.


    • Published in

      SIGIR '15: Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval
      August 2015
      1198 pages
      ISBN:9781450336215
      DOI:10.1145/2766462

      Copyright © 2015 Owner/Author

      Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 9 August 2015

      Qualifiers

      • tutorial

      Acceptance Rates

      SIGIR '15 Paper Acceptance Rate: 70 of 351 submissions, 20%
      Overall Acceptance Rate: 792 of 3,983 submissions, 20%
