DOI: 10.1145/2663714.2668048
research-article

Linking Today's Wikipedia and News from the Past

Published: 03 November 2014

ABSTRACT

In this paper, we propose the novel task of automatically linking Wikipedia excerpts that describe events to past news articles. Constantly evolving Wikipedia articles tend to summarize past events, abstracting away fine-grained details that mattered when the event happened. Contemporary news articles, on the other hand, report the details of events as they happened. With connections between these two orthogonal information sources in place, a user could jump between them to acquire a holistic view of past events. We cast the linking problem as two retrieval tasks and propose a single framework for addressing them. In addition, we delineate the challenges involved in both tasks and propose a framework to address them. To build a better understanding of the problem, we first consider the simpler task of linking Wikipedia events that are systematically curated into years, decades, and centuries to relevant news articles from the past. These events come with a short textual description and a date indicating when the event happened. We present a two-stage cascade approach that leverages the temporal information associated with a given event to improve linking effectiveness. We additionally design several baselines and show that our approach outperforms all of them. The results on this simplified task bring us a step closer to solving the larger problem proposed in this paper. As future work, we plan to build an automatic linking system that addresses the challenges identified in this paper.
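
The abstract does not spell out the two stages of the cascade, so the following is only an illustrative sketch of how a date-aware two-stage linker could look, not the method evaluated in the paper. It assumes that the first stage keeps news articles published within a fixed window around the event date and that the second stage ranks the survivors by Jaccard term overlap with the event description. All names (`Event`, `Article`, `temporal_filter`, `rank_by_overlap`, `link_event`, `window_days`) and both scoring choices are hypothetical.

```python
import re
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Event:
    """A Wikipedia event: short description plus the date it happened."""
    description: str
    day: date


@dataclass
class Article:
    """A news article with its publication date."""
    title: str
    text: str
    published: date


def tokenize(text):
    """Lowercase the text and return its set of alphabetic terms."""
    return set(re.findall(r"[a-z]+", text.lower()))


def temporal_filter(event, articles, window_days=7):
    """Stage 1 (assumed): keep articles published within
    window_days of the event date, before or after."""
    window = timedelta(days=window_days)
    return [a for a in articles if abs(a.published - event.day) <= window]


def rank_by_overlap(event, candidates):
    """Stage 2 (assumed): order candidates by Jaccard overlap between
    the event description and the article's title and text."""
    query = tokenize(event.description)

    def score(article):
        terms = tokenize(article.title + " " + article.text)
        union = query | terms
        return len(query & terms) / len(union) if union else 0.0

    return sorted(candidates, key=score, reverse=True)


def link_event(event, articles, window_days=7, k=3):
    """Run both stages and return the top-k linked articles."""
    candidates = temporal_filter(event, articles, window_days)
    return rank_by_overlap(event, candidates)[:k]
```

Calling `link_event(event, articles)` returns at most `k` articles. An article that matches the description well but was published far from the event date is discarded in the first stage, which is the role the abstract assigns to temporal information.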


    • Published in

      PIKM '14: Proceedings of the 7th Workshop on Ph.D Students
      November 2014
      70 pages
      ISBN: 9781450314817
      DOI: 10.1145/2663714

      Copyright © 2014 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      PIKM '14 Paper Acceptance Rate: 4 of 10 submissions, 40%. Overall Acceptance Rate: 25 of 62 submissions, 40%.
