ABSTRACT
In this paper we propose a novel task: automatically linking Wikipedia excerpts that describe events to past news articles. Constantly evolving Wikipedia articles tend to summarize past events, abstracting away fine-grained details that mattered when the event happened. Contemporary news articles, on the other hand, report the details of events as they happened. With connections between these two orthogonal information sources in place, a user could jump between them to acquire a holistic view of past events. We cast the linking problem as two retrieval tasks, delineate the challenges involved in both, and propose a single framework to address them. To build a better understanding of the problem, we first consider the simpler task of linking Wikipedia events that are systematically curated into years, decades, and centuries to relevant news articles from the past. These events come with a short textual description and a date indicating when the event happened. We present a two-stage cascade approach that leverages the temporal information associated with a given event to improve linking effectiveness. We additionally design several baselines and show that our approach outperforms all of them. The results on this simplified task bring us a step closer to solving the larger problem proposed in this paper. As future work, we plan to build an automatic linking system that addresses the challenges identified here.
Index Terms
- Linking Today's Wikipedia and News from the Past