DOI: 10.1145/3184558.3191642
research-article

Real-time Event-based News Suggestion for Wikipedia Pages from News Streams

Published: 23 April 2018

Abstract

Wikipedia is one of the most visited resources on the Web and is used extensively as the main source of information for applications such as Web search and question answering. This is mostly attributed to Wikipedia's coverage of topics and real-world entities and to the fact that Wikipedia articles are constantly updated with new and emerging facts. However, only a small fraction of articles are considered to be of good quality; the large majority are incomplete or have other quality issues. A strong quality indicator is the presence of external references from third-party sources (e.g., news sources), as suggested by Wikipedia's verifiability principle. Even for references that already exist in Wikipedia, there is an inherent lag between the publication time of a cited resource and the time it is cited in a Wikipedia article. We propose near real-time suggestion of news references for Wikipedia from a daily news stream. We model daily news as specific events that span from a day up to a year. From these we construct an event chain, which we use to determine when the information in an event has converged; then, using a learning-to-rank approach, we suggest the most authoritative and complete news article to the Wikipedia articles involved in that event. We evaluate our news suggestion approach on a set of 41 events extracted from Wikipedia's Current Events portal and on a new corpus of more than 14 million daily news articles from 2016 to 2017. We are able to suggest news articles to Wikipedia pages with an overall accuracy of MAP=0.77 and with minimal lag with respect to the publication time of the news article.
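
The abstract reports its result as MAP (mean average precision) without defining the metric. As a rough illustration only, the sketch below computes MAP under its standard textbook definition, assuming binary relevance labels and assuming every relevant article appears somewhere in each ranked list. The function names and the toy labels are hypothetical and are not taken from the paper or its data.

```python
from typing import Sequence


def average_precision(relevance: Sequence[int]) -> float:
    """Average precision for one ranked list of binary relevance labels.

    relevance[i] is 1 if the item at rank i+1 is relevant, else 0.
    Assumes all relevant items are present in the list.
    """
    hits = 0
    precision_sum = 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            # Precision at this rank, counted only at relevant positions.
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(rankings: Sequence[Sequence[int]]) -> float:
    """Mean of the per-list average precision over all ranked lists."""
    if not rankings:
        return 0.0
    return sum(average_precision(r) for r in rankings) / len(rankings)


# Hypothetical toy data: one ranked list of suggested news articles per
# Wikipedia page, with 1 marking a suggestion judged relevant.
toy_rankings = [
    [1, 0, 1, 0],  # relevant at ranks 1 and 3
    [0, 1, 0, 0],  # relevant at rank 2
]
map_score = mean_average_precision(toy_rankings)
print(f"MAP = {map_score:.4f}")
```

Under this definition a score of 1.0 means every relevant article is ranked above every irrelevant one, so higher values indicate that relevant suggestions tend to appear near the top of each list.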

    Published In

    WWW '18: Companion Proceedings of the The Web Conference 2018
    April 2018
    2023 pages
    ISBN:9781450356404

    Sponsors

    • IW3C2: International World Wide Web Conference Committee

    Publisher

    International World Wide Web Conferences Steering Committee

    Republic and Canton of Geneva, Switzerland

    Author Tags

    1. event-chaining
    2. news stream
    3. news-suggestion
    4. wikipedia enrichment

    Qualifiers

    • Research-article

    Conference

    WWW '18: The Web Conference 2018
    April 23 - 27, 2018
    Lyon, France
    Sponsor: IW3C2

    Acceptance Rates

    Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

    Cited By

    • (2025) Wiki warriors: language editors counter knowledge hierarchies during the pandemic. Asian Journal of Communication, 1-19. DOI: 10.1080/01292986.2024.2447587. Online publication date: 9-Jan-2025
    • (2022) Two Wikipedias in Bhutan: problems and solutions for knowledge equity in the digital age. Asian Journal of Communication 32:5, 399-416. DOI: 10.1080/01292986.2021.1937248. Online publication date: 3-Sep-2022
    • (2020) Application of News Features in News Recommendation Methods: A Survey. Data Science, 113-125. DOI: 10.1007/978-981-15-7984-4_9. Online publication date: 20-Aug-2020
    • (2019) Searching News Articles Using an Event Knowledge Graph Leveraged by Wikidata. Companion Proceedings of The 2019 World Wide Web Conference, 1232-1239. DOI: 10.1145/3308560.3316761. Online publication date: 13-May-2019
